KT Learning Lab 2: A Conceptual Overview
LKT: a broad framework for knowledge tracing models based on logistic regression (Pavlik, Eglington, & Harrell-Williams, 2021)
PFA (Performance Factors Analysis): the first member of the LKT family to run in real time (Pavlik Jr., Cen, & Koedinger, 2009)
Measures how much latent skill a student has, while they are learning
But expresses it in terms of the probability of correctness the next time the skill is encountered
No direct expression of the amount of latent skill, except through this probability of correctness
Assess a student’s knowledge of topic X
Based on a sequence of items that are dichotomously scored
Where the student can learn on each item, due to help, feedback, scaffolding, etc.
Each item may involve multiple latent skills or knowledge components
Each skill has success learning rate γ and failure learning rate ρ
There is also a difficulty parameter β, but its semantics can vary – more on this later
From these parameters, and the number of successes and failures the student has had on each relevant skill so far, we can compute the probability P(m) that the learner will get the item correct
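Written out (notation assumed from the parameter definitions above; s and f are the student's prior successes and failures on a skill, and multi-skill items sum over their knowledge components):

```latex
m_{i} = \sum_{j \in \mathrm{KCs}(i)} \left( \beta_j + \gamma_j s_{j} + \rho_j f_{j} \right),
\qquad
P(m) = \frac{1}{1 + e^{-m}}
```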
γ = 0.2, ρ = 0.1, β = -0.5

Actual | m | P(m) |
---|---|---|
0 | -0.5 | 0.38 |
0 | -0.5+(0.1*1) = -0.4 | 0.40 |
1 | -0.5+(0.1*2) = -0.3 | 0.43 |
(next) | -0.5+(0.1*2)+(0.2*1) = -0.1 | 0.48 |

(m is recomputed after each observed outcome; the last row is the prediction for the student's next attempt, before its outcome is observed.)
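The table's arithmetic can be checked with a few lines of Python (a sketch assuming a single skill; `pfa_probability` is a name made up here):

```python
import math

# Parameter values from the worked example (one skill); gamma rewards
# prior successes, rho adjusts for prior failures, beta is difficulty.
GAMMA, RHO, BETA = 0.2, 0.1, -0.5

def pfa_probability(successes, failures, gamma=GAMMA, rho=RHO, beta=BETA):
    """P(correct) = logistic(beta + gamma*successes + rho*failures)."""
    m = beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-m))

# Replay the observed outcomes (0 = incorrect, 1 = correct); the final
# None stands for the not-yet-observed next attempt.
s = f = 0
history = []
for outcome in [0, 0, 1, None]:
    history.append(round(pfa_probability(s, f), 2))
    if outcome == 1:
        s += 1
    elif outcome == 0:
        f += 1

print(history)  # → [0.38, 0.4, 0.43, 0.48], matching the table
```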
Which parameter represents when the student learns from an opportunity to practice?
(As opposed to just better predicted performance because you’ve gotten it right)
Is it ρ?
Is it the average of ρ and γ?
Three degenerate cases
γ < 0
γ < ρ
γ = ρ = 0
When might you legitimately get them?
ρ < 0
γ < ρ
γ < 0
One seemingly degenerate (but not) case
“It is worth noting that a fourth case – when ρ > 0 – is not degenerate, due to the multiple functions the parameters perform in PFA. In this case, the rate of learning the skill may outweigh the evidence of lack of student knowledge that an incorrect answer provides. So long as γ > ρ, a positive ρ is conceptually acceptable.”
A degenerate parameter set where any response, correct or not, lowers the estimate (γ < 0 and ρ < 0):
γ = -0.1, ρ = -0.5, β = -0.5
Actual | m | P(m) |
---|---|---|
0 | -0.5 | 0.38 |
0 | -1 | 0.27 |
1 | -1.5 | 0.18 |
(next) | -1.6 | 0.17 |
A degenerate parameter set where failure raises the estimate more than success (γ < ρ):
γ = 0.1, ρ = 0.2, β = -0.5
Actual | m | P(m) |
---|---|---|
0 | -0.5 | 0.38 |
0 | -0.3 | 0.43 |
1 | -0.1 | 0.48 |
(next) | 0.0 | 0.50 |
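Both degenerate patterns above can be verified numerically (a sketch; the helper below just re-implements the logistic formula from the worked examples):

```python
import math

def p_correct(gamma, rho, beta, successes, failures):
    """PFA prediction: logistic(beta + gamma*successes + rho*failures)."""
    m = beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-m))

# Case 1: gamma < 0 and rho < 0 -- even a CORRECT answer lowers the estimate.
before = p_correct(-0.1, -0.5, -0.5, 0, 0)
after_success = p_correct(-0.1, -0.5, -0.5, 1, 0)
assert after_success < before

# Case 2: gamma < rho -- a failure raises the estimate MORE than a success.
base = p_correct(0.1, 0.2, -0.5, 0, 0)
gain_from_success = p_correct(0.1, 0.2, -0.5, 1, 0) - base
gain_from_failure = p_correct(0.1, 0.2, -0.5, 0, 1) - base
assert gain_from_failure > gain_from_success
```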
Values of ρ below 0 don’t actually mean negative learning
They mean that failure provides more evidence of lack of knowledge
Than the learning opportunity causes improvement
One fix for degeneracy: simply bound γ and ρ
Does not reduce model performance substantially (just like BKT)
What causes degeneracy? We’ll come back to this in a minute
Parameters in PFA combine information from correctness with improvement from practice
Makes PFA models a little harder to interpret than BKT
γ = 0.2, ρ = 0.1, β = -0.5
Actual | m | P(m) |
---|---|---|
0 | -0.5 | 0.38 |
0 | -0.4 | 0.40 |
1 | -0.3 | 0.43 |
(next) | -0.1 | 0.48 |

γ = 0.2, ρ = 0.1, β = -1.5
Actual | m | P(m) |
---|---|---|
0 | -1.5 | 0.18 |
0 | -1.4 | 0.20 |
1 | -1.3 | 0.21 |
(next) | -1.1 | 0.25 |

γ = 0.2, ρ = 0.1, β = +3.0
Actual | m | P(m) |
---|---|---|
0 | 3.0 | 0.953 |
0 | 3.1 | 0.957 |
1 | 3.2 | 0.961 |
(next) | 3.4 | 0.968 |
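To see β's role as a pure baseline shift, hold the history and learning rates fixed and vary only β (values from the three tables above):

```python
import math

def p_correct(m):
    """Logistic link used by PFA."""
    return 1.0 / (1.0 + math.exp(-m))

# Same history (two failures, one success) and learning rates (gamma = 0.2,
# rho = 0.1); only the baseline beta changes across the three tables.
results = {}
for beta in (-0.5, -1.5, 3.0):
    m = beta + 0.2 * 1 + 0.1 * 2   # beta + gamma*successes + rho*failures
    results[beta] = round(p_correct(m), 2)

print(results)  # beta shifts the whole prediction curve up or down
```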
Pavlik proposes three different β parameters
Item
Item-Type
Skill
These result in different numbers of parameters
What are the circumstances where you might want item versus skill?
If β is used at the Skill or Item-Type level
And the learning system moves students from easier to harder items within a “skill”
Then γ may be fit below 0, even though students are learning
Also, if items are tagged with multiple skills, shared variance (collinearity) between skills could produce degenerate parameters.
1. Start with initial values for each parameter
2. Estimate student correctness at each problem step
3. Estimate parameters using the student correctness estimates
4. If goodness is substantially better than the last estimation, and max iterations has not been reached, go to step 2
EM is vulnerable to local minima
Randomized restart typically used
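As an illustration of the randomized-restart idea (a toy stand-in for the actual EM fitting, using greedy coordinate ascent on the likelihood; all function names are made up):

```python
import math
import random

def log_likelihood(params, data):
    """Log-likelihood of observed outcomes for one skill under PFA.
    data: list of (successes_so_far, failures_so_far, outcome) tuples."""
    gamma, rho, beta = params
    ll = 0.0
    for s, f, y in data:
        p = 1.0 / (1.0 + math.exp(-(beta + gamma * s + rho * f)))
        p = min(max(p, 1e-9), 1.0 - 1e-9)   # guard against log(0)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

def fit_pfa(data, restarts=5, iters=200, step=0.05, seed=0):
    """Greedy coordinate ascent from several random starting points,
    keeping the best result found (the randomized-restart idea)."""
    rng = random.Random(seed)
    best_params, best_ll = None, -math.inf
    for _ in range(restarts):
        params = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        for _ in range(iters):
            for i in range(3):
                for delta in (step, -step):
                    trial = list(params)
                    trial[i] += delta
                    if log_likelihood(trial, data) > log_likelihood(params, data):
                        params = trial
        ll = log_likelihood(params, data)
        if ll > best_ll:
            best_params, best_ll = params, ll
    return best_params, best_ll

# Tiny demo: the incorrect-incorrect-correct sequence from the worked example.
data = [(0, 0, 0), (0, 1, 0), (0, 2, 1)]
params, ll = fit_pfa(data)
print(params, ll)
```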
Is PFA used in real systems? Yes, but by far fewer learning systems than BKT
Maier, Baker, and Stalzer (2021) discuss its use in Reveal Math 1
One issue in real-world use is handling rare skills, which can impact model inferences on common skills as well
Maier, Baker, and Stalzer (2021) handle this by creating a “catch all” skill for rare skills
Using average parameters from all common skills also works
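The catch-all approach can be sketched as a preprocessing step (the threshold, label, and skill names here are illustrative, not from Maier, Baker, and Stalzer):

```python
from collections import Counter

def relabel_rare_skills(skill_labels, min_obs=100):
    """Replace skills with fewer than min_obs observations by one catch-all
    label, so rare skills share a single set of fitted parameters.
    (min_obs=100 is an illustrative threshold, not a value from the paper.)"""
    counts = Counter(skill_labels)
    return ["catch_all" if counts[s] < min_obs else s for s in skill_labels]

labels = ["fractions"] * 3 + ["venn-diagrams"]      # hypothetical skills
print(relabel_rare_skills(labels, min_obs=2))
# → ['fractions', 'fractions', 'fractions', 'catch_all']
```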
PFA is a competitor for measuring student skill; it predicts the probability of correctness rather than estimating latent knowledge directly
Can handle multiple KCs for the same item, a big virtue
Weights actions further back in order less strongly
Adds an evidence decay parameter δ
Substitutes a decay-weighted version of the success and failure counts, each prior observation down-weighted by δ according to how far back in the sequence it occurred, for the previous summation
Very slightly higher AUC (0.003)
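A sketch of the decayed counts (the exact weighting in the published model may differ; this assumes each prior outcome is weighted by δ raised to its distance from the current opportunity):

```python
def decayed_counts(prior_outcomes, delta=0.9):
    """Decay-weighted success and failure counts.

    prior_outcomes: 0/1 results before the current opportunity, oldest first.
    An outcome d opportunities back (d = 0 for the most recent) is weighted
    delta**d, so older evidence counts less. (The weighting scheme is an
    illustrative assumption, not the exact formula from the paper.)"""
    s = f = 0.0
    n = len(prior_outcomes)
    for j, y in enumerate(prior_outcomes):
        w = delta ** (n - 1 - j)
        if y:
            s += w
        else:
            f += w
    return s, f

# With delta = 1 this reduces to the plain counts used by standard PFA.
print(decayed_counts([0, 0, 1], delta=1.0))   # → (1.0, 2.0)
print(decayed_counts([0, 0, 1], delta=0.5))   # recent outcomes weighted most
```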
Weights actions further back in order less strongly
Looks at the proportion of successes to failures, weighting by distance in order from the current action
Adds an evidence decay parameter b
Adds “ghost practices” before the current practice to make the math work
Substitutes this decayed success proportion for the previous summation
A little higher AUC (0.003-0.027) (Pavlik Jr., Eglington, & Zhang, 2021)
Creates a general framework for variants of PFA
Good situations for using PFA:
Some items have multiple skills
Learning is likely to be gradual rather than sudden
You have relatively small amounts of data
You want to add new items without refitting the model