from pyBKT.models import Model
Module 1: Case Study
Bayesian Knowledge Tracing
1. Prepare
The first KT case study is inspired by the work of Zambrano, Zhang, and Baker (2024), who analyzed the performance of a Bayesian Knowledge Tracing (BKT) model and a carelessness detector across every demographic group in their sample.
The primary aim of this case study is to gain hands-on experience with essential Python packages and functions for Bayesian Knowledge Tracing. You will learn how to wrangle the data, fit the model, and evaluate how well the model fits. Zambrano, Zhang, and Baker (2024) used BKT brute-force grid search (BKT-BF, written in Java) to fit the BKT model, but you will use pyBKT (Python) here. pyBKT is easier to start with but slower. Specifically, this case study will cover the following topics:
Prepare: Before analysis, you’ll read a recent paper about BKT, learn about the current trend, and get introduced to the {pandas}, {sklearn}, and {pyBKT} packages for data wrangling and analyzing the BKT model.
Fitting the model: In the fitting section of the case study, you will learn basic techniques for fitting and evaluating a BKT model.
Advanced Features: You will explore a variant of the BKT model and an advanced feature called Roster in the pyBKT package.
1a. Review the Research
In this study, Zambrano, Zhang, and Baker (2024) assessed the degree to which algorithmic biases are present in two learning analytics models: knowledge estimates based on Bayesian Knowledge Tracing (BKT) and carelessness detectors. Specifically, this analysis evaluated the model performance across demographic groups, compared performance across intersectional groups of these demographics, and explored models’ transferability across unobserved demographics. Results show close to equal performance across these groups. Thus, these algorithms can be used fairly at scale.
Research Questions
The central goal of this research is to
investigate the degree to which algorithmic biases are present in two learning analytics models: Bayesian Knowledge Tracing (BKT) and carelessness detectors.
Data Collection
The data come from 5,856 students across 12 middle and high schools in a northeastern US city. The students used Carnegie Learning’s MATHia software (Ritter et al. 2007) for math instruction during the 2021-2022 academic year. The content includes multi-step questions, guiding students through predetermined content sequences. MATHia’s structure closely aligns with the Bayesian Knowledge Tracing (BKT) algorithm.
Analysis
The knowledge estimate for specific skills was calculated using BKT. The authors fitted BKT parameters with brute-force grid search. Upper limits of 0.3 and 0.1 were adopted for the ‘Guess’ and ‘Slip’ parameters, respectively, to avoid model degeneracy and keep the parameter values aligned with their conceptual meaning. Demographic characteristics are not used directly when building the BKT model. However, sample sizes are not even across demographic groups, so the parameters could be more representative of demographics with a larger number of students.
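To make the four BKT parameters concrete, here is a minimal sketch of the standard BKT update for a single observed answer. It is not the authors' code (they used BKT-BF). The function name `bkt_update` and all parameter values are illustrative assumptions, with guess and slip chosen to respect the 0.3 and 0.1 caps described above.

```python
def bkt_update(p_know, correct, learn, guess, slip):
    """Return P(known) after one observed answer, using the standard BKT equations."""
    if correct:
        # A correct answer comes from knowing and not slipping, or from guessing.
        evidence = p_know * (1 - slip) + (1 - p_know) * guess
        posterior = p_know * (1 - slip) / evidence
    else:
        # An incorrect answer comes from slipping, or from not knowing and not guessing.
        evidence = p_know * slip + (1 - p_know) * (1 - guess)
        posterior = p_know * slip / evidence
    # Learning transition: an unmastered skill becomes mastered with probability `learn`.
    return posterior + (1 - posterior) * learn


# Hypothetical starting estimate and parameters, chosen only for illustration.
p = 0.4
for answer in [1, 1, 0, 1]:
    p = bkt_update(p, answer, learn=0.2, guess=0.3, slip=0.1)
    print(f"answer={answer}  P(known)={p:.3f}")
```

A correct answer raises the estimate and an incorrect answer lowers it. Capping guess and slip keeps a correct answer from being mostly explained by guessing, and an incorrect one by slipping.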
The authors adopted a 4-fold student-level cross-validation that was stratified by demographics and evaluated model performance with AUC ROC. The maximum difference in AUC between the best and worst predicted groups was also calculated.
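The idea of a student-level, stratified split can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the helper `stratified_student_folds`, the student IDs, and the group labels are all hypothetical, and a real pipeline would normally shuffle students before assigning folds.

```python
from collections import defaultdict


def stratified_student_folds(student_groups, n_folds=4):
    """Assign each student to one fold, spreading every group across the folds."""
    by_group = defaultdict(list)
    for student, group in sorted(student_groups.items()):
        by_group[group].append(student)
    fold_of = {}
    next_slot = 0
    for group in sorted(by_group):
        for student in by_group[group]:
            # Deal students out in turn so each group is spread evenly.
            fold_of[student] = next_slot % n_folds
            next_slot += 1
    return fold_of


# Hypothetical students with a made-up demographic label.
students = {"s1": "A", "s2": "A", "s3": "A", "s4": "A",
            "s5": "B", "s6": "B", "s7": "B", "s8": "B"}
folds = stratified_student_folds(students, n_folds=4)
print(folds)
```

Because the fold is assigned per student rather than per row, all of a student's responses land in the same fold, so the model is always evaluated on students it has not seen.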
Key Findings
As reported by Zambrano, Zhang, and Baker (2024) in their findings section:
We found evidence that performance was close to equal across demographic groups, for these models, including intersectional categories, and tests where we held out entire demographic groups during model training (a test of model applicability to entirely new demographic groups), for carelessness.
❓Question
Based on what you know about BKT and the context so far, what other research question(s) might you ask in this context that a knowledge inference perspective might be able to answer?
Type a brief response in the space below:
1b. Load Packages
In this case study, you will not replicate the data analysis in Zambrano, Zhang, and Baker (2024). Instead, you will use the data from Baker et al. (2008) to learn how to fit a BKT model. First, you will learn about the essential packages you will be using in this case study.
Packages, sometimes called libraries, are shareable collections of Python code that can contain functions, data, and/or documentation and extend the functionality of Python. You can always check which Python packages have already been installed in RStudio Cloud by running the command pip list in the terminal.
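You can also check for a package from inside Python. The sketch below uses the standard library's `importlib.metadata`. The helper name `installed_version` is an assumption made for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Packages used in this case study; None means the package is not installed.
for name in ["pyBKT", "pandas", "numpy"]:
    print(name, installed_version(name))
```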
pyBKT 📦
The {pyBKT} package (Badrinath, Wang, and Pardos 2021) is a Python implementation of the Bayesian Knowledge Tracing algorithm and its variants, which estimate student cognitive mastery from problem-solving sequences.
Click the green arrow in the right corner of the “code chunk” that follows to load the {pyBKT} library.
pandas 📦
One package that you’ll be using extensively is {pandas}. Pandas (McKinney 2010) is a powerful and flexible open-source data analysis and wrangling tool for Python. Python is also used widely by the data science community.
Click the green arrow in the right corner of the “code chunk” that follows to load the {pandas} library introduced in LA Workflow labs.
import pandas as pd
NumPy 📦
NumPy (pronounced /ˈnʌmpaɪ/ NUM-py) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
Click the green arrow in the right corner of the “code chunk” that follows to load numpy:
import numpy as np
👉 Your Turn ⤵
Use the code chunk below to import the pyBKT package:
# Your code starts here
2. Fitting the model
This is an example of model fitting:
In this example, linear regression, one of the simplest techniques, is used to fit the model. You are fitting the model (the line) to a dataset (the dots). The model is of the form y = a x + b, and you are trying to find the optimal values of a and b. You draw the line that best fits the existing data points on average. Once you have fitted the model, you can use it to predict outcomes (y-axis) based on inputs (x-axis).
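As a small illustration of what "finding the optimal a and b" means, here is a sketch of the closed-form least-squares fit for a line. The data points are made up, and `fit_line` is a hypothetical helper rather than part of any package used in this case study.

```python
def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both up to a factor of n).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up points that lie close to the line y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(xs, ys)
print(f"a={a:.2f}, b={b:.2f}")
print("prediction at x=5:", a * 5 + b)
```

Fitting a BKT model follows the same logic, except that the parameters being searched for are prior, learn, guess, and slip rather than a and b.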
2a. Import the dataset
To realize these goals, you’ll first need to import the CSV file originally obtained from Baker et al. (2008). This data set is a subset of the data set used in Baker et al. (2008). A description of each file is below along with a link to the original file:
Example CSV dataset: This BKT dataset consists of 298 students’ performance in 67 skills.
Dataset description: This file includes the descriptions of all the variables in this dataset.
Let’s use the read_csv function from the {pandas} package to import the AsgnBA3-dataset.csv file.
df = pd.read_csv("data/AsgnBA3-dataset.csv")
df.head()
| | ID | Lesson | Student | KC | item | right | firstattempt | time |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Splot | AGUFADE | VALUING-CAT-FEATURES | META-VALUING-CAT-FEATURES-1 | 1 | 1 | 3.29700 |
| 1 | 1 | Splot | AGUFADE | VALUING-NUM-FEATURES | META-VALUING-NUM-FEATURES-1 | 0 | 1 | 4.04700 |
| 2 | 2 | Splot | AGUFADE | CHOOSE-VAR-TYPE | CHOOSE-VAR-TYPE-NUM-1 | 1 | 1 | 1.59300 |
| 3 | 3 | Splot | AGUFADE | VALUING-NUM-FEATURES | META-VALUING-NUM-FEATURES-1 | 0 | 0 | 2.92200 |
| 4 | 4 | Splot | AGUFADE | CHOOSE-VAR-TYPE | CHOOSE-VAR-TYPE-NUM-2 | 1 | 1 | 1.59400 |
Here, df stands for “DataFrame” in the Pandas package. A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array, or a table with rows and columns.
Remove unnecessary rows
Before moving to the next step, use the code chunk below to keep only the rows in which firstattempt equals 1. Then keep only the rows that represent the “CHOOSE-X-AXIS-QUANTITATIVE” skill.
#just an example, this will not show in students' version
df2 = df[df["firstattempt"]==1]
df3 = df2[df2["KC"]=="CHOOSE-X-AXIS-QUANTITATIVE"]
2b. Fit the model
Mapping the column names
The accepted input formats in pyBKT are Pandas DataFrames and data files of type CSV (comma separated) or TSV (tab separated). If it is passed a data file, pyBKT will automatically infer which delimiter to use. Because the column names that identify each field in the data (e.g. skill name, correct/incorrect) vary per data source, you may need to specify a mapping from your data file’s column names to pyBKT’s expected column names.
Thus, you will need to create a column name mapping before training the model.
defaults = {'order_id': 'ID', 'skill_name': 'KC', 'correct': 'right', 'user_id': 'Student'}
This is a dictionary. Dictionaries are used to store data values in key: value pairs. Dictionaries are written with curly brackets.
The column names you need to specify are order_id, skill_name, correct, and user_id. You may refer to the document in 2a about the descriptions of all the variables.
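A typo in the mapping is an easy mistake to make, so it can help to confirm that every mapped column actually exists before fitting. The helper `missing_columns` below is a hypothetical convenience function, not part of pyBKT. The column list is copied from the df.head() output in 2a, and with a real DataFrame you could pass list(df.columns) instead.

```python
def missing_columns(mapping, columns):
    """Return the data columns named in the mapping that are absent from the dataset."""
    return [col for col in mapping.values() if col not in columns]


defaults = {'order_id': 'ID', 'skill_name': 'KC', 'correct': 'right', 'user_id': 'Student'}
# Column names as shown in the df.head() output in 2a.
columns = ['ID', 'Lesson', 'Student', 'KC', 'item', 'right', 'firstattempt', 'time']

# An empty list means every mapped column was found.
print(missing_columns(defaults, columns))
```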
Fit the model
pyBKT makes fitting the model very easy. It only takes 2 lines:
model = Model(seed = 42)
model.fit(data = df3, defaults = defaults)
print(model.params())
value
skill param class
CHOOSE-X-AXIS-QUANTITATIVE prior default 0.50016
learns default 0.44768
guesses default 0.21683
slips default 0.15422
forgets default 0.00000
First, use the Model class of the pyBKT package to create a BKT model.
The seed parameter is used to initialize the random number generator. The random number generator needs a number to start with (a seed value) to be able to generate a random number, and using the same seed makes the results reproducible.
Then, call the fit method with df3, the dataset you have cleaned so far, and defaults, the column name mapping.
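The effect of a seed can be seen with Python's built-in random module. This is only an analogy for what seed = 42 does in Model. It does not use pyBKT's own random number generator, and the helper `draw` is made up for illustration.

```python
import random


def draw(seed, n=3):
    """Draw n pseudo-random numbers from a generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


print(draw(42) == draw(42))  # same seed, same sequence
print(draw(42) == draw(7))   # different seeds, different sequences
```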
Let’s run the code chunk below to see the best-fitting parameters.
print(model.params())
value
skill param class
CHOOSE-X-AXIS-QUANTITATIVE prior default 0.50016
learns default 0.44768
guesses default 0.21683
slips default 0.15422
forgets default 0.00000
RMSE and AUC
pyBKT provides various ways to evaluate your BKT model, such as RMSE and AUC.
training_rmse = model.evaluate(data = df3)
training_auc = model.evaluate(data = df3, metric = 'auc')
print("Training RMSE: %f" % training_rmse)
print("Training AUC: %f" % training_auc)
Training RMSE: 0.446825
Training AUC: 0.653699
The Root Mean Squared Error (RMSE) is one of the two main performance indicators for a regression model. It measures the average difference between values predicted by a model and the actual values. To put it simply, the lower, the better.
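To see what RMSE measures, here is a minimal sketch that computes it by hand on made-up values. The function `rmse` below is written for illustration and is not pyBKT's own implementation.

```python
import math


def rmse(true_vals, pred_vals):
    """Root mean squared error between observed answers and predicted probabilities."""
    squared_errors = [(t - p) ** 2 for t, p in zip(true_vals, pred_vals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up answers (1 = correct) and predicted probabilities of a correct answer.
observed = [1, 0, 1, 1]
predicted = [0.8, 0.3, 0.6, 0.9]
print(round(rmse(observed, predicted), 4))
```

As a reference point, always predicting 0.5 on 0/1 answers gives an RMSE of exactly 0.5, so a useful model should score below that.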
AUC ROC is the area under the ROC curve. As the cited source (baker2024fixing) points out, each point on the ROC curve shows the trade-off between the sensitivity and specificity of the model at one threshold. Taken as a whole, the curve shows this trade-off across all possible thresholds.
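One way to build intuition for AUC is its pairwise interpretation: it equals the probability that a randomly chosen correct answer received a higher prediction than a randomly chosen incorrect answer. The sketch below computes AUC that way on made-up values. It is an illustration only, not pyBKT's implementation, and the nested loop is too slow for large datasets.

```python
def auc(true_vals, pred_vals):
    """AUC as the share of (correct, incorrect) pairs ranked in the right order."""
    positives = [p for t, p in zip(true_vals, pred_vals) if t == 1]
    negatives = [p for t, p in zip(true_vals, pred_vals) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up answers (1 = correct) and predicted probabilities of a correct answer.
observed = [1, 0, 1, 1, 0]
predicted = [0.8, 0.3, 0.6, 0.4, 0.5]
print(auc(observed, predicted))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, so unlike RMSE, higher is better.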
Create your own metrics
You can even create your own metrics, such as the sum of squared residuals (SSR) used by BKT-BF (Baker et al. 2010).
def SSR(true_vals, pred_vals):
    return np.sum(np.square(true_vals - pred_vals))
training_SSR = model.evaluate(data= df3, metric = SSR)
print("Training SSR: %f" % training_SSR)
Training SSR: 166.111049
The sum of squared residuals (SSR) measures the level of variance in the error term, or residuals, of a regression model. The smaller the residual sum of squares, the better your model fits your data; the greater the residual sum of squares, the poorer your model fits your data.
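RMSE and SSR summarize the same squared errors: SSR = n × RMSE², where n is the number of scored responses. As a quick sanity check, and assuming the RMSE and SSR values printed above were computed on the same predictions, the two numbers can be combined to recover n:

```python
training_rmse = 0.446825   # training RMSE printed earlier in this section
training_ssr = 166.111049  # training SSR printed earlier in this section

# SSR = n * RMSE**2, so dividing SSR by RMSE**2 gives the number of responses.
implied_n = training_ssr / training_rmse ** 2
print(round(implied_n))
```

This comes out at roughly 832 responses. Because SSR grows with the number of rows, it is only comparable between models evaluated on the same data, whereas RMSE is already averaged.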
❓Challenge
Split the data into the training set and the testing set by 80%/20% and fit your BKT model on a skill. This skill should not be the same as the one above and the one in the ASSISTments activity. Then, evaluate the MAE and AUC by predicting the given test set and training set in the respective variables.
#split the dataset by 80%/20%
3. Conditionalizing and Cross-validation in BKT
3a. Conditionalize guess, slip, and learn on other factors
You can also conditionalize guess, slip, or learn on other factors in the BKT model. To do so, you provide the classes that guess, slip, or learn should vary by when fitting the model. Suppose you want to fit a separate learn rate, guess, and slip for each item. You first need to specify which column holds the class.
Use the code chunk below to test this variant:
defaults_multi = {'order_id': 'ID', 'skill_name': 'KC', 'correct': 'right', 'user_id': 'Student', 'multigs': 'item', 'multilearn':'item'}
model_multi = Model(seed=42, num_fits = 1)
model_multi.fit(data = df3, multilearn = True, multigs= True, defaults = defaults_multi)
print(model_multi.params())
value
skill param class
CHOOSE-X-AXIS-QUANTITATIVE prior default 0.80665
learns CHOOSE-X-AXIS-QUANTITATIVE-1 0.71828
CHOOSE-X-AXIS-QUANTITATIVE-10 0.45880
CHOOSE-X-AXIS-QUANTITATIVE-11 1.00000
CHOOSE-X-AXIS-QUANTITATIVE-2 0.59323
CHOOSE-X-AXIS-QUANTITATIVE-3 0.58896
CHOOSE-X-AXIS-QUANTITATIVE-4 0.58888
CHOOSE-X-AXIS-QUANTITATIVE-5 0.92769
CHOOSE-X-AXIS-QUANTITATIVE-6 0.45929
CHOOSE-X-AXIS-QUANTITATIVE-7 0.37564
CHOOSE-X-AXIS-QUANTITATIVE-8 0.46272
CHOOSE-X-AXIS-QUANTITATIVE-9 0.88974
guesses CHOOSE-X-AXIS-QUANTITATIVE-1 0.06186
CHOOSE-X-AXIS-QUANTITATIVE-10 0.61002
CHOOSE-X-AXIS-QUANTITATIVE-11 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-2 0.08702
CHOOSE-X-AXIS-QUANTITATIVE-3 0.03506
CHOOSE-X-AXIS-QUANTITATIVE-4 0.03049
CHOOSE-X-AXIS-QUANTITATIVE-5 0.02153
CHOOSE-X-AXIS-QUANTITATIVE-6 0.99353
CHOOSE-X-AXIS-QUANTITATIVE-7 0.00147
CHOOSE-X-AXIS-QUANTITATIVE-8 0.00285
CHOOSE-X-AXIS-QUANTITATIVE-9 0.00000
slips CHOOSE-X-AXIS-QUANTITATIVE-1 0.37164
CHOOSE-X-AXIS-QUANTITATIVE-10 0.50000
CHOOSE-X-AXIS-QUANTITATIVE-11 1.00000
CHOOSE-X-AXIS-QUANTITATIVE-2 0.21020
CHOOSE-X-AXIS-QUANTITATIVE-3 0.23873
CHOOSE-X-AXIS-QUANTITATIVE-4 0.20004
CHOOSE-X-AXIS-QUANTITATIVE-5 0.16110
CHOOSE-X-AXIS-QUANTITATIVE-6 0.16004
CHOOSE-X-AXIS-QUANTITATIVE-7 0.57133
CHOOSE-X-AXIS-QUANTITATIVE-8 0.62499
CHOOSE-X-AXIS-QUANTITATIVE-9 1.00000
forgets CHOOSE-X-AXIS-QUANTITATIVE-1 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-10 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-11 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-2 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-3 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-4 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-5 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-6 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-7 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-8 0.00000
CHOOSE-X-AXIS-QUANTITATIVE-9 0.00000
As you can see from the output, each item now has its own guess, slip, and learn rate. You can conditionalize on many factors, depending on the dataset you collected.
❓Challenge: Conditionalize slip on response time
Another variant of BKT conditionalizes slip on whether the time taken was under 5 seconds. With pyBKT's multigs option, guess and slip are both estimated per class, so you will get two sets of guess and slip parameters: one for actions that took more than 5 seconds and one for actions that took 5 seconds or less. Please note that this is not exactly great practice, but it is feasible with the data set you have.
The current dataset only has response time. You will need to create another binary column for this challenge.
Build a column indicating whether the student took more than 5 seconds
#This part is just an example and the code will not show to the participants
df4 = df3.copy()  # copy so that df3 is left unchanged
df4.insert(8, 'FiveSecs', 0)
def cal_5secs(row):
    if row['time'] > 5:
        return 1
    else:
        return 0
df4["FiveSecs"] = df4.apply(cal_5secs, axis=1)
Build your 5 seconds model
#This part is just an example and the code will not show to the participants
defaults = {'order_id': 'ID', 'skill_name': 'KC', 'correct': 'right', 'user_id': 'Student', 'multigs': 'FiveSecs'}
model_5sec = Model(seed = 42, num_fits = 1)
model_5sec.fit(data = df4, multigs = True, defaults = defaults)
print(model_5sec.params())
value
skill param class
CHOOSE-X-AXIS-QUANTITATIVE prior default 0.99387
learns default 0.97018
guesses 0 0.00000
1 0.00001
slips 0 0.63572
1 0.23534
forgets default 0.00000
3b. Cross-validation
pyBKT also allows you to easily use cross-validation.
Cross-validation is a technique to evaluate the performance of a model on unseen data. The picture below shows how a 5-fold cross-validation works.
Cross-validation is offered as a black-box function, similar to a combination of fit and evaluate, that accepts a number of folds, a seed, and a metric (either one of the three provided, ‘rmse’, ‘auc’, or ‘accuracy’, or a custom Python function).
model2 = Model(seed = 42, num_fits=1)
cross_vali = model2.crossvalidate(data = df3, folds = 10, defaults = defaults, metric = 'auc')
print(cross_vali)
auc
skill
CHOOSE-X-AXIS-QUANTITATIVE 0.62969
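To picture what happens inside k-fold cross-validation, here is a minimal sketch of the splitting step. It assumes the folds are formed over students, which the text above does not state for pyBKT. The helper `kfold_splits` and the student IDs are hypothetical, and real implementations typically shuffle before splitting.

```python
def kfold_splits(items, n_folds=5):
    """Yield (train, test) lists so that every item is tested exactly once."""
    # Deal the items into n_folds groups of near-equal size.
    folds = [items[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, test


students = [f"student_{k}" for k in range(10)]  # hypothetical IDs
for fold_number, (train, test) in enumerate(kfold_splits(students), start=1):
    print(fold_number, "test:", test, "train size:", len(train))
```

In each round the model is fitted on the training part and scored on the held-out part. The reported metric is then a summary over the rounds, which is why the cross-validated AUC above is lower than the training AUC from section 2.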
❓Challenge: Cross-validation on your response time model
Please use the code chunk below to conduct a 10-fold cross-validation on your response time model and answer the following question:
Is the model better when conditionalizing slip on response time? Why?
# It is just an example. It will not show in students' version
cross_validation = model_5sec.crossvalidate(data = df4, folds = 10, defaults = defaults, metric = 'auc', multigs= True)
print(cross_validation)
auc
skill
CHOOSE-X-AXIS-QUANTITATIVE 0.74675
4. Other advanced features in pyBKT
pyBKT also offers some other advanced features, such as Roster and Parameter Fixing.
4a. Roster
Roster is used to simulate the learning environment for a group of students learning any combination of individual skills.
You need to first create a backend pyBKT model and fit it on the dataset.
defaults_roster = {'order_id': 'ID', 'skill_name': 'KC', 'correct': 'right', 'user_id': 'Student'}
model_roster = Model()
model_roster.fit(data = df2, defaults = defaults_roster)
Then you can use the Roster class to create a roster with two students and one skill.
from pyBKT.models import Roster
roster = Roster(students = ['Jack', 'Rachel'], skills = "VALUING-NUM-FEATURES", model = model_roster)
You can update Rachel’s status by adding one or more responses to a particular skill. In this case, Rachel correctly answered one question. Then check Rachel’s updated mastery state and probability.
rachel_new_state = roster.update_state('VALUING-NUM-FEATURES', 'Rachel', 1)
print("Rachel's mastery:", roster.get_state_type('VALUING-NUM-FEATURES', 'Rachel'))
print("Rachel's probability of mastery:", roster.get_mastery_prob('VALUING-NUM-FEATURES', 'Rachel'))
Rachel's mastery: StateType.UNMASTERED
Rachel's probability of mastery: 0.5750390776035513
❓Challenge
Create a new roster on the model you fitted in section 2. If you add 3 consecutive correct answers to Rachel and 3 consecutive incorrect answers to Jack, will they master the skill or not? Use the code chunk below:
4b. Parameter Fixing
Another advanced feature supported by pyBKT is parameter fixing, where you can fix one or more parameters and train the model conditioned on those fixed parameters. For example, you could fix the slip rate to 0.2:
model_fixedparam = Model()
defaults = {'order_id': 'ID', 'skill_name': 'KC', 'correct': 'right', 'user_id': 'Student'}
model_fixedparam.coef_ = {'CHOOSE-X-AXIS-QUANTITATIVE': {'slips': np.array([0.2])}}
model_fixedparam.fit(data = df3, fixed=True, defaults = defaults)
model_fixedparam.params()
skill | param | class | value |
---|---|---|---|
CHOOSE-X-AXIS-QUANTITATIVE | prior | default | 0.68396 |
CHOOSE-X-AXIS-QUANTITATIVE | learns | default | 0.51380 |
CHOOSE-X-AXIS-QUANTITATIVE | guesses | default | 0.02199 |
CHOOSE-X-AXIS-QUANTITATIVE | slips | default | 0.20000 |
CHOOSE-X-AXIS-QUANTITATIVE | forgets | default | 0.00000 |
❓Challenge
Fix the slip rate to 0.3 and the guess rate to 0.2 when fitting your model. Does the model get better or worse?