pyBKT

Python implementation of the Bayesian Knowledge Tracing algorithm and variants, estimating student cognitive mastery from problem solving sequences.

    pip install pyBKT

Based on the work of Zachary A. Pardos (zp@berkeley.edu) and Matthew J. Johnson (mattjj@csail.mit.edu) @ https://github.com/CAHLR/xBKT. All-platform python adaptation and optimizations by Anirudhan Badrinath (abadrinath@berkeley.edu). Data helpers and other utility functions written by Frederic Wang (fredwang@berkeley.edu). Original Python and boost adaptation of xBKT by Cristian Garay (c.garay@berkeley.edu). For implementation details, analysis of runtime and data requirements, and model variant replication testing, refer to:

Badrinath, A., Wang, F., Pardos, Z.A. (2021) pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models. In S. Hsiao, & S. Sahebi (Eds.) Proceedings of the 14th International Conference on Educational Data Mining (EDM). Pages 468-474.

Bulut, O., Shin, J., Yildirim-Erbasli, S. N., Gorgun, G., & Pardos, Z. A. (2023). An introduction to Bayesian knowledge tracing with pyBKT. Psych, 5(3), 770-786.

Examples from the paper can be found in the pyBKT-examples repo.

pyBKT Quick Start Tutorial

pyBKT Tutorial from LAK Workshop in Google Colab Notebook

Requirements

Python >= 3.5

Supported OS: All platforms! (Yes, Windows too)

Supported model variants

pyBKT can be used to define and fit many BKT variants, including these from the literature:

  • Individual student priors, learn rate, guess, and slip [1,2]
  • Individual item guess and slip [3,4,5]
  • Individual item or resource learn rate [4,5]
  1. Pardos, Z. A., Heffernan, N. T. (2010) Modeling Individualization in a Bayesian Networks Implementation of Knowledge Tracing. In P. De Bra, A. Kobsa, D. Chin (Eds.) Proceedings of the 18th International Conference on User Modeling, Adaptation and Personalization (UMAP). Big Island of Hawaii. Springer. Pages 255-266. [doi]

  2. Pardos, Z. A., Heffernan, N. T. (2010) Using HMMs and bagged decision trees to leverage rich features of user and skill from an intelligent tutoring system dataset. In J. Stamper & A. Niculescu-Mizil (Eds.) Proceedings of the KDD Cup Workshop at the 16th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD). Washington, D.C. ACM. Pages 24-35. [kdd cup]

  3. Pardos, Z. & Heffernan, N. (2011) KT-IDEM: Introducing Item Difficulty to the Knowledge Tracing Model. In Konstant et al. (eds.) Proceedings of the 20th International Conference on User Modeling, Adaptation and Personalization (UMAP). Girona, Spain. Springer. Pages 243-254. [doi]

  4. Pardos, Z. A., Bergner, Y., Seaton, D., Pritchard, D.E. (2013) Adapting Bayesian Knowledge Tracing to a Massive Open Online College Course in edX. In S.K. D’Mello, R.A. Calvo, & A. Olney (Eds.) Proceedings of the 6th International Conference on Educational Data Mining (EDM). Memphis, TN. Pages 137-144. [edm]

  5. Xu, Y., Johnson, M. J., Pardos, Z. A. (2015) Scaling cognitive modeling to massive open environments. In Proceedings of the Workshop on Machine Learning for Education at the 32nd International Conference on Machine Learning (ICML). Lille, France. [icml ml4ed]
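As background for these variants, the core BKT update that they parameterize can be sketched in a few lines of plain Python. Note that the function and parameter names below are illustrative, not pyBKT's API, and the parameter values are made up for the example:

```python
def bkt_update(p_mastery, correct, learn, guess, slip):
    """One BKT step: condition mastery on the observed response,
    then apply the learning transition.  (Illustrative sketch, not
    pyBKT's internal implementation.)"""
    if correct:
        evidence = p_mastery * (1 - slip)
        posterior = evidence / (evidence + (1 - p_mastery) * guess)
    else:
        evidence = p_mastery * slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - guess))
    # Learning transition: a non-mastered student may acquire the skill.
    return posterior + (1 - posterior) * learn

# Starting from a prior, a run of correct answers drives mastery up.
p = 0.1  # illustrative prior P(L0)
trace = []
for obs in [1, 1, 1]:
    p = bkt_update(p, obs, learn=0.2, guess=0.25, slip=0.1)
    trace.append(p)
```

The model variants above individualize one or more of these four parameters (prior, learn, guess, slip) per student, per item, or per resource, rather than fitting a single value per skill.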

Installation and setup

This section is a quick overview of the steps to install, set up, and run pyBKT locally.

We offer both a pure Python port and a Python/C++ extension version of pyBKT for the sake of accessibility and ease of use on any platform. Note that pip, by default, will install the C++/Python version unless the required libraries are not found or there is an error during installation. In the case of such issues, it will revert to the pure Python implementation.

The pure Python version does not fit models or scale as quickly or efficiently as the C++ version (due to the nested for loops needed for dynamic programming). Here are a few speed comparisons - both on the same machine - that may be useful in deciding which version is more appropriate for your usage (e.g. model fitting is far more demanding than prediction).

| Test Description | pyBKT (Python) | pyBKT (C++) |
|:-----------------------------------------------:|:--------------:|------------:|
| synthetic data, model fit (500 students) | ~1m55s | ~1.5s |
| synthetic data, model fit (5000 students) | ~1h30m | ~45s |
| cross validated cognitive tutor data | ~4m10s | ~3s |
| synthetic data, predict onestep (500 students) | ~2s | ~0.8s |
| synthetic data, predict onestep (5000 students) | ~2m15s | ~35s |

Installing Dependencies for Fast C++ Inferencing (Optional - for OS X and Linux)

Note: this section is not applicable for Windows as running the Python/C++ version is cumbersome and untested. For Windows, we only offer the slower, pure Python version of pyBKT (it will be installed automatically).

Linux

If you have a C++ compiler already installed, pip will install pyBKT with fast C++ inferencing. C++ compilers ship with nearly all Linux distributions. If one is not installed on your machine, run sudo apt install gcc g++ on Debian-based distributions, or use whichever package manager is suited to your distribution (dnf, pacman, etc.). Without a compiler, pip will install pyBKT without the C++ speed optimizations.

Mac

The latest version of Python is necessary for OS X. If Homebrew is installed, run the following command to download the necessary dependencies:

    brew install libomp

Installing pyBKT

You can simply run:

    pip install pyBKT

Alternatively, if pip poses problems, you can clone the repository and run the setup.py script manually:

    git clone https://github.com/CAHLR/pyBKT.git
    cd pyBKT
    python3 setup.py install

Preparing Data and Running Model

The following serves as a mini-tutorial for how to get started with pyBKT. There is more information available at the Colab notebook listed at the top of the README.

Input and Output Data

The accepted input formats are Pandas DataFrames and data files of type csv (comma separated) or tsv (tab separated). pyBKT will automatically infer which delimiter to use in the case that it is passed a data file. Since column names mapping meaning to each field in the data (i.e. skill name, correct/incorrect) vary per data source, you may need to specify a mapping from your data file's column names to pyBKT's expected column names. In many cases with Cognitive Tutor and Assistments datasets, pyBKT will be able to automatically infer column name mappings, but in the case that it is unable to, it will raise an exception. Note that the correctness is given by -1 (no response), 0 (incorrect), or 1 (correct).
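As an illustration of this input shape and of a column-name mapping, consider the sketch below. The column names and the direction of the `defaults` mapping are hypothetical examples for illustration; consult pyBKT's documentation for the exact expected keys:

```python
# Toy rows in the shape pyBKT expects after column mapping: one row
# per student response, with a skill label and a correctness code of
# -1 (no response), 0 (incorrect), or 1 (correct).
rows = [
    {"user_id": "s1", "skill_name": "Plot pi", "correct": 1},
    {"user_id": "s1", "skill_name": "Plot pi", "correct": 0},
    {"user_id": "s2", "skill_name": "Plot pi", "correct": -1},
]

# Hypothetical mapping from pyBKT's expected field names to the
# column names in your own data file.
defaults = {"user_id": "Anon Student Id",
            "skill_name": "KC(Default)",
            "correct": "Correct First Attempt"}

def validate_correctness(rows):
    """Check that every correctness value uses the -1/0/1 encoding."""
    allowed = {-1, 0, 1}
    return all(r["correct"] in allowed for r in rows)
```

A mapping like `defaults` would be passed to pyBKT when your file's column names cannot be inferred automatically.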

Creating and Training Models

The process of creating and training models in pyBKT resembles that of scikit-learn. pyBKT provides convenient methods for fetching online datasets and for fitting models on any combination of skills, or on all skills, available in a particular dataset.

    from pyBKT.models import Model

    # Initialize the model with an optional seed
    model = Model(seed = 42, num_fits = 1)

    # Fetch Assistments and CognitiveTutor data (optional - if you have your own dataset, that's fine too!)
    model.fetch_dataset('https://raw.githubusercontent.com/CAHLR/pyBKT-examples/master/data/as.csv', '.')
    model.fetch_dataset('https://raw.githubusercontent.com/CAHLR/pyBKT-examples/master/data/ct.csv', '.')

    # Train a simple BKT model on all skills in the CT dataset
    model.fit(data_path = 'ct.csv')

    # Train a simple BKT model on one skill in the CT dataset
    # Note that calling fit deletes any previously trained BKT model!
    model.fit(data_path = 'ct.csv', skills = "Plot imperfect radical")

    # Train a simple BKT model on multiple skills in the CT dataset
    model.fit(data_path = 'ct.csv', skills = ["Plot imperfect radical",
                                              "Plot pi"])

    # Train a multiguess and slip BKT model on multiple skills in the
    # CT dataset. Note: if you are not using CognitiveTutor or Assistments
    # data, you may need to provide a column mapping for the guess/slip
    # classes to use (i.e. if the column name is gsclasses, you would
    # specify multigs = 'gsclasses' or specify a defaults dictionary
    # defaults = {'multigs': 'gsclasses'}).
    model.fit(data_path = 'ct.csv', skills = ["Plot imperfect radical",
                                              "Plot pi"],
                                    multigs = True)

    # We can combine multiple model variants.
    model.fit(data_path = 'ct.csv', skills = ["Plot imperfect radical",
                                              "Plot pi"],
                                    multigs = True, forgets = True,
                                    multilearn = True)

    # We can use a different column to specify the different learn and
    # forget classes. In this case, we use student ID.
    model.fit(data_path = 'ct.csv', skills = ["Plot imperfect radical",
                                              "Plot pi"],
                                    multigs = True, forgets = True,
                                    multilearn = 'Anon Student Id')

    # View the trained parameters!
    print(model.params())

