
Research Article
BiDKT: Deep Knowledge Tracing with BERT
@INPROCEEDINGS{10.1007/978-3-030-98005-4_19,
  author={Weicong Tan and Yuan Jin and Ming Liu and He Zhang},
  title={BiDKT: Deep Knowledge Tracing with BERT},
  proceedings={Ad Hoc Networks and Tools for IT. 13th EAI International Conference, ADHOCNETS 2021, Virtual Event, December 6--7, 2021, and 16th EAI International Conference, TRIDENTCOM 2021, Virtual Event, November 24, 2021, Proceedings},
  proceedings_a={ADHOCNETS \& TRIDENTCOM},
  year={2022},
  month={3},
  keywords={Educational data mining, Knowledge tracing, BERT},
  doi={10.1007/978-3-030-98005-4_19}
}
Weicong Tan
Yuan Jin
Ming Liu
He Zhang
Year: 2022
BiDKT: Deep Knowledge Tracing with BERT
ADHOCNETS & TRIDENTCOM
Springer
DOI: 10.1007/978-3-030-98005-4_19
Abstract
Deep knowledge tracing is a family of deep learning models that aim to predict the correctness of students' future responses to different subjects (indicating whether they have mastered those subjects) based on their previous histories of interactions with the subjects. Early deep knowledge tracing models mostly rely on recurrent neural networks (RNNs), which can only learn from a uni-directional context of the response sequences during training. An alternative that learns from the context in both directions of those sequences is to use bidirectional deep learning models. The most recent significant advance in this regard is BERT, a transformer-style bidirectional model, which has outperformed numerous RNN models on several NLP tasks. Therefore, we apply and adapt the BERT model to the deep knowledge tracing task, for which we propose the model BiDKT. It is trained under a masked correctness recovery task, where the model predicts the correctness of a small percentage of randomly masked responses based on their bidirectional context in the sequences. We conduct experiments on several real-world knowledge tracing datasets and show that BiDKT can outperform some of the state-of-the-art approaches at predicting the correctness of future student responses on some of the datasets. We also discuss possible reasons why BiDKT underperforms in certain scenarios. Finally, we study the impacts of several key components of BiDKT on its performance.
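
The masked correctness recovery task described in the abstract mirrors BERT's masked-token objective: a small fraction of the correctness labels in a student's interaction history is hidden, and the model is trained to recover the hidden labels from the surrounding (left and right) interactions. The sketch below illustrates only this masking step under stated assumptions; it is not the authors' implementation, and the function name, MASK_ID token, and 15% masking ratio are illustrative choices rather than details from the paper.

# Minimal sketch of BERT-style masking applied to a correctness sequence.
# All names and the masking ratio are assumptions for illustration.
import random

MASK_ID = 2          # hypothetical token id standing in for a masked response
MASK_RATIO = 0.15    # assumed fraction of responses hidden during training

def mask_responses(responses, mask_ratio=MASK_RATIO, seed=None):
    """Randomly replace a small percentage of correctness labels (0/1) with
    MASK_ID; a bidirectional model would then be trained to recover the
    hidden labels from the surrounding interactions in the sequence."""
    rng = random.Random(seed)
    masked = list(responses)
    targets = {}                      # position -> original correctness label
    for i, label in enumerate(responses):
        if rng.random() < mask_ratio:
            targets[i] = label
            masked[i] = MASK_ID
    return masked, targets

# Example: a student's response history (1 = correct, 0 = incorrect)
history = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
masked_seq, recovery_targets = mask_responses(history, seed=0)
print(masked_seq)          # sequence with some labels replaced by MASK_ID
print(recovery_targets)    # positions the model must recover, with true labels

In the full model, the masked sequence (together with the corresponding subject/exercise identifiers) would be fed to a transformer encoder, and the training loss would be computed only at the masked positions, analogous to BERT's masked language modelling objective.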