Big Data Technologies and Applications. 10th EAI International Conference, BDTA 2020, and 13th EAI International Conference on Wireless Internet, WiCON 2020, Virtual Event, December 11, 2020, Proceedings

Research Article

Early Detecting the At-risk Students in Online Courses Based on Their Behavior Sequences

  • @INPROCEEDINGS{10.1007/978-3-030-72802-1_2,
        author={Shuai Yuan and Huan Huang and Tingting He and Rui Hou},
        title={Early Detecting the At-risk Students in Online Courses Based on Their Behavior Sequences},
        proceedings={Big Data Technologies and Applications. 10th EAI International Conference, BDTA 2020, and 13th EAI International Conference on Wireless Internet, WiCON 2020, Virtual Event, December 11, 2020, Proceedings},
        proceedings_a={BDTA \& WICON},
        year={2021},
        month={7},
        keywords={Early detecting, The prediction of learning result, Long short term memory},
        doi={10.1007/978-3-030-72802-1_2}
    }
    
  • Shuai Yuan
    Huan Huang
    Tingting He
    Rui Hou
    Year: 2021
    Early Detecting the At-risk Students in Online Courses Based on Their Behavior Sequences
    BDTA & WICON
    Springer
    DOI: 10.1007/978-3-030-72802-1_2
Shuai Yuan (1), Huan Huang (2), Tingting He (1), Rui Hou (2)
  • 1: Central China Normal University
  • 2: South-Central University for Nationalities

Abstract

Online learning has developed rapidly, but learner participation remains very low. It is therefore important to build a model that predicts learning results so that at-risk students can be identified promptly and accurately. We select nine online learning behaviors from one course in Moodle, taking one week as the basic unit and every 5 weeks as a time node. This yields aggregate data and sequence data for the first 5, 10, 15, 20, 25, 30, 35, and 39 weeks. Eight classic machine learning methods, namely Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Iterative Dichotomiser 3 (ID3), Classification and Regression Trees (CART), and Neural Network (NN), are used to predict learning results at each time node from both the aggregate data and the sequence data. The experimental results show that sequence data predicts learning results more effectively than aggregate data. On sequence data, the AUC of the RF model ranges from 0.77 to 0.83 and the AUC of the CART model ranges from 0.70 to 0.83, making these two the best of the eight classic prediction models. The RF, CART, recurrent neural network (RNN), and long short-term memory (LSTM) models are then used to predict learning results on sequence data. The experimental results show that LSTM achieves the highest AUC with stable growth on sequence data, and that it is the best of all the models for predicting learning results.
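
The difference between the aggregate data and the sequence data described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic counts, and the choice to represent sequence data as the concatenation of weekly behavior vectors are all assumptions made for illustration. Only the nine behaviors, the one-week unit, the 5-week step, and the 39-week course length come from the abstract.

```python
from typing import List

N_BEHAVIORS = 9   # nine online learning behaviors (from the abstract)
TOTAL_WEEKS = 39  # course length in weeks (from the abstract)
STEP = 5          # a time node every 5 weeks (from the abstract)


def time_nodes(total_weeks: int = TOTAL_WEEKS, step: int = STEP) -> List[int]:
    """Return the cut-off weeks: every `step` weeks, plus the final week."""
    nodes = list(range(step, total_weeks + 1, step))
    if not nodes or nodes[-1] != total_weeks:
        nodes.append(total_weeks)
    return nodes


def aggregate_features(weekly: List[List[int]], upto: int) -> List[int]:
    """Sum each behavior over the first `upto` weeks (one value per behavior)."""
    n_behaviors = len(weekly[0])
    return [sum(week[b] for week in weekly[:upto]) for b in range(n_behaviors)]


def sequence_features(weekly: List[List[int]], upto: int) -> List[int]:
    """Keep the week-by-week order by concatenating the first `upto` weekly vectors.

    Assumption: this flattening is one plausible encoding for classic models;
    the abstract does not say how the authors encode sequence data.
    """
    return [count for week in weekly[:upto] for count in week]


# Hypothetical student: weekly[w][b] is the count of behavior b in week w.
# The values are synthetic and carry no meaning.
weekly_counts = [[(w + b) % 3 for b in range(N_BEHAVIORS)] for w in range(TOTAL_WEEKS)]

for node in time_nodes():
    agg = aggregate_features(weekly_counts, node)
    seq = sequence_features(weekly_counts, node)
    print(f"first {node:2d} weeks: {len(agg)} aggregate features, {len(seq)} sequence features")
```

Under these assumptions the aggregate representation always has nine features, while the sequence representation grows with the time node and preserves when each behavior occurred. That ordering is what a recurrent model such as an RNN or LSTM would consume one week at a time.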