EdNet

Paper : https://arxiv.org/abs/1912.03072

Leaderboard : Link

EdNet is the dataset of all student-system interactions collected over 2 years by Santa, a multi-platform AI tutoring service with more than 780K users in Korea available through Android, iOS and web.

Properties of EdNet

The EdNet dataset contains various features of student actions, such as which learning material a student has consumed, the student's response, and how much time the student has spent solving a given question or reading through an expert's commentary. EdNet has the following properties.

1. Large scale

EdNet is composed of a total of 131,441,538 interactions collected from 784,309 students of Santa since 2017. Each student has generated 441.20 interactions on average while using Santa. Based on those interactions, EdNet gives researchers access to large-scale, real-world intelligent tutoring system (ITS) data. Moreover, Santa provides a total of 13,169 problems and 1,021 lectures tagged with 293 types of skills, which have been consumed 95,294,926 times and 601,805 times, respectively. To the best of our knowledge, this is the largest dataset in education available to the public in terms of the total number of students, interactions, and interaction types.

2. Diversity

EdNet offers the most diverse set of interactions among all existing ITS data. The set of behaviors directly related to learning is also richer than in other datasets, as EdNet includes learning activities, such as reading explanations and watching lectures, that others do not provide. Such diversity enables researchers to analyze students from various perspectives. For example, purchasing logs may help to analyze students' engagement in learning. A content information table is also provided separately.

3. Hierarchy

EdNet has a hierarchical structure of different data points. To provide various kinds of actions in a consistent and organized manner, EdNet offers the datasets at four different levels, named KT1, KT2, KT3 and KT4. As the level of the dataset increases, the number and types of actions involved also increase. The details of each dataset are described below.

4. Multi-platform

In an age where students have access to various devices, from personal computers to smartphones and AI speakers, it is inevitable for ITSs to offer access from multiple platforms. Accordingly, Santa is a multi-platform system available on iOS, Android and the web, and EdNet contains data points gathered from both mobile and desktop. This allows the study of AIEd models suited for future multi-platform ITSs, utilizing the data collected from different platforms in a consistent manner.

Dataset

As mentioned above, there are four datasets, named KT1, KT2, KT3, and KT4, which differ in scope. These datasets share the following features:

  • The whole dataset is divided by students: {user_id}.csv only contains {user_id}'s interactions.
  • The timestamps differ from the real values: they have been modified (shifted by fixed values) for security reasons.
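A minimal sketch of reading one student's file, assuming the archive has been extracted into a local directory and that each `{user_id}.csv` is a comma-separated file with a header row. The helper names `user_file`, `load_user_rows` and `time_gaps_ms` are hypothetical, not part of EdNet. If the shift applied to a file's timestamps is constant, the gaps between consecutive timestamps are unaffected by it, so the sketch works with gaps rather than absolute dates.

```python
import csv
from pathlib import Path


def user_file(root, user_id):
    """Return the path of the per-student file, following the {user_id}.csv pattern."""
    return Path(root) / f"{user_id}.csv"


def load_user_rows(root, user_id):
    """Read one student's interactions as a list of dicts keyed by column name."""
    # newline="" is the recommended way to open files for the csv module.
    with open(user_file(root, user_id), newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def time_gaps_ms(rows):
    """Gaps between consecutive timestamps; a constant shift cancels out."""
    stamps = [int(row["timestamp"]) for row in rows]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

Loading files one student at a time keeps memory use small, which matters given the number of files in each archive.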

Download links

KT1

Download a .zip file from bit.ly/ednet_kt1

| Specification | Value |
|----|---|
| Size of the compressed file | 1.2GB |
| Size of the uncompressed file | 5.6GB |
| The number of files | 784,309 |

Structure

KT1 consists of students' question-solving logs, the most basic and fundamental information, which can be used by various deep-learning knowledge tracing models such as Deep Knowledge Tracing and Self-Attentive Knowledge Tracing. EdNet-KT1 is the record of Santa collected since Apr. 18, 2017, and follows this question-response sequence format. A major property of EdNet is that the questions come in bundles, that is, collections of questions sharing a common passage, picture or listening material. For example, the questions with IDs q2319, q2320 and q2321 may share the same reading passage. In this case, the questions are said to form a bundle and are given to the student together with the shared material. When a bundle is given, a student has access to all of its problems and has to respond to all of them in order to complete the bundle.

Description

  • timestamp is the moment the question was given, represented as a Unix timestamp in milliseconds.
  • solving_id identifies each learning session of a student, where one session corresponds to one bundle. It is a single integer, starting from 1.
  • question_id is the ID of the question given to the student, in the form q{integer}.
  • user_answer is the answer that the student submitted, recorded as a character between a and d inclusive.
  • elapsed_time is the time that the student spent on each question, in milliseconds.
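The field descriptions above can be sketched as a typed record. This is an illustration under stated assumptions, not an official loader: the names `KT1Row` and `parse_kt1_row` are hypothetical, and the validation only encodes what the list states (IDs of the form q{integer}, answers between a and d), so real files may contain values it rejects.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# question_id is described as q{integer}.
QUESTION_ID = re.compile(r"q\d+")


@dataclass(frozen=True)
class KT1Row:
    timestamp_ms: int
    solving_id: int
    question_id: str
    user_answer: str
    elapsed_ms: int

    @property
    def shifted_time(self):
        """Timestamp as a UTC datetime; EdNet shifts timestamps, so this is not the real date."""
        return datetime.fromtimestamp(self.timestamp_ms / 1000, tz=timezone.utc)

    @property
    def elapsed_seconds(self):
        """Time spent on the question, converted from milliseconds."""
        return self.elapsed_ms / 1000


def parse_kt1_row(raw):
    """Convert a dict of strings (one CSV row) into a validated KT1Row."""
    row = KT1Row(
        timestamp_ms=int(raw["timestamp"]),
        solving_id=int(raw["solving_id"]),
        question_id=raw["question_id"],
        user_answer=raw["user_answer"],
        elapsed_ms=int(raw["elapsed_time"]),
    )
    if not QUESTION_ID.fullmatch(row.question_id):
        raise ValueError(f"unexpected question_id: {row.question_id!r}")
    if row.user_answer not in {"a", "b", "c", "d"}:
        raise ValueError(f"unexpected user_answer: {row.user_answer!r}")
    return row
```

Keeping the millisecond integers as stored and exposing conversions as properties avoids losing precision when rows are written back out.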

Example

| timestamp | solving_id | question_id | user_answer | elapsed_time |
|---------------|--------------|------------|--------------|---------------|
| 1548996377530 | 48 | q2844 | d | 47000 |
| 1548996378149 | 48 | q2845 | d | 47000 |
| 1548996378665 | 48 | q2846 | d | 47000 |
| 1548996671661 | 49 | q4353 | c | 67000 |
| 1548996787866 | 50 | q3944 | a | 54000 |
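As a hedged illustration of how bundles show up in the data, the sketch below groups the five example rows by session. It assumes the second column holds solving_id and the third holds question_id, as the Description list indicates, and that the files are comma-separated. The function name `questions_per_session` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The five example rows, written as CSV text.
EXAMPLE = """timestamp,solving_id,question_id,user_answer,elapsed_time
1548996377530,48,q2844,d,47000
1548996378149,48,q2845,d,47000
1548996378665,48,q2846,d,47000
1548996671661,49,q4353,c,67000
1548996787866,50,q3944,a,54000
"""


def questions_per_session(csv_text):
    """Map each solving_id to the question IDs answered in that session."""
    sessions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        sessions[int(row["solving_id"])].append(row["question_id"])
    return dict(sessions)


sessions = questions_per_session(EXAMPLE)
print(sessions)
# {48: ['q2844', 'q2845', 'q2846'], 49: ['q4353'], 50: ['q3944']}
```

Session 48 contains three questions, which is what a three-question bundle looks like in this format; the three rows also carry the same elapsed_time value in the example.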

KT2

Download a .zip file from bit.ly/ednet-kt2

| Specification | Value |
|----|---|
| Size of the compressed file | 0.6GB (555.8MB) |
| Size of the uncompressed file | 3.1GB |
| The number of files | 297,444 |

Structure

A major drawback of the question-response sequence format is that it cannot account for the inherent heterogeneity of students' actions. For example, a student may alternate between two answer choices before submitting a final answer, which possibly signals uncertainty about both options. Because of the restrictions of the question-response format, a dataset that follows it, like EdNet-KT1, cannot effectively represent such a situation. To overcome this limitation, Santa has collected the full behavior of students since Aug. 27, 2018. As a result, the datasets EdNet-KT2, EdNet-KT3 and EdNet-KT4 are compiled from the action sequences of each user. Each action represents a single unit of behavior made by a student in the Santa UI, such as watching a video lecture, choosing a response option, or reading a passage. By recording a student's behavior as-is, the datasets represent each student's behavior more accurately and allow AIEd models to incorporate finer details of learning history. EdNet-KT2, the simplest action-based dataset of EdNet, consists of the actions related to question-solving activities. Note that the features of KT1 can be fully recovered from the columns of KT2, and KT2 contains further information such as the student's study mode and the intermediate responses provided by the student.

Description

  • action_type is one of the following: enter, respond, and submit.

    • enter is recorded when the student first receives and views a question bundle through the UI.
    • respond is recorded when the student selects an answer choice for one of the questions in the bundle. A student can respond to the same question multiple times. In this case, only the last response before the final submission is considered the student's response.
    • submit is recorded when the student submits the final answers to the given bundle.
  • item_id is the ID of the item involved in the action. For EdNet-KT2, only the IDs of questions and bundles are recorded. A bundle ID is assigned to actions of type enter and submit.

  • source shows where in the Santa UI the student solves a question or watches a lecture. Santa has several sources in which students can solve questions or watch lectures. For KT2, only the sources that provide question-solving environments are recorded.

    • In sprint, students choose a part that they want to study. After that, they can only solve questions that belong to the chosen part, until they change to a different part or select a different source.

    • Each day, Santa recommends questions and lectures based on each student's current knowledge status, i.e. the correctness probabilities predicted by the Collaborative Filtering model. This source is called Today's Recommendation. Questions that belong to particular parts can be recommended; the source values listed for it are todays_recommendation::sprint and todays_recommendation::review_quiz.

    • Once the number of incorrect answers to questions with particular tags exceeds a certain threshold, Santa suggests lectures and questions with the corresponding tags. Such a suggestion is recorded as adaptive_offer. Santa also offers lectures and questions if the average correctness rate on questions with particular tags has decreased by more than a certain threshold.

    • All Parts is a source in which students solve questions that Santa recommends, following a certain algorithm, from all possible candidates. This is recorded as tutor.

    • The student can redo previously solved questions using the review system, which is recorded as in_review.

  • user_answer is recorded when action_type is respond and holds the answer the student selected. It is one of the letters a, b, c, and d.

  • platform shows where the student used Santa, which is either mobile or web.
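The rule that only the last respond before a submit counts can be sketched as follows. This is one possible reading of the description, not the official procedure for recovering KT1 from KT2: the function name `final_answers` is hypothetical, the action rows are made-up illustrative data (including the bundle IDs `b1` and `b2`), and the sketch assumes that actions arrive in chronological order and that an enter starts a fresh attempt.

```python
def final_answers(actions):
    """Return {question_id: last answer chosen before the bundle was submitted}.

    `actions` is an iterable of dicts in chronological order with the keys
    action_type, item_id and user_answer.
    """
    pending = {}  # answers chosen since the most recent enter
    final = {}
    for action in actions:
        kind = action["action_type"]
        if kind == "enter":
            pending = {}
        elif kind == "respond":
            # A later respond to the same question overwrites the earlier one.
            pending[action["item_id"]] = action["user_answer"]
        elif kind == "submit":
            final.update(pending)
            pending = {}
    return final


# Made-up actions: q1 is answered twice, and bundle b2 is never submitted.
actions = [
    {"action_type": "enter", "item_id": "b1", "user_answer": ""},
    {"action_type": "respond", "item_id": "q1", "user_answer": "a"},
    {"action_type": "respond", "item_id": "q1", "user_answer": "c"},
    {"action_type": "respond", "item_id": "q2", "user_answer": "b"},
    {"action_type": "submit", "item_id": "b1", "user_answer": ""},
    {"action_type": "enter", "item_id": "b2", "user_answer": ""},
    {"action_type": "respond", "item_id": "q3", "user_answer": "d"},
]

print(final_answers(actions))
# {'q1': 'c', 'q2': 'b'}
```

The intermediate choice of a for q1 is discarded, and q3 is absent because its bundle was never submitted.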

Example

| timestamp | action_type | item_id | source | user_answer | platform |
