DuolingoSLAM
2018 Duolingo Shared Task on Second Language Acquisition Modeling (SLAM) (http://sharedtask.duolingo.com/)
Duolingo Shared Task on Second Language Acquisition Modeling
This repository contains code for running the models that placed 2nd (Spanish-to-English) and 3rd (English-to-Spanish and French-to-English) in the Duolingo SLAM competition. The paper describing our approach can be found here.
Acquiring the data
Download the data from here and unzip it into the "data" folder.
Running the model
To preprocess the data, run reprocess_syntax.py on each data file. See the
file's docstring for more details on getting Google SyntaxNet set up. Then run
translate_frequency.py to generate external word-frequency features.
The model can then be trained to produce predictions on the dev set using
lightgbm_dev.py or on the test set using lightgbm_script.py. The language
trained on (en_es, fr_en, es_en, or all) and the number of users trained
on can be controlled using the --lang and --users flags.
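As a rough illustration of how the two flags could be parsed, here is a minimal sketch. Only the flag names and the four language values come from this README; the defaults, types, and help strings are assumptions and may differ from what lightgbm_dev.py and lightgbm_script.py actually define.

```python
import argparse

# Language options listed in this README.
LANGS = ("en_es", "fr_en", "es_en", "all")


def build_parser():
    """Hypothetical parser; defaults and help text are assumptions."""
    parser = argparse.ArgumentParser(description="Train a SLAM model (sketch).")
    parser.add_argument("--lang", choices=LANGS, default="all",
                        help="language track to train on")
    parser.add_argument("--users", type=int, default=None,
                        help="number of users to train on (None means all)")
    return parser
```

With a parser like this, `build_parser().parse_args(["--lang", "es_en", "--users", "500"])` would yield `lang == "es_en"` and `users == 500`.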
Models trained on each individual language can be averaged with a model trained
on all languages using the average_models.py script.
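The averaging step can be pictured with the sketch below. It assumes each model's predictions are available as a mapping from instance ID to predicted probability and that the two models are weighted equally; both are assumptions made for illustration, and average_models.py may store and combine predictions differently.

```python
def average_predictions(lang_preds, all_preds, weight=0.5):
    """Blend two {instance_id: probability} dicts.

    `weight` applies to the single-language model and `1 - weight` to the
    all-languages model. The 0.5 default is an assumption.
    """
    mismatched = set(lang_preds) ^ set(all_preds)
    if mismatched:
        raise ValueError(f"{len(mismatched)} instance IDs appear in only one model")
    return {
        key: weight * lang_preds[key] + (1.0 - weight) * all_preds[key]
        for key in lang_preds
    }
```

Raising on mismatched IDs is a deliberate choice here: silently skipping instances would leave gaps in the prediction file.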
Testing model lesions
To test the effects of removing different feature sets, first run
preprocess_to_pickle.py to create a pickled version of the data and cut down
on preprocessing time across different lesions. Then run run_lesion.py, using
the --lesion flag to choose the lesion experiment to conduct. See the code or
the paper for the list of options.
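The idea behind this workflow (preprocess once, cache the result, then drop one feature set per experiment) can be sketched as follows. The feature names, group names, and row format are hypothetical placeholders; the real lesion options are defined in run_lesion.py and the paper.

```python
import os
import pickle

# Hypothetical feature groups, for illustration only.
FEATURE_GROUPS = {
    "user": ["user_id"],
    "word": ["token", "word_frequency"],
    "syntax": ["part_of_speech", "dependency_label"],
}


def load_or_build(cache_path, build_fn):
    """Return pickled features if the cache exists; otherwise build and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    rows = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(rows, f)
    return rows


def apply_lesion(rows, lesion):
    """Remove every feature in the named group from each row (a list of dicts)."""
    dropped = set(FEATURE_GROUPS[lesion])
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]
```

Because the expensive preprocessing runs only on the first call, each additional lesion costs only the filtering step plus model training.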
The results of the lesions can be plotted using graph_lesions.r (an R script, not Python).
