VMLU
VMLU is a human-centric benchmark suite designed to evaluate the general capabilities of foundation models, with a focus on the Vietnamese language. The benchmark covers 58 subjects in four categories: STEM, Humanities, Social Sciences, and Other. Questions range in difficulty from elementary level to advanced professional level, and test both general knowledge and problem-solving ability. Please visit our website for more details.
We hope VMLU helps developers track progress and analyze the main strengths and shortcomings of their models.
Data
Download
To download the zip file, please visit our website.
JSONL format
To facilitate usage, we have organized the subject name handlers together with the English and Vietnamese names of the 58 subjects. Questions extracted from the datasets are provided in both LaTeX and non-LaTeX formats.
Dataset structure
The VMLU dataset covers 58 subjects and contains 10,880 multiple-choice questions with answers, all in Vietnamese.
| Id | Subject | Category | Number of questions |
|-----:|:--------------------------------------------------------|:---------------|----------------------:|
| 01 | Elementary Mathematics | STEM | 200 |
| 02 | Elementary Science | STEM | 200 |
| 03 | Middle School Biology | STEM | 188 |
| 04 | Middle School Chemistry | STEM | 200 |
| 05 | Middle School Mathematics | STEM | 119 |
| 06 | Middle School Physics | STEM | 200 |
| 07 | High School Biology | STEM | 200 |
| 08 | High School Chemistry | STEM | 200 |
| 09 | High School Mathematics | STEM | 163 |
| 10 | High School Physics | STEM | 200 |
| 11 | Applied Informatics | STEM | 200 |
| 12 | Computer Architecture | STEM | 200 |
| 13 | Computer Network | STEM | 197 |
| 14 | Discrete Mathematics | STEM | 182 |
| 15 | Electrical Engineering | STEM | 194 |
| 16 | Introduction to Chemistry | STEM | 197 |
| 17 | Introduction to Physics | STEM | 191 |
| 18 | Introduction to Programming | STEM | 197 |
| 19 | Metrology Engineer | STEM | 155 |
| 20 | Operating System | STEM | 200 |
| 21 | Statistics and Probability | STEM | 192 |
| 22 | Middle School Civil Education | Social Science | 196 |
| 23 | Middle School Geography | Social Science | 162 |
| 24 | High School Civil Education | Social Science | 200 |
| 25 | High School Geography | Social Science | 179 |
| 26 | Business Administration | Social Science | 192 |
| 27 | Ho Chi Minh Ideology | Social Science | 197 |
| 28 | Macroeconomics | Social Science | 200 |
| 29 | Microeconomics | Social Science | 200 |
| 30 | Principles of Marxism and Leninism | Social Science | 200 |
| 31 | Sociology | Social Science | 196 |
| 32 | Elementary History | Humanity | 195 |
| 33 | Middle School History | Humanity | 200 |
| 34 | Middle School Literature | Humanity | 192 |
| 35 | High School History | Humanity | 200 |
| 36 | High School Literature | Humanity | 200 |
| 37 | Administrative Law | Humanity | 100 |
| 38 | Business Law | Humanity | 197 |
| 39 | Civil Law | Humanity | 200 |
| 40 | Criminal Law | Humanity | 180 |
| 41 | Economic Law | Humanity | 178 |
| 42 | Education Law | Humanity | 183 |
| 43 | History of World Civilization | Humanity | 200 |
| 44 | Ideological and Moral Cultivation | Humanity | 200 |
| 45 | Introduction to Laws | Humanity | 139 |
| 46 | Introduction to Vietnam Culture | Humanity | 200 |
| 47 | Logic | Humanity | 192 |
| 48 | Revolutionary Policy of the Vietnamese Communist Party | Humanity | 200 |
| 49 | Vietnamese Language and Literature | Humanity | 192 |
| 50 | Accountant | Other | 186 |
| 51 | Clinical Pharmacology | Other | 200 |
| 52 | Environmental Engineering | Other | 189 |
| 53 | Internal Basic Medicine | Other | 189 |
| 54 | Preschool Pedagogy | Other | 112 |
| 55 | Tax Accountant | Other | 192 |
| 56 | Tax Civil Servant | Other | 189 |
| 57 | Civil Servant | Other | 189 |
| 58 | Driving License Certificate | Other | 189 |
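The category boundaries in the table can be encoded as Id ranges, which is handy when grouping results by category. The following is a minimal sketch, not part of the official tooling. The function names are hypothetical, and `subject_id_from_question_id` assumes that the part of a question `id` before the hyphen is the subject Id from the table, which the README does not state and should be checked against the data.

```python
# Category boundaries copied from the Id column of the subject table.
CATEGORY_RANGES = {
    "STEM": range(1, 22),             # Ids 01-21
    "Social Science": range(22, 32),  # Ids 22-31
    "Humanity": range(32, 50),        # Ids 32-49
    "Other": range(50, 59),           # Ids 50-58
}


def category_of(subject_id: int) -> str:
    """Return the category name for a subject Id from the table."""
    for name, ids in CATEGORY_RANGES.items():
        if subject_id in ids:
            return name
    raise ValueError(f"unknown subject id: {subject_id}")


def subject_id_from_question_id(question_id: str) -> int:
    """ASSUMPTION: the part before '-' in a question id is the subject Id."""
    return int(question_id.split("-", 1)[0])
```

For example, `category_of(51)` returns `"Other"`, matching the Clinical Pharmacology row.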
Below is a non-LaTeX example from dev.jsonl:
{
"id": "51-0001",
"question": "Các phát biểu ĐÚNG về ĐỊNH NGHĨA Dược lâm sàng, NGOẠI TRỪ:",
"choices": [
"A. Là ngành khoa học về sử dụng thuốc hợp lý",
"B. Nghiên cứu phát triển kinh tế dược bệnh viện",
"C. Giúp tối ưu hóa việc sử dụng thuốc trên cơ sở về dược và y sinh học",
"D. Đối tượng chính của môn học dược lâm sàng là thuốc và người bệnh"
],
"answer": "B"
}
Below is a LaTeX example from dev.jsonl:
{
"id": "58-0006",
"question": "Trong không gian Oxyz, cho đường thẳng d:{\\frac{x-2}{1}}=\\frac{y-1}{-2}=\\frac{z+1}{3}. Điểm nào dưới đây thuộc d ?",
"choices": [
"A. Q\\left(2;{1};{1}\\right)",
"B. M\\left(1;{2};{3}\\right)",
"C. N\\left(1;{-}2;{3}\\right)",
"D. P\\left(2;{1};{-}1\\right)"
],
"answer": "D"
}
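Records like the two examples above can be read with the standard `json` module. The sketch below assumes the files follow the usual JSONL convention of one JSON object per line (the examples here are pretty-printed for readability). The helper names and the prompt template, including the trailing "Đáp án:" ("Answer:") cue, are hypothetical and are not the official evaluation prompt.

```python
import json
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one question record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def format_prompt(record: dict) -> str:
    """Build a zero-shot prompt (hypothetical template, not the official one)."""
    # The choices already carry their "A. " / "B. " prefixes, so they are
    # appended unchanged after the question text.
    lines = [record["question"], *record["choices"], "Đáp án:"]
    return "\n".join(lines)
```

A typical use would be `for record in read_jsonl("dev.jsonl"): prompt = format_prompt(record)`, where the path is wherever the downloaded file was unpacked.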
Leaderboard
Below are the zero-shot and five-shot accuracies of the models evaluated in the initial release. For details, please visit our official website.
<span style="color:red">DISCLAIMER: Please note that evaluating LLMs is challenging, because leaderboards can be manipulated and a small change in prompting can lead to very different results. This is especially concerning because some models are not publicly accessible. For instance, good results can be achieved by distilling answers from stronger models such as GPT-4, or even from humans. Leaderboard scores should therefore be approached with caution. Most of the models assessed here are publicly accessible, with public weights or APIs available for verification.</span>
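Because small changes in prompting or answer parsing can shift scores, it helps to fix the scoring rule and report it alongside any result. The following is a minimal sketch of one possible rule, not the official VMLU scoring procedure. The function names are hypothetical, and the letter set A to D is an assumption taken from the examples shown in this README.

```python
import re
from typing import Dict, Optional


def extract_choice(model_output: str) -> Optional[str]:
    """Return the first standalone letter A-D in the output, or None."""
    match = re.search(r"\b([A-D])\b", model_output)
    return match.group(1) if match else None


def accuracy(gold: Dict[str, str], outputs: Dict[str, str]) -> float:
    """Fraction of gold questions whose extracted choice equals the answer.

    `gold` maps question id to the answer letter; `outputs` maps question id
    to the raw model output. A missing or unparseable output counts as wrong.
    """
    if not gold:
        raise ValueError("no gold answers")
    correct = sum(
        extract_choice(outputs.get(qid, "")) == answer
        for qid, answer in gold.items()
    )
    return correct / len(gold)
```

Counting unparseable outputs as wrong is a deliberate choice here. Skipping them instead would raise the reported accuracy for models that often fail to answer.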
<div align="center"><b>From-Scratch Models Leaderboard</b></div>
Zero-shot
| # | Model | Organization | Base Model | Accessibility | Evaluation Date | STEM | Social Science | Humanities | Other | Average |
| -- | ------------------- | :---------: | :--------: | :-----------: | :-----------: | :--: | :------------: | :--------: | :---: | :-----: |