microsoft / PhiCookBook: A book for getting started with the Phi family of SLMs. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks.
LinXueyuanStdio / LaTeX OCR PRO: Math Formula OCR Pro, an enhanced math formula recognizer. Supports handwritten and printed formulas in Chinese and English, Chinese-mixed formulas, and elementary symbolic derivation (data structures based on the LaTeX abstract syntax tree).
01-ai / Yi 1.5: Yi-1.5 is an upgraded version of Yi, delivering stronger performance in coding, math, reasoning, and instruction-following capability.
InternLM / InternLM Math: State-of-the-art bilingual open-source math reasoning LLMs.
zwq2018 / Math Reasoning With PLMs: [EMNLP 2022] "Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem"
mathllm / MATH V: [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.
HZQ950419 / Math LLaVA: Code for "Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models"
allenai / Lila: A unified benchmark for math reasoning.
facebookresearch / IGSM: Code for creating the iGSM datasets used in the papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Process" (arXiv 2407.20311) and "Physics of Language Models Part 2.2, How to Learn From Mistakes on Grade-School Math Problems" (arXiv 2408.16293).
eth-nlped / Mathdial: 🧮 "MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems", EMNLP Findings 2023
NVlabs / NFT: Implementation of the Negative-aware Finetuning (NFT) algorithm for "Bridging Supervised Learning and Reinforcement Learning in Math Reasoning".
cyzhh / MMOS: The Mix of Minimal Optimal Sets (MMOS) dataset has two advantages for math reasoning: higher performance and lower construction costs.
tahreemrasul / Math App Langchain: A math application/chatbot that answers a user's arithmetic or reasoning-based questions. Uses LangChain agents and tools, with a Chainlit frontend.
google-research-datasets / GSM IC: The Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built on GSM8K by adding irrelevant sentences to problem descriptions. GSM-IC is constructed to evaluate the distractibility of language models.
qtli / GSM Plus: GSM-Plus data, code, and evaluation for enhancing robust mathematical reasoning in math word problems.
tongjingqi / MathTrap: In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset, MATHTRAP, by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K.
YutingLi0606 / Vision Matters: (arXiv 2025) "Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning"
lupantech / Ineqmath: Solving Inequality Proofs with Large Language Models.
HKU-MMLab / Math VR CodePlot CoT: Math-VR Benchmark & CodePlot-CoT, mathematical visual reasoning by thinking with code-driven images.
ECNU-ICALK / EduChat Math: [MM 2025] "CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models"