FERMAT: Can Vision-Language Models Evaluate Handwritten Math?

📜 Paper | 🤗 HF Dataset

We present FERMAT, a benchmark designed to assess VLMs’ ability to detect, localize and correct errors in handwritten mathematical content. Please refer to our paper for more details.

<p align="center" width="100%"> <img src="FERMAT.png" alt="We present FERMAT, a benchmark designed to assess VLMs’ ability to detect, localize and correct errors in handwritten mathematical content." style="width: 75%; min-width: 200px; display: block; margin: auto;"> </p>

Loading Data

Steps to download the data: the images go in `benchmark_images` and the CSV in `benchmark_csv`. Separate steps cover downloading the data in the oikantik format.
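The materialization step above can be sketched as follows, assuming the downloaded split is available as an iterable of records; the `image_id`/`image_bytes` field names and the output CSV name are hypothetical placeholders, not the repository's actual schema.

```python
# Sketch: write a FERMAT-style split into the benchmark_images / benchmark_csv
# layout. Field names ('image_id', 'image_bytes') are assumptions, not the
# repository's real schema.
import csv
import os


def export_split(records, image_dir="benchmark_images", csv_dir="benchmark_csv"):
    """Write each record's image bytes to image_dir and its metadata
    (everything except the raw bytes) into a single CSV in csv_dir."""
    os.makedirs(image_dir, exist_ok=True)
    os.makedirs(csv_dir, exist_ok=True)
    rows = []
    for rec in records:
        img_path = os.path.join(image_dir, f"{rec['image_id']}.png")
        with open(img_path, "wb") as f:
            f.write(rec["image_bytes"])
        rows.append({k: v for k, v in rec.items() if k != "image_bytes"})
    csv_path = os.path.join(csv_dir, "benchmark.csv")
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return csv_path
```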

Setup

To evaluate VLMs on the FERMAT dataset, install the required packages by running the following command:

pip install -r requirements.txt

We self-hosted Pixtral-12B-2409 (https://huggingface.co/mistralai/Pixtral-12B-2409), Pixtral-Large-Instruct-2411, Llama-3.2-11B-Vision-Instruct, Llama-3.2-90B-Vision-Instruct, and Phi-3.5-Vision-Instruct using vLLM (https://github.com/vllm-project/vllm).

We used hosted API services for the GPT and Gemini model families.
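A model served by vLLM speaks the OpenAI chat-completions protocol, so a request carries the handwritten-math image inline as base64. A minimal sketch of assembling such a payload (the model name and prompt here are illustrative only; the actual prompts are defined in the repository):

```python
# Sketch: build an OpenAI-style chat-completions payload with an inline
# base64 image, as accepted by a vLLM OpenAI-compatible endpoint.
import base64


def build_vision_request(model, image_bytes, prompt):
    """Assemble a chat message containing a text prompt and a data-URL image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```

The returned dict can be POSTed to `$OPENAI_API_BASE/chat/completions` (see the environment-variable step below) with any HTTP or OpenAI-compatible client.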

For self-hosted models,

  1. Set up environment variables:

    export OPENAI_API_BASE=[ADD_THE_ENDPOINT_URL_OF_HOSTED_MODEL]
    

    Example: "http://localhost:8004/v1"

  2. Start Evaluations:

    python main.py --model [MODEL_NAME] --dir_name [DATA_DIR]
    
    • MODEL_NAME: Name of the model to be evaluated. Choices: ['pixtral', 'pixtral_large', 'phi', 'llama_large', 'llama']
    • DATA_DIR: Path to the directory where the Benchmark Images are stored
  3. Fill-in CSV

    Once the evaluation is done, the results are stored in a JSON file named state_<MODEL_NAME>.json. You can convert this JSON file to a CSV file using the following command:

    python fill_in_csv.py --model [MODEL_NAME] --csv-file [CSV_FILE] --json-file [JSON_FILE]
    
    • MODEL_NAME: Name of the model to be evaluated. Choices: ['pixtral', 'pixtral_large', 'phi', 'llama_large', 'llama']
    • CSV_FILE: Path to the CSV file where the results need to be filled in.
    • JSON_FILE: Path to the JSON file where the results are stored.
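The fill-in step can be sketched as follows; the `image_id` key and `model_answer` column are hypothetical names used for illustration, and the real mapping is defined by the repository's `fill_in_csv.py`.

```python
# Sketch: merge state_<MODEL_NAME>.json results back into the benchmark CSV.
# The 'image_id' key and 'model_answer' column are assumptions for
# illustration, not the repository's actual schema.
import csv
import json


def fill_csv(json_file, csv_file, out_file, answer_col="model_answer"):
    """Copy csv_file to out_file, filling answer_col from the JSON results,
    which are assumed to be keyed by each row's 'image_id'."""
    with open(json_file) as f:
        results = json.load(f)
    with open(csv_file, newline="") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row[answer_col] = results.get(row["image_id"], "")
    with open(out_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```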

Citation

If you used this repository or our models, please cite our work:

@article{nath2025vision1language,
  title   = {Can Vision-Language Models Evaluate Handwritten Math?},
  author  = {Oikantik Nath and Hanani Bathina and Mohammed Safi Ur Rahman Khan and Mitesh M. Khapra},
  year    = {2025},
  journal = {arXiv preprint arXiv:2501.07244}
}
