Massive Legal Embedding Benchmark (MLEB)

The Massive Legal Embedding Benchmark (MLEB) by Isaacus is the largest, most diverse, and most comprehensive benchmark for legal text embedding models. It contains 10 datasets spanning multiple document types, jurisdictions, areas of law, and tasks. To do well on MLEB, embedding models must demonstrate both extensive legal domain knowledge and strong legal reasoning skills.

This repository contains the code used to evaluate embedding models on MLEB (available in the scripts directory), as well as the full results of evaluated models (available in the results directory).

If you're looking for MLEB itself, you can find it here. You can also read our paper on arXiv at https://arxiv.org/abs/2510.19365.

Setup

We recommend setting up a virtual environment for this project and installing necessary dependencies using uv like so:

git clone https://github.com/isaacus-dev/mleb.git
cd mleb
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv --python 3.12
source .venv/bin/activate
uv sync

That will download this repository, create a Python 3.12 virtual environment using uv, activate the virtual environment, and install all necessary dependencies.

Alternatively, you may manually install the necessary dependencies listed in our pyproject.toml file.

After installing the necessary dependencies, we recommend creating a .env file in the root directory of this repository to store your API keys for various embedding model providers. You can use the provided .env.example file as a template:

# Isaacus
ISAACUS_API_KEY=...

# OpenAI
OPENAI_API_KEY=...

# Google
GOOGLE_API_KEY=...

# Voyage AI
VOYAGE_API_KEY=...

Make sure to replace the ... with your actual API keys. You may omit any keys for providers you won't be using.
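As a quick sanity check before running an evaluation, the sketch below parses .env-style text and reports which keys are missing or still set to the ... placeholder. It is only an illustration: the helper names read_env_keys and missing_keys are hypothetical, and this is not necessarily how the scripts in this repository load their keys.

```python
def read_env_keys(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required names that are absent, empty, or still '...'."""
    return [name for name in required if values.get(name, "...") in ("", "...")]


# Sample text in the same shape as .env.example (the key value is made up).
sample = "# Isaacus\nISAACUS_API_KEY=sk-example\n\n# OpenAI\nOPENAI_API_KEY=...\n"
required = ["ISAACUS_API_KEY", "OPENAI_API_KEY", "VOYAGE_API_KEY"]
print(missing_keys(read_env_keys(sample), required))
# prints ['OPENAI_API_KEY', 'VOYAGE_API_KEY']
```

To check a real file, pass the contents of .env (for example via pathlib.Path(".env").read_text()) instead of the sample string, and list only the providers you intend to use.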

Usage

To evaluate embedding models on MLEB, run the scripts/mleb.py script:

python scripts/mleb.py

Inside the script, you can choose which models to evaluate by modifying the MODEL_IDS list near the top of the file. Each model ID corresponds to the ID of a model defined in the MODEL_CONFIGS list in scripts/models.py.

New models may be added by adding new MLEBEvaluationModelConfig instances (defined in scripts/structs.py) to the MODEL_CONFIGS list.
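The relationship between a list of model IDs and a list of model configs can be sketched as a lookup by ID. Everything below is a hypothetical stand-in: ExampleModelConfig is not the real MLEBEvaluationModelConfig (whose fields are defined in scripts/structs.py and may differ), and select_configs is an illustrative helper rather than code from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleModelConfig:
    """Hypothetical stand-in; the real config class may have other fields."""

    id: str


# Illustrative values only; these are not real MLEB model IDs.
EXAMPLE_MODEL_CONFIGS = [
    ExampleModelConfig(id="example-model-a"),
    ExampleModelConfig(id="example-model-b"),
]
EXAMPLE_MODEL_IDS = ["example-model-b"]


def select_configs(
    model_ids: list[str], configs: list[ExampleModelConfig]
) -> list[ExampleModelConfig]:
    """Return the configs matching model_ids, failing loudly on unknown IDs."""
    by_id = {config.id: config for config in configs}
    unknown = [model_id for model_id in model_ids if model_id not in by_id]
    if unknown:
        raise KeyError(f"No config defined for model IDs: {unknown}")
    return [by_id[model_id] for model_id in model_ids]


selected = select_configs(EXAMPLE_MODEL_IDS, EXAMPLE_MODEL_CONFIGS)
print([config.id for config in selected])
# prints ['example-model-b']
```

Failing on an unknown ID, rather than silently skipping it, makes a typo in the ID list visible before any evaluation time is spent.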

Results are written in the mteb format to the results directory.

scripts/export.py may be run to pack all results into a single JSONL file available at results/results.jsonl. That file is used to dynamically present the latest benchmark results on the MLEB website.
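For readers who want to inspect the exported file themselves, here is a minimal sketch of reading a JSONL file with the standard library. It assumes only that each non-empty line is one JSON object; the record schema is not described here, so the helper (read_jsonl, a hypothetical name) makes no assumptions about field names.

```python
import json
from pathlib import Path


def read_jsonl(path: str) -> list[dict]:
    """Read a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

After running scripts/export.py, calling read_jsonl("results/results.jsonl") should return one dict per exported record, and sorted(records[0]) lists the field names that are actually present.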

License

This project is licensed under the MIT License.

Citation

@misc{butler2025massivelegalembeddingbenchmark,
      title={The Massive Legal Embedding Benchmark (MLEB)}, 
      author={Umar Butler and Abdur-Rahman Butler and Adrian Lucas Malec},
      year={2025},
      eprint={2510.19365},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.19365}, 
}
