LoHan
A low-cost, high-performance deep learning training framework that enables efficient 100B-scale model fine-tuning on a commodity server with a consumer-grade GPU and limited main memory capacity [ICDE 2025]
LoHan is a <ins>Lo</ins>w-cost <ins>H</ins>igh-perform<ins>an</ins>ce framework for large model fine-tuning. This repository now includes efficient data-parallel fine-tuning code (Ratel, ICDE 2025) and more exciting features are coming soon!
Ratel ICDE 2025 Artifact
This artifact provides a guide to replicating the primary experiments in the paper. You can follow this repository to reproduce the experimental results for Ratel's maximum trainable model sizes, batch sizes, and throughput. The documentation and auto-run script focus on reproducing the results in Subsection V-B; you can adjust the code to reproduce results in other sections.
Environment Setup
SSD Configuration
Ratel aggregates the I/O bandwidth of multiple SSDs by configuring them as a RAID array, enabling efficient offloading of model states and activations. We provide a script to configure this array.
First, modify make_raid.sh to match your hardware. The script in this repo configures the drives /dev/nvme0n1 through /dev/nvme11n1 into an array; adjust line 23 to change which drives are used.
After editing the script, run it to set up the RAID array. Root permission may be required:
./make_raid.sh
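If you prefer to set up the array manually, the core steps of a script like this typically look as follows. This is a rough sketch, not the contents of make_raid.sh itself; the device names, array name, and mount point are placeholders you must adapt (the repo's script uses /dev/nvme0n1 through /dev/nvme11n1):

```shell
# Create a RAID-0 array from two example NVMe drives (adjust --raid-devices
# and the device list to match your machine). RAID 0 stripes writes across
# drives, which is what aggregates their I/O bandwidth.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Format the array and mount it where the offload files will live.
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/raid
sudo mount /dev/md0 /mnt/raid
```

Note that creating the array destroys any existing data on the member drives, so double-check the device names before running.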
Installing the Python packages
conda create -n ratel python=3.10
conda activate ratel
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# If there are different CUDA versions, you should specify the CUDA version
# export CUDA_HOME=/usr/local/cuda-11.8
pip install flash-attn==1.0.4
# The following two packages satisfy dependencies of the packages above
pip install six==1.16.0
pip install scikit-learn
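After installation, a quick sanity check (a hypothetical one-liner, not part of the repo) confirms that PyTorch sees the GPU and that flash-attn imports cleanly:

```shell
# Should print the torch version and "True" on a machine with a working CUDA setup
python -c "import torch, flash_attn; print(torch.__version__, torch.cuda.is_available())"
```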
Running Ratel
We provide a script to run Ratel. You can adjust the script to reproduce the results.
bash run.sh
Limiting the Memory Size
Experiments in Subsection V-B require varying the main memory capacity. Instead of physically adding and removing the machine's DRAM, you can pin main memory via huge pages so that the pinned memory cannot be used by Ratel.
Use the following command (root permission required) to pin main memory:
sh -c "echo 1024 > /proc/sys/vm/nr_hugepages"
The value 1024 sets nr_hugepages to 1024. Each huge page is 2 MB, so the total memory pinned in this example is 2 MB * 1024 = 2048 MB.
You can check the pinned memory by using the following command.
$ cat /proc/meminfo | grep Huge
For example, the following output indicates that the memory pinned by huge pages is 1024 (HugePages_Total) * 2048 kB (Hugepagesize) = 2 GB:
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 1024
HugePages_Free: 1024
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2097152 kB
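The arithmetic above can be checked with a short stdlib-only Python sketch (the helper name is ours, not part of the repo) that parses /proc/meminfo-style output and multiplies HugePages_Total by Hugepagesize:

```python
def pinned_hugepage_bytes(meminfo_text: str) -> int:
    """Compute the memory pinned by huge pages from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = rest.strip()
    total = int(fields["HugePages_Total"])            # number of huge pages
    size_kb = int(fields["Hugepagesize"].split()[0])  # per-page size in kB
    return total * size_kb * 1024                     # convert kB to bytes

sample = """HugePages_Total:    1024
HugePages_Free:     1024
Hugepagesize:       2048 kB"""
print(pinned_hugepage_bytes(sample) // (1024 ** 2), "MB pinned")  # prints "2048 MB pinned"
```

On a real machine you would pass `open("/proc/meminfo").read()` instead of the sample string.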
Benchmark Results
Please refer to here for the raw evaluation data from our paper, which may help with your reproduction.
Acknowledgement
Some of the code in this project is modified from the DeepSpeed repository; we appreciate the contributions of the original repository's authors.
- op_ds/accelerator/real_accelerator.py
- op_ds/ops/op_builder/all_ops.py
- op_ds/ops/op_builder/builder.py
- op_ds/ops/CPUAdam.py
- nvme_ds
