LoHan
A low-cost, high-performance deep learning training framework that enables efficient 100B-scale model fine-tuning on a commodity server with a consumer-grade GPU and limited main memory capacity [ICDE 2025]
LoHan is a <ins>Lo</ins>w-cost <ins>H</ins>igh-perform<ins>an</ins>ce framework for large model fine-tuning. This repository now includes efficient data-parallel fine-tuning code (Ratel, ICDE 2025) and more exciting features are coming soon!
Ratel ICDE 2025 Artifact
This artifact provides a guide to replicating the primary experiments in the paper. You can follow this repository to reproduce the experimental results for Ratel's maximum trainable model sizes, batch sizes, and throughput. The documentation and auto-run script focus on reproducing the results in Subsection V-B; you can adjust the code to reproduce results in other sections.
Environment Setup
SSD Configuration
Ratel aggregates the I/O bandwidth of multiple SSDs by configuring them as a RAID array, enabling efficient offloading of model states and activations. We provide a script to configure this array.
First, modify make_raid.sh to match your hardware. The script in this repo configures the drives /dev/nvme0n1 through /dev/nvme11n1 into an array; adjust line 23 to change which drives are used.
After editing the script, run it to set up the RAID array. Root permission may be required:
./make_raid.sh
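If you prefer to set up the array manually, the core steps of a script like this typically look as follows. This is a rough sketch, not the contents of make_raid.sh itself; the device names, array name, and mount point are placeholders you must adapt (the repo's script uses /dev/nvme0n1 through /dev/nvme11n1):

```shell
# Create a RAID-0 array from two example NVMe drives (adjust --raid-devices
# and the device list to match your machine). RAID 0 stripes writes across
# drives, which is what aggregates their I/O bandwidth.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Format the array and mount it where the offload files will live.
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/raid
sudo mount /dev/md0 /mnt/raid
```

Note that creating the array destroys any existing data on the member drives, so double-check the device names before running.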
Installing the Python packages
conda create -n ratel python=3.10
conda activate ratel
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
# If there are different CUDA versions, you should specify the CUDA version
# export CUDA_HOME=/usr/local/cuda-11.8
pip install flash-attn==1.0.4
# The following two packages satisfy dependencies of the packages above
pip install six==1.16.0
pip install scikit-learn
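After installation, a quick sanity check (a hypothetical one-liner, not part of the repo) confirms that PyTorch sees the GPU and that flash-attn imports cleanly:

```shell
# Should print the torch version and "True" on a machine with a working CUDA setup
python -c "import torch, flash_attn; print(torch.__version__, torch.cuda.is_available())"
```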
Running Ratel
We provide a script to run Ratel. You can adjust the script to reproduce the results.
bash run.sh
Limiting the Memory Size
Experiments in Subsection V-B require varying the main memory capacity. Instead of physically adding and removing the machine's DRAM, you can pin main memory via huge pages so that the pinned memory cannot be used by Ratel.
Use the following command (root permission required) to pin main memory:
sh -c "echo 1024 > /proc/sys/vm/nr_hugepages"
The value 1024 sets nr_hugepages to 1024. Each huge page is 2 MB, so the total memory pinned in this example is 2 MB * 1024 = 2048 MB.
You can check the pinned memory by using the following command.
$ cat /proc/meminfo | grep Huge
For example, the following output indicates that the memory pinned by huge pages is 1024 (HugePages_Total) * 2048 kB (Hugepagesize) = 2 GB:
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 1024
HugePages_Free: 1024
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 2097152 kB
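The arithmetic above can be checked with a short stdlib-only Python sketch (the helper name is ours, not part of the repo) that parses /proc/meminfo-style output and multiplies HugePages_Total by Hugepagesize:

```python
def pinned_hugepage_bytes(meminfo_text: str) -> int:
    """Compute the memory pinned by huge pages from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields[key.strip()] = rest.strip()
    total = int(fields["HugePages_Total"])            # number of huge pages
    size_kb = int(fields["Hugepagesize"].split()[0])  # per-page size in kB
    return total * size_kb * 1024                     # convert kB to bytes

sample = """HugePages_Total:    1024
HugePages_Free:     1024
Hugepagesize:       2048 kB"""
print(pinned_hugepage_bytes(sample) // (1024 ** 2), "MB pinned")  # prints "2048 MB pinned"
```

On a real machine you would pass `open("/proc/meminfo").read()` instead of the sample string.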
Benchmark Results
Please refer to here for the raw evaluation data from our paper, which may help with your reproduction.
Acknowledgement
Some of the code in this project is modified from the DeepSpeed repository; we appreciate the contributions of the original repository's authors.
- op_ds/accelerator/real_accelerator.py
- op_ds/ops/op_builder/all_ops.py
- op_ds/ops/op_builder/builder.py
- op_ds/ops/CPUAdam.py
- nvme_ds
