HiFT
memory-efficient fine-tuning; supports full-parameter fine-tuning of 7B models on 24 GB of GPU memory
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy
This repo contains the source code of the Python package hift and several examples of how to integrate it with PyTorch models, such as those in Hugging Face Transformers. We only support PyTorch for now. See our paper for a detailed description of HiFT. Under mixed precision, HiFT supports FPFT (full-parameter fine-tuning) of 7B models on 24 GB GPU memory devices without using any other memory-saving techniques, and works with various optimizers including AdamW, AdaGrad, SGD, etc.
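To make the idea concrete, here is a conceptual sketch of hierarchical full-parameter fine-tuning in plain PyTorch. This is an illustration of the strategy only, not the hift package's internal code: layers are split into groups, and each step unfreezes and updates a single group, so gradients and optimizer state are held for only a fraction of the model at a time.

```python
import torch
import torch.nn as nn

def hierarchical_step(model, groups, group_idx, inputs, targets, lr=0.1):
    """Update only the layers in groups[group_idx] (illustrative sketch)."""
    # Freeze everything, then unfreeze the selected group of layers.
    for p in model.parameters():
        p.requires_grad_(False)
    active = [p for layer in groups[group_idx] for p in layer.parameters()]
    for p in active:
        p.requires_grad_(True)
    # The optimizer covers only the active group, so its state (e.g. AdamW
    # moments) is proportionally smaller than for full-model training.
    optimizer = torch.optim.SGD(active, lr=lr)
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()  # gradients accumulate only on the active group
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Cycling `group_idx` over all groups visits every parameter, so the whole model is eventually fine-tuned, while peak memory stays close to that of training a single group.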
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy <br> Yongkang Liu, Yiqun Zhang, Qian Li, Tong Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze <br> Paper: https://arxiv.org/abs/2401.15207 <br>
News
26/1/2024: Published the first version of the HiFT manuscript
25/2/2024: Published the second version of the HiFT manuscript and the source code
1/5/2024: Updated HiFT support for LoRA
10/5/2024: Adapted the optimizers provided by bitsandbytes
13/5/2024: Adapted the AdaLoRA, LoRA, IA3, P-tuning, prefix-tuning, and prompt-tuning PEFT methods
Repository Overview
There are several directories in this repo:
- hift/ contains the source code for the package hift, which needs to be installed to run the examples we provide.
- examples/ contains HiFT-based NER, QA, classification, text generation, instruction fine-tuning, and pre-training example implementations.
- scripts/ contains the scripts for running the examples we provide.
- dsconfig/ contains the configuration files required for mixed precision.
- data/ contains example data for instruction fine-tuning and pre-training.
Out-of-memory issues
We instruction fine-tuned a 7B model on an A6000 (48 GB); the experiments show that the maximum sequence length supported by HiFT in this setting is 2800. Beyond this limit, OOM issues may occur.

| Model | Max Seq Length | Max Batch Size |
| ------------------ | -------------- | -------------- |
| llama2-7b (Alpaca) | 512 | 8 |
| llama2-7b (Vicuna) | 2800 | 1 |
Instruction fine-tuning a 7B model on an RTX 3090 (24 GB). If you use multiple GPUs for distributed training on RTX 3090/4000-series cards, add the following commands before running: `export NCCL_IB_DISABLE=1; export NCCL_P2P_DISABLE=1`

| Model | Max Seq Length | Max Batch Size |
| ------------------ | -------------- | -------------- |
| llama2-7b (Alpaca) | 512 | 3 |
| llama2-7b (Vicuna) | 1400 | 1 |
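The NCCL workaround above can also be applied from Python at the top of a launcher script, before `torch.distributed` initializes. A minimal sketch, equivalent to the two `export` commands:

```python
import os

# Disable InfiniBand transport and peer-to-peer GPU transfers, which are
# not supported on RTX 3090/4000-series consumer cards.
os.environ["NCCL_IB_DISABLE"] = "1"
os.environ["NCCL_P2P_DISABLE"] = "1"
```

Setting these in the environment of the launching process is enough; NCCL reads them when communication is first initialized.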
Requirements
- pytorch >= 2.1.1, transformers == 4.36.2
- `pip install -r requirements.txt`
- `conda install mpi4py==3.1.4`
- `pip install flash-attn==2.5.8`
Quickstart
- Install the `hift` package

  `pip install hift`

- Import the `hift` package
```python
### generation task
from hift import HiFTSeq2SeqTrainer, GetCallBack, peft_function, Seq2SeqTrainer
### classification tasks
from hift import HiFTrainer, GetCallBack, PEFTrainer, peft_function
### QA task
from hift import HiFTQuestionAnsweringTrainer, GetCallBack, QuestionAnsweringTrainer, peft_function
```
- Add the HiFT configuration
```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HiFTArguments(ModelArguments):  # ModelArguments: your own model argument dataclass
    HiTaskType: str = field(
        default="SEQ_CLS",
        metadata={"help": "HiTaskType should be consistent with the PEFT TaskType"},
    )
    peft_type: str = field(
        default=None,
        metadata={"help": "peft_type should be in [lora, adalora, ia3, p_tuning, prefix_tuning, prompt_tuning]"},
    )
    init_text: str = field(
        default="Predict if sentiment of this review is positive, negative or neutral",
        metadata={"help": "the initial prompt text for prompt tuning"},
    )
    lora_rank: int = field(
        default=8,
        metadata={"help": "rank for LoRA or AdaLoRA"},
    )
    peft_path: Optional[str] = field(default=None)
    virtual_tokens: int = field(
        default=20,
        metadata={"help": "the number of virtual tokens for p_tuning, prefix_tuning and prompt_tuning"},
    )
    group_element: int = field(
        default=1,
        metadata={"help": "number of elements (layers) in each parameter group"},
    )
    optimizer_strategy: str = field(
        default="down2up",
        metadata={"help": "optimizer strategy, one of ['up2down', 'down2up', 'random']"},
    )
    hier_tuning: bool = field(
        default=False,
        metadata={"help": "hierarchical optimization for LLMs"},
    )
    freeze_layers: List[str] = field(
        default_factory=list,
        metadata={"help": "indices of the layers to freeze"},
    )
```
HiTaskType should be consistent with the PEFT TaskType:
- sequence classification and multiple choice tasks: `TaskType.SEQ_CLS`
- question answering tasks: `TaskType.QUESTION_ANS`
- sequence labeling tasks: `TaskType.TOKEN_CLS`
- generation tasks: `TaskType.CAUSAL_LM`
group_element: the number of layers included in a block. The default value is 1.
freeze_layers: the layers you want to freeze during fine-tuning, given as layer indices. The embedding layer has index 0, the first model layer has index 1, and so on.
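The following is a hypothetical illustration (not the hift package's internal code) of how `group_element`, `freeze_layers`, and `optimizer_strategy` could interact: trainable layer indices are packed into blocks of `group_element` layers, and the blocks are then visited in an order determined by the strategy.

```python
import random

def build_groups(num_layers, group_element, freeze_layers=()):
    """Pack trainable layer indices into blocks of group_element layers.

    Index 0 is the embedding layer; 1..num_layers are the model layers.
    """
    trainable = [i for i in range(num_layers + 1) if i not in set(freeze_layers)]
    return [trainable[i:i + group_element]
            for i in range(0, len(trainable), group_element)]

def group_order(groups, strategy, seed=0):
    """Return the visiting order of group indices for one training cycle."""
    order = list(range(len(groups)))
    if strategy == "up2down":
        order.reverse()          # top layers first
    elif strategy == "random":
        random.Random(seed).shuffle(order)
    return order                 # "down2up" keeps the bottom-up order
```

For example, a 4-layer model with `group_element=2` and the embedding layer frozen yields the blocks `[[1, 2], [3, 4]]`.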
- Use the HiFT trainer

HiFT's trainers inherit from the Hugging Face Trainer, so you can directly replace the original trainer with the one provided by hift.
- Classification Task
```python
if model_args.hier_tuning:  # hierarchical tuning
    trainer = HiFTrainer(
        hiFThandler=GetCallBack(model_args.model_name_or_path),
        HiTaskType=model_args.HiTaskType,
        group_element=model_args.group_element,
        strategy=model_args.optimizer_strategy,
        hier_tuning=model_args.hier_tuning,
        peft_type=model_args.peft_type,
        freeze_layers=model_args.freeze_layers,
        args=training_args,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        model=model,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
        data_collator=data_collator,
    )
else:
    trainer = PEFTrainer(
        peft_type=model_args.peft_type,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
```
- QA Task
```python
if model_args.hier_tuning:  # hierarchical tuning
    trainer = HiFTQuestionAnsweringTrainer(
        hiFThandler=GetCallBack(model_args.model_name_or_path),
        HiTaskType=model_args.HiTaskType,
        group_element=model_args.group_element,
        strategy=model_args.optimizer_strategy,
        hier_tuning=model_args.hier_tuning,
        peft_type=model_args.peft_type,
        freeze_layers=model_args.freeze_layers,
        eval_examples=eval_examples if training_args.do_eval else None,
        post_process_function=post_processing_function,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
else:
    trainer = QuestionAnsweringTrainer(
        peft_type=model_args.peft_type,
        eval_examples=eval_examples if training_args.do_eval else None,
        post_process_function=post_processing_function,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
```
- Generation Task
```python
if model_args.hier_tuning:  # hierarchical tuning
    trainer = HiFTSeq2SeqTrainer(
        hiFThandler=GetCallBack(model_args.model_name_or_path),
        HiTaskType=model_args.HiTaskType,
        group_element=model_args.group_element,
        strategy=model_args.optimizer_strategy,
        hier_tuning=model_args.hier_tuning,
        peft_type=model_args.peft_type,
        freeze_layers=model_args.freeze_layers,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        compute_metrics=compute_metrics if training_args.predict_with_generate else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
else:
    trainer = Seq2SeqTrainer(
        peft_type=model_args.peft_type,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics if training_args.predict_with_generate else None,
    )
```
Adapt Model to HiFT
HiFT supports any model, and adapting a new model to HiFT is straightforward.
- Define the task types supported by your model in `T
