HiFT
memory-efficient fine-tuning; supports full-parameter fine-tuning of 7B models on 24 GB of GPU memory
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy
This repo contains the source code of the Python package hift and several examples of how to integrate it with PyTorch models, such as those in Hugging Face Transformers. We only support PyTorch for now. See our paper for a detailed description of HiFT. Under mixed precision, HiFT supports FPFT (full-parameter fine-tuning) of 7B models on 24 GB GPU memory devices without using any other memory-saving techniques, and works with various optimizers including AdamW, AdaGrad, SGD, etc.
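To make the idea concrete, here is a conceptual sketch of hierarchical full-parameter fine-tuning in plain PyTorch. This is an illustration of the strategy only, not the hift package's internal code: layers are split into groups, and each step unfreezes and updates a single group, so gradients and optimizer state are held for only a fraction of the model at a time.

```python
import torch
import torch.nn as nn

def hierarchical_step(model, groups, group_idx, inputs, targets, lr=0.1):
    """Update only the layers in groups[group_idx] (illustrative sketch)."""
    # Freeze everything, then unfreeze the selected group of layers.
    for p in model.parameters():
        p.requires_grad_(False)
    active = [p for layer in groups[group_idx] for p in layer.parameters()]
    for p in active:
        p.requires_grad_(True)
    # The optimizer covers only the active group, so its state (e.g. AdamW
    # moments) is proportionally smaller than for full-model training.
    optimizer = torch.optim.SGD(active, lr=lr)
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()  # gradients accumulate only on the active group
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

Cycling `group_idx` over all groups visits every parameter, so the whole model is eventually fine-tuned, while peak memory stays close to that of training a single group.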
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy <br> Yongkang Liu, Yiqun Zhang, Qian Li, Tong Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze <br> Paper: https://arxiv.org/abs/2401.15207 <br>
News
26/1/2024: Published the first version of the HiFT manuscript
25/2/2024: Published the second version of the HiFT manuscript and the source code
1/5/2024: Updated HiFT support for LoRA
10/5/2024: Adapted the optimizers provided by bitsandbytes
13/5/2024: Adapted the AdaLoRA, LoRA, IA3, P-tuning, prefix-tuning, and prompt-tuning PEFT methods
Repository Overview
There are several directories in this repo:
- hift/ contains the source code for the package hift, which needs to be installed to run the examples we provide.
- examples/ contains HiFT-based NER, QA, classification, text generation, instruction fine-tuning, and pre-training example implementations.
- scripts/ contains the scripts for running the examples we provide.
- dsconfig/ contains the configuration files required for mixed precision.
- data/ contains example data for instruction fine-tuning and pre-training.
Out-of-memory issues
We instruction fine-tuned a 7B model on an A6000 (48 GB); the experiments show that the maximum sequence length supported by HiFT in this setting is 2800. Beyond this limit, OOM issues may occur.

| Model | Max Seq Length | Max Batch Size |
| ------------------ | -------------- | -------------- |
| llama2-7b (Alpaca) | 512 | 8 |
| llama2-7b (Vicuna) | 2800 | 1 |
Instruction fine-tuning a 7B model on an RTX 3090 (24 GB). If you use multiple GPUs for distributed training on RTX 3090/4000-series cards, add the following commands before running: `export NCCL_IB_DISABLE=1; export NCCL_P2P_DISABLE=1`

| Model | Max Seq Length | Max Batch Size |
| ------------------ | -------------- | -------------- |
| llama2-7b (Alpaca) | 512 | 3 |
| llama2-7b (Vicuna) | 1400 | 1 |
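The NCCL workaround above can also be applied from Python at the top of a launcher script, before `torch.distributed` initializes. A minimal sketch, equivalent to the two `export` commands:

```python
import os

# Disable InfiniBand transport and peer-to-peer GPU transfers, which are
# not supported on RTX 3090/4000-series consumer cards.
os.environ["NCCL_IB_DISABLE"] = "1"
os.environ["NCCL_P2P_DISABLE"] = "1"
```

Setting these in the environment of the launching process is enough; NCCL reads them when communication is first initialized.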
Requirements
- pytorch >= 2.1.1, transformers == 4.36.2
- `pip install -r requirements.txt`
- `conda install mpi4py==3.1.4`
- `pip install flash-attn==2.5.8`
Quickstart
- Install the `hift` package

  `pip install hift`

- Import the `hift` package
```python
### generation task
from hift import HiFTSeq2SeqTrainer, GetCallBack, peft_function, Seq2SeqTrainer
### classification tasks
from hift import HiFTrainer, GetCallBack, PEFTrainer, peft_function
### QA task
from hift import HiFTQuestionAnsweringTrainer, GetCallBack, QuestionAnsweringTrainer, peft_function
```
- Add the HiFT configuration
```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HiFTArguments(ModelArguments):  # ModelArguments: your own model argument dataclass
    HiTaskType: str = field(
        default="SEQ_CLS",
        metadata={"help": "HiTaskType should be consistent with the PEFT TaskType"},
    )
    peft_type: str = field(
        default=None,
        metadata={"help": "peft_type should be in [lora, adalora, ia3, p_tuning, prefix_tuning, prompt_tuning]"},
    )
    init_text: str = field(
        default="Predict if sentiment of this review is positive, negative or neutral",
        metadata={"help": "the initial prompt text for prompt tuning"},
    )
    lora_rank: int = field(
        default=8,
        metadata={"help": "rank for LoRA or AdaLoRA"},
    )
    peft_path: Optional[str] = field(default=None)
    virtual_tokens: int = field(
        default=20,
        metadata={"help": "the number of virtual tokens for p_tuning, prefix_tuning and prompt_tuning"},
    )
    group_element: int = field(
        default=1,
        metadata={"help": "number of elements (layers) in each parameter group"},
    )
    optimizer_strategy: str = field(
        default="down2up",
        metadata={"help": "optimizer strategy, one of ['up2down', 'down2up', 'random']"},
    )
    hier_tuning: bool = field(
        default=False,
        metadata={"help": "hierarchical optimization for LLMs"},
    )
    freeze_layers: List[str] = field(
        default_factory=list,
        metadata={"help": "indices of the layers to freeze"},
    )
```
HiTaskType should be consistent with the PEFT TaskType:
- sequence classification and multiple choice tasks: `TaskType.SEQ_CLS`
- question answering tasks: `TaskType.QUESTION_ANS`
- sequence labeling tasks: `TaskType.TOKEN_CLS`
- generation tasks: `TaskType.CAUSAL_LM`
group_element: the number of layers included in a block. The default value is 1.
freeze_layers: the layers you want to freeze during fine-tuning, given as layer indices. The embedding layer has index 0, the first model layer has index 1, and so on.
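The following is a hypothetical illustration (not the hift package's internal code) of how `group_element`, `freeze_layers`, and `optimizer_strategy` could interact: trainable layer indices are packed into blocks of `group_element` layers, and the blocks are then visited in an order determined by the strategy.

```python
import random

def build_groups(num_layers, group_element, freeze_layers=()):
    """Pack trainable layer indices into blocks of group_element layers.

    Index 0 is the embedding layer; 1..num_layers are the model layers.
    """
    trainable = [i for i in range(num_layers + 1) if i not in set(freeze_layers)]
    return [trainable[i:i + group_element]
            for i in range(0, len(trainable), group_element)]

def group_order(groups, strategy, seed=0):
    """Return the visiting order of group indices for one training cycle."""
    order = list(range(len(groups)))
    if strategy == "up2down":
        order.reverse()          # top layers first
    elif strategy == "random":
        random.Random(seed).shuffle(order)
    return order                 # "down2up" keeps the bottom-up order
```

For example, a 4-layer model with `group_element=2` and the embedding layer frozen yields the blocks `[[1, 2], [3, 4]]`.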
- Use the HiFT trainer

HiFT's trainers inherit from the Hugging Face Trainer, so you can directly replace the original trainer with the one provided by hift.
- Classification Task
```python
if model_args.hier_tuning:  # hierarchical tuning
    trainer = HiFTrainer(
        hiFThandler=GetCallBack(model_args.model_name_or_path),
        HiTaskType=model_args.HiTaskType,
        group_element=model_args.group_element,
        strategy=model_args.optimizer_strategy,
        hier_tuning=model_args.hier_tuning,
        peft_type=model_args.peft_type,
        freeze_layers=model_args.freeze_layers,
        args=training_args,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        model=model,
        tokenizer=tokenizer,
        compute_metrics=compute_metrics,
        data_collator=data_collator,
    )
else:
    trainer = PEFTrainer(
        peft_type=model_args.peft_type,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        compute_metrics=compute_metrics,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
```
- QA Task
```python
if model_args.hier_tuning:  # hierarchical tuning
    trainer = HiFTQuestionAnsweringTrainer(
        hiFThandler=GetCallBack(model_args.model_name_or_path),
        HiTaskType=model_args.HiTaskType,
        group_element=model_args.group_element,
        strategy=model_args.optimizer_strategy,
        hier_tuning=model_args.hier_tuning,
        peft_type=model_args.peft_type,
        freeze_layers=model_args.freeze_layers,
        eval_examples=eval_examples if training_args.do_eval else None,
        post_process_function=post_processing_function,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
else:
    trainer = QuestionAnsweringTrainer(
        peft_type=model_args.peft_type,
        eval_examples=eval_examples if training_args.do_eval else None,
        post_process_function=post_processing_function,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )
```
- Generation Task
```python
if model_args.hier_tuning:  # hierarchical tuning
    trainer = HiFTSeq2SeqTrainer(
        hiFThandler=GetCallBack(model_args.model_name_or_path),
        HiTaskType=model_args.HiTaskType,
        group_element=model_args.group_element,
        strategy=model_args.optimizer_strategy,
        hier_tuning=model_args.hier_tuning,
        peft_type=model_args.peft_type,
        freeze_layers=model_args.freeze_layers,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        compute_metrics=compute_metrics if training_args.predict_with_generate else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
else:
    trainer = Seq2SeqTrainer(
        peft_type=model_args.peft_type,
        args=training_args,
        model=model,
        train_dataset=train_dataset if training_args.do_train else None,
        eval_dataset=eval_dataset if training_args.do_eval else None,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics if training_args.predict_with_generate else None,
    )
```
Adapt Model to HiFT
HiFT supports any model, and adapting a new model to HiFT is straightforward.
- Define the task types supported by your model in `T
