ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations

This repository is the official implementation of our paper "ProcrustesGPT: Compressing LLMs with Structured Matrices and Orthogonal Transformations" by Ekaterina Grishina, Mikhail Gorbunov and Maxim Rakhuba.

OPT and Llama2 HuggingFace models are supported.

Installation

Clone and navigate to the repository

git clone https://github.com/GrishKate/ProcrustesGPT.git
cd ProcrustesGPT

Install the requirements

pip install -r requirements.txt

How To Use

Fill in the configs for compressing the weight matrices. For example configs, see the /configs folder. Provide a tmp_path folder in which to save the orthogonal matrices.

First, compress the model in the Frobenius norm:

# --model_name: 'facebook/opt-...' and 'meta-llama/Llama-2-...-hf' are supported
# --model_path: optional, use if the model is stored locally
# --cfg_for_compression_path: path to the compression config
# --cfg_for_layers_path: path to the config with the specified sizes of decompositions
# --skip_connections: optional, compress skip connections ('cayley' or 'exponent')
# --save, --save_path, --filename: whether to save the resulting model, where, and under which filename
python run_procrustes_gpt.py --model_name 'facebook/opt-125m' \
                             --model_path '/path/to/model' \
                             --cfg_for_compression_path './configs/compression_frobenius.yaml' \
                             --cfg_for_layers_path './configs/k_layers_opt_125m.yaml' \
                             --skip_connections 'cayley' \
                             --save True \
                             --save_path 'path/to/save/model' \
                             --filename 'opt_125m_compressed.pt'
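
The --skip_connections option accepts 'cayley' or 'exponent'. These names presumably refer to the Cayley transform and the matrix exponential, two standard ways to obtain an orthogonal matrix from a skew-symmetric one. The sketch below only illustrates that general fact with NumPy; it is not taken from this repository, and the helper names (skew, cayley, exponential) are hypothetical.

```python
import numpy as np


def skew(params, n):
    """Build an n x n skew-symmetric matrix from n*(n-1)/2 free parameters."""
    A = np.zeros((n, n))
    A[np.triu_indices(n, k=1)] = params
    return A - A.T


def cayley(A):
    """Cayley transform Q = (I + A)^{-1} (I - A); orthogonal when A is skew-symmetric."""
    I = np.eye(A.shape[0])
    return np.linalg.solve(I + A, I - A)


def exponential(A, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small norms)."""
    Q = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k  # term now holds A^k / k!
        Q = Q + term
    return Q


rng = np.random.default_rng(0)
n = 4
A = skew(rng.normal(scale=0.5, size=n * (n - 1) // 2), n)

Q_cayley = cayley(A)
Q_exp = exponential(A)

# Both maps should return matrices with Q^T Q = I (up to rounding).
print(np.allclose(Q_cayley.T @ Q_cayley, np.eye(n)))
print(np.allclose(Q_exp.T @ Q_exp, np.eye(n)))
```

Either parametrization keeps the matrix orthogonal while its free parameters are unconstrained, which is the usual reason for choosing one of them. How the repository applies this to skip connections is specified by the code and the paper, not by this sketch.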

Second, change the compression config and run the compression in the weighted norm:

# --model_path: optional, use if the model is stored locally
# --skip_connections: optional, compress skip connections ('cayley' or 'exponent')
# --save, --save_path, --filename: whether to save the resulting model, where, and under which filename
python run_procrustes_gpt.py --model_name 'facebook/opt-125m' \
                             --model_path '/path/to/model' \
                             --cfg_for_compression_path './configs/compression_weighted.yaml' \
                             --cfg_for_layers_path './configs/k_layers_opt_125m.yaml' \
                             --skip_connections 'cayley' \
                             --save True \
                             --save_path 'path/to/save/model' \
                             --filename 'opt_125m_compressed.pt'
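
The two stages differ only in the compression config, so they can be scripted together. The following is a minimal sketch that builds both command lines from the flags shown in this README; build_command is a hypothetical helper, not part of the repository, and the sketch assumes nothing about how the second stage uses the output of the first beyond what the README states.

```python
import shlex


def build_command(compression_cfg,
                  model_name="facebook/opt-125m",
                  layers_cfg="./configs/k_layers_opt_125m.yaml",
                  save_path="path/to/save/model",
                  filename="opt_125m_compressed.pt",
                  skip_connections="cayley",
                  model_path=None):
    """Assemble the run_procrustes_gpt.py argument list (flags as listed in the README)."""
    cmd = ["python", "run_procrustes_gpt.py", "--model_name", model_name]
    if model_path is not None:  # only needed when the model is stored locally
        cmd += ["--model_path", model_path]
    cmd += [
        "--cfg_for_compression_path", compression_cfg,
        "--cfg_for_layers_path", layers_cfg,
        "--skip_connections", skip_connections,
        "--save", "True",
        "--save_path", save_path,
        "--filename", filename,
    ]
    return cmd


# Stage 1: Frobenius norm; stage 2: weighted norm.
stages = [
    build_command("./configs/compression_frobenius.yaml"),
    build_command("./configs/compression_weighted.yaml"),
]

for cmd in stages:
    print(shlex.join(cmd))
    # To actually run a stage: import subprocess; subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to check paths before starting a long compression run.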

To evaluate the zero-shot performance:

# --tokenizer_path: optional, path to the tokenizer if saved locally
# --weights_path: path to the saved compressed model
python run_lm_eval.py --model 'facebook/opt-125m' \
                      --tokenizer_path 'facebook/opt-125m' \
                      --weights_path 'path/to/save/model/opt_125m_compressed.pt' \
                      --no-wandb

To evaluate the perplexity:

# --tokenizer_path: optional, path to the tokenizer if saved locally
# --weights_path: path to the saved compressed model
python run_ppl_eval.py --model_name 'facebook/opt-125m' \
                       --tokenizer_path 'facebook/opt-125m' \
                       --weights_path '/kaggle/working/opt_125m_compressed.pt'

Credits

This code is based on the SliceGPT repository.