MoCLE

Paper: https://arxiv.org/abs/2312.12379

This repository contains the implementation of the paper:

MoCLE: Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
Yunhao Gou*, Zhili Liu*, Kai Chen*, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang†
*Equal contribution  †Corresponding author
arXiv preprint, 2023

<img src="./images/overview.png" alt="MoCLE overview" width="800"/>
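In brief, MoCLE clusters instructions and uses the cluster assignment to gate a set of LoRA experts. The snippet below is a schematic sketch of that routing idea, not the code in this repository; all names, shapes, and gating details are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    # Schematic sketch of cluster-conditional LoRA routing (illustrative only).
    def route_and_combine(x, instr_emb, centroids, gate_logits, loras_A, loras_B, W, tmp=0.05):
        # 1) Assign the instruction embedding to its nearest k-means centroid.
        cluster = torch.cdist(instr_emb[None], centroids).argmin()
        # 2) Cluster-conditional gate: temperature-scaled softmax over experts.
        weights = F.softmax(gate_logits[cluster] / tmp, dim=-1)
        # 3) Mix the experts' low-rank updates into the frozen base weight W.
        delta = sum(w * (B @ A) for w, A, B in zip(weights, loras_A, loras_B))
        return x @ (W + delta).T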

Installation

  1. Install LAVIS, the primary codebase on which MoCLE is built, into the current directory.

    conda create -n lavis python=3.8
    conda activate lavis
    git clone https://github.com/salesforce/LAVIS.git
    cd LAVIS
    pip install -e .
    
  2. Clone the MoCLE repository (inside the LAVIS directory, so that the relative paths below resolve).

    git clone https://github.com/gyhdog99/mocle.git
    
  3. Build our modified PEFT package.

    cd mocle
    cd peft-main
    pip install -e .
    
  4. Copy mocle.py and mocle.yaml from this repository into the LAVIS directory as shown below:

    cd ../
    cp mocle.py ../lavis/models/blip2_models
    cp mocle.yaml ../lavis/configs/models/blip2
    
  5. Modify ../lavis/models/__init__.py in LAVIS as follows:

    • Add from lavis.models.blip2_models.mocle import MoCLE at the beginning of the file.
    • Add "MoCLE" to the __all__ list.
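After these edits, the top of lavis/models/__init__.py should look roughly like this (a sketch; existing imports and entries are elided):

    # lavis/models/__init__.py (excerpt; existing lines unchanged)
    from lavis.models.blip2_models.mocle import MoCLE

    __all__ = [
        # ...existing model names...
        "MoCLE",
    ]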

Prepare Models

  1. MoCLE is based on Vicuna-7B-v1.1. Download the corresponding LLM checkpoint here.

  2. Set the llm_model argument in ../lavis/configs/models/blip2/mocle.yaml to the local path of the downloaded Vicuna checkpoint.

  3. Download the pre-trained checkpoint of MoCLE.

    | # Clusters | Temperature | Main Model | Clustering Model |
    |:----------:|:-----------:|:----------:|:----------------:|
    | 16         | 0.05        | c16_t005   | c16              |
    | 64         | 0.05        | c64_t005   | c64              |
    | 64         | 0.10        | c64_t010   | c64              |

  4. Set finetuned and kmeans_ckpt in ../lavis/configs/models/blip2/mocle.yaml to the paths of the downloaded main model and clustering model, respectively, and adjust the total_tasks and gates_tmp parameters to match the # Clusters and Temperature of the chosen checkpoint, as sketched below.
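For reference, the fields touched in steps 2-4 might look like this in mocle.yaml (using the 64-cluster, 0.05-temperature checkpoint). This is a sketch, not the shipped file: the paths are placeholders, only the five field names are taken from the steps above, and the real file may nest these keys differently.

    # Hypothetical excerpt of mocle.yaml -- everything below is a placeholder.
    llm_model: /path/to/vicuna-7b-v1.1   # step 2: local Vicuna checkpoint
    finetuned: /path/to/c64_t005         # step 4: main model weights
    kmeans_ckpt: /path/to/c64            # step 4: clustering model
    total_tasks: 64                      # match # Clusters of the checkpoint
    gates_tmp: 0.05                      # match Temperature of the checkpoint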

Model Inference

  1. Load an image locally

    import torch
    from PIL import Image
    # setup device to use
    device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
    # load sample image
    raw_image = Image.open(".../path_to_images/").convert("RGB")
    
  2. Load the models

    from lavis.models import load_model_and_preprocess
    # loads MoCLE model
    model, vis_processors, _ = load_model_and_preprocess(name="mocle", model_type="mocle", is_eval=True, device=device)
    # prepare the image
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
    
  3. Generate

    response = model.generate({"image": image, "prompt": ["Your query about this image"]})
    print(response)
    
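Putting the three steps together, a minimal end-to-end script (the image path and the prompt text are placeholders):

    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    # Pick GPU if available, otherwise fall back to CPU.
    device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

    # Load MoCLE together with its matching image preprocessor.
    model, vis_processors, _ = load_model_and_preprocess(
        name="mocle", model_type="mocle", is_eval=True, device=device
    )

    # Preprocess a local image and generate a response to a query about it.
    raw_image = Image.open("path/to/image.jpg").convert("RGB")  # placeholder path
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
    response = model.generate({"image": image, "prompt": ["What is shown in this image?"]})
    print(response)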

Model Training

Coming soon.

Acknowledgement

  • LAVIS: The implementation of MoCLE is built upon LAVIS.
  • PEFT: The implementation of our mixture of LoRA experts is based on PEFT.

Citation

If you use MoCLE in your research or applications, please cite it using this BibTeX entry:

@article{gou2023mixture,
  title={Mixture of cluster-conditional lora experts for vision-language instruction tuning},
  author={Gou, Yunhao and Liu, Zhili and Chen, Kai and Hong, Lanqing and Xu, Hang and Li, Aoxue and Yeung, Dit-Yan and Kwok, James T and Zhang, Yu},
  journal={arXiv preprint arXiv:2312.12379},
  year={2023}
}