
Cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Install / Use

/learn @cambrian-mllm/Cambrian

README

<div align="center">

🪼 Cambrian-1:<br> A Fully Open, Vision-Centric Exploration of Multimodal LLMs

<p> <img src="images/cambrian.png" alt="Cambrian" width="500" height="auto"> </p> <a href="https://arxiv.org/abs/2406.16860" target="_blank"> <img alt="arXiv" src="https://img.shields.io/badge/arXiv-Cambrian--1-red?logo=arxiv" height="25" /> </a> <a href="https://cambrian-mllm.github.io/" target="_blank"> <img alt="Website" src="https://img.shields.io/badge/🌎_Website-cambrian--mllm.github.io-blue.svg" height="25" /> </a> <br> <a href="https://huggingface.co/collections/nyu-visionx/cambrian-1-models-666fa7116d5420e514b0f23c" target="_blank"> <img alt="HF Model: Cambrian-1" src="https://img.shields.io/badge/%F0%9F%A4%97%20_Model-Cambrian--1-ffc107?color=ffc107&logoColor=white" height="25" /> </a> <a href="https://huggingface.co/collections/nyu-visionx/cambrian-data-6667ce801e179b4fbe774e11" target="_blank"> <img alt="HF Dataset: Cambrian 10M" src="https://img.shields.io/badge/%F0%9F%A4%97%20_Data-Cambrian--10M-ffc107?color=ffc107&logoColor=white" height="25" /> </a> <a href="https://huggingface.co/datasets/nyu-visionx/CV-Bench" target="_blank"> <img alt="HF Dataset: CV-Bench" src="https://img.shields.io/badge/%F0%9F%A4%97%20_Benchmark-CV--Bench-ffc107?color=ffc107&logoColor=white" height="25" /> </a> <div style="font-family: charter;"> <a href="https://tsb0601.github.io/petertongsb/" target="_blank">Shengbang Tong*</a>, <a href="https://ellisbrown.github.io/" target="_blank">Ellis Brown*</a>, <a href="https://penghao-wu.github.io/" target="_blank">Penghao Wu*</a>, <br> <a href="https://sites.google.com/view/sanghyunwoo/" target="_blank">Sanghyun Woo</a>, <a href="https://www.linkedin.com/in/manoj-middepogu/" target="_blank">Manoj Middepogu</a>, <a href="https://www.linkedin.com/in/sai-charitha-akula-32574887/" target="_blank">Sai Charitha Akula</a>, <a href="https://jihanyang.github.io/" target="_blank">Jihan Yang</a>, <br> <a href="https://github.com/vealocia" target="_blank">Shusheng Yang</a>, <a href="https://adithyaiyer1999.github.io/" target="_blank">Adithya 
Iyer</a>, <a href="https://xichenpan.com/" target="_blank">Xichen Pan</a>, <a href="https://www.linkedin.com/in/ziteng-wang-694b8b227/" target="_blank">Austin Wang</a>, <br> <a href="http://cs.nyu.edu/~fergus" target="_blank">Rob Fergus</a>, <a href="http://yann.lecun.com/" target="_blank">Yann LeCun</a>, <a href="https://www.sainingxie.com/" target="_blank">Saining Xie</a> </div> </div> <br>

Fun fact: vision emerged in animals during the Cambrian period! This was the inspiration for the name of our project, Cambrian.

Release

  • [09/09/24] 🧪 We've released our MLLM evaluation suite with 26 benchmarks, supporting manual usage and parallelization using Slurm for HPC clusters. See the eval/ subfolder for more details.
  • [07/03/24] 🚂 We have released our targeted data engine! See the dataengine/ subfolder for more details.
  • [07/02/24] 🤗 CV-Bench is live on Hugging Face! Please see here for more: https://huggingface.co/datasets/nyu-visionx/CV-Bench
  • [06/24/24] 🔥 We released Cambrian-1! We also released three model sizes (8B, 13B, and 34B), our training data, and TPU training scripts. GPU training scripts and evaluation code will follow soon.

Contents

Installation

TPU Training

Currently, we support training on TPUs using TorchXLA.

  1. Clone this repository and navigate into the codebase
git clone https://github.com/cambrian-mllm/cambrian
cd cambrian
  2. Install packages
conda create -n cambrian python=3.10 -y
conda activate cambrian
pip install --upgrade pip  # enable PEP 660 support
pip install -e ".[tpu]"
  3. Install TPU-specific packages for training
pip install torch~=2.2.0 torch_xla[tpu]~=2.2.0 -f https://storage.googleapis.com/libtpu-releases/index.html

GPU Inference

  1. Clone this repository and navigate into the codebase
git clone https://github.com/cambrian-mllm/cambrian
cd cambrian
  2. Install packages
conda create -n cambrian python=3.10 -y
conda activate cambrian
pip install --upgrade pip  # enable PEP 660 support
pip install ".[gpu]"

Cambrian Weights

Here are our Cambrian checkpoints along with instructions on how to use the weights. Our models excel across various dimensions at the 8B, 13B, and 34B parameter levels, demonstrating competitive performance with closed-source proprietary models such as GPT-4V, Gemini-Pro, and Grok-1.5V on several benchmarks.

Model Performance Comparison

| Model | # Vis. Tok. | MMB | SQA-I | MathVistaM | ChartQA | MMVP |
|-------------------------|-------------|------|-------|------------|---------|-------|
| GPT-4V | UNK | 75.8 | - | 49.9 | 78.5 | 50.0 |
| Gemini-1.0 Pro | UNK | 73.6 | - | 45.2 | - | - |
| Gemini-1.5 Pro | UNK | - | - | 52.1 | 81.3 | - |
| Grok-1.5 | UNK | - | - | 52.8 | 76.1 | - |
| MM-1-8B | 144 | 72.3 | 72.6 | 35.9 | - | - |
| MM-1-30B | 144 | 75.1 | 81.0 | 39.4 | - | - |
| *Base LLM: Phi-3-3.8B* | | | | | | |
| Cambrian-1-8B | 576 | 74.6 | 79.2 | 48.4 | 66.8 | 40.0 |
| *Base LLM: LLaMA3-8B-Instruct* | | | | | | |
| Mini-Gemini-HD-8B | 2880 | 72.7 | 75.1 | 37.0 | 59.1 | 18.7 |
| LLaVA-NeXT-8B | 2880 | 72.1 | 72.8 | 36.3 | 69.5 | 38.7 |
| Cambrian-1-8B | 576 | 75.9 | 80.4 | 49.0 | 73.3 | 51.3 |
| *Base LLM: Vicuna1.5-13B* | | | | | | |
| Mini-Gemini-HD-13B | 2880 | 68.6 | 71.9 | 37.0 | 56.6 | 19.3 |
| LLaVA-NeXT-13B | 2880 | 70.0 | 73.5 | 35.1 | 62.2 | 36.0 |
| Cambrian-1-13B | 576 | 75.7 | 79.3 | 48.0 | 73.8 | 41.3 |
| *Base LLM: Hermes2-Yi-34B* | | | | | | |
| Mini-Gemini-HD-34B | 2880 | 80.6 | 77.7 | 43.4 | 67.6 | 37.3 |
| LLaVA-NeXT-34B | 2880 | 79.3 | 81.8 | 46.5 | 68.7 | 47.3 |
| Cambrian-1-34B | 576 | 81.4 | 85.6 | 53.2 | 75.6 | 52.7 |

For the full table, please refer to our Cambrian-1 paper.

<p align="center"> <img src="images/comparison.png" alt="Cambrian-7M"> </p>

Our models offer highly competitive performance while using a smaller fixed number of visual tokens.
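The token-budget gap is easy to quantify from the comparison table; this is a quick sketch using the visual-token counts reported above:

```python
# Visual token budgets taken from the comparison table above.
vis_tokens = {
    "Cambrian-1-8B": 576,
    "LLaVA-NeXT-8B": 2880,
    "Mini-Gemini-HD-8B": 2880,
}

ratio = vis_tokens["LLaVA-NeXT-8B"] / vis_tokens["Cambrian-1-8B"]
print(f"Cambrian-1 processes {ratio:.0f}x fewer visual tokens per image")  # 5x
```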

Using Cambrian-1

To use the model weights, download them from Hugging Face.

We provide a sample model loading and generation script in inference.py.

Cambrian-10M Instruction Tuning Data

<p align="center"> <img src="images/cambrian7m.png" alt="Cambrian-7M"> </p>

In this work, we collect a very large pool of instruction tuning data, Cambrian-10M, for us and future work to use when studying data for training MLLMs. In our preliminary study, we filter this pool down to a high-quality set of 7M curated data points, which we call Cambrian-7M. Both datasets are available in the following Hugging Face dataset: Cambrian-10M.

Data Collection

We collected a diverse range of visual instruction tuning data from various sources, including VQA, visual conversation, and embodied visual interaction. To ensure high-quality, reliable, and large-scale knowledge data, we designed an Internet Data Engine.

Additionally, we observed that VQA data tends to produce very short outputs, creating a distribution shift relative to the rest of the training data. To address this issue, we leveraged GPT-4v and GPT-4o to create extended responses and more creative data.
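As a toy illustration of the problem (not the paper's actual pipeline), one could flag VQA records whose answers are short enough to contribute to the length-distribution shift, so they can be routed to a rewriting model. The record fields and the word-count threshold here are assumptions for demonstration only:

```python
# Toy sketch: flag answer-only VQA records with very short responses
# that would be candidates for GPT-assisted rewriting.
def needs_rewrite(record, min_words=5):
    """True if the answer is shorter than `min_words` words."""
    return len(record["answer"].split()) < min_words

samples = [
    {"question": "What color is the car?", "answer": "red"},
    {"question": "Describe the scene.",
     "answer": "A busy street with cars and people walking."},
]
to_rewrite = [s for s in samples if needs_rewrite(s)]
print(len(to_rewrite))  # 1
```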

Data Engine for Knowledge Data

To address the shortage of science-related data, we designed an Internet Data Engine to collect reliable science-related VQA data. The engine can be applied to collect data on any topic. Using it, we collected an additional 161k science-related visual instruction tuning data points, increasing the total data in this domain by 400%! If you want to use this portion of the data, please use this jsonl.
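The 400% figure implies the size of the pre-existing science pool; a quick arithmetic check:

```python
# If 161k new samples represent a 400% increase over the existing
# science-related pool, the original pool was 161k / 4 ≈ 40k samples.
new_samples = 161_000
increase_pct = 400
original = new_samples / (increase_pct / 100)
print(f"Implied original pool: ~{original:,.0f} samples")
```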

GPT-4v Distilled Visual Instruction Tuning Data

We used GPT-4v to create an additional 77k data points. This data either uses GPT-4v to rewrite original answer-only VQA responses into longer, more detailed answers, or generates new visual instruction tuning data from the given image. If you want to use this portion of the data, please use this jsonl.
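The released splits are distributed as jsonl files, which can be read line by line. A minimal sketch follows; the filename and field names are illustrative, not the dataset's guaranteed schema:

```python
import io
import json

# Stand-in for open("some_split.jsonl") — the fields below are
# illustrative, not the dataset's actual schema.
raw = io.StringIO(
    '{"id": "0", "image": "a.jpg", "conversations": []}\n'
    '{"id": "1", "image": "b.jpg", "conversations": []}\n'
)
records = [json.loads(line) for line in raw if line.strip()]
print(len(records))  # 2
```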

GPT-4o Distilled Creative Chat Data

We used GPT-4o to create an additional 60k creative data points. This data encourages the model to generate very long responses and often contains highly creative questions, such as writing a poem, composing a song, and more. If you want to use this portion of the data, please use this [jsonl](https://huggingface.co/datasets

Related Skills

View on GitHub
GitHub Stars: 2.0k
Category: Customer
Updated: 6d ago
Forks: 137

Languages

Python

Security Score

100/100

Audited on Mar 27, 2026

No findings