
CogVLM2

GPT-4V-level open-source multimodal model based on Llama3-8B

Install / Use

/learn @zai-org/CogVLM2
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

CogVLM2 & CogVLM2-Video

Chinese version of this README

<div align="center"> <img src=resources/logo.svg width="40%"/> </div> <p align="center"> 👋 Join our <a href="resources/WECHAT.md" target="_blank">WeChat</a> · 💡 Try CogVLM2 <a href="http://cogvlm2-online.cogviewai.cn:7861/" target="_blank">Online</a> · 💡 Try CogVLM2-Video <a href="http://cogvlm2-online.cogviewai.cn:7868/" target="_blank">Online</a> </p> <p align="center"> 📍 Experience the larger-scale CogVLM model on the <a href="https://open.bigmodel.cn/?utm_campaign=open&_channel_track_key=OWTVNma9">ZhipuAI Open Platform</a>. </p>

Recent updates

  • 🔥 News: 2024/8/30: The CogVLM2 paper has been published on arXiv.
  • 🔥 News: 2024/7/12: We released the CogVLM2-Video online web demo; welcome to try it out.
  • 🔥 News: 2024/7/8: We released the video-understanding version of CogVLM2, the CogVLM2-Video model. By extracting keyframes, it can interpret a sequence of continuous images and supports videos of up to 1 minute. See our blog for details.
  • 🔥 News: 2024/6/8: We released the CogVLM2 TGI weights, a version of the model that can be served with Text Generation Inference (TGI). See the inference code here.
  • 🔥 News: 2024/6/5: We released GLM-4V-9B, which uses the same data and training recipe as CogVLM2 but with GLM-4-9B as the language backbone. We removed the visual experts to reduce the model size to 13B. More details in the GLM-4 repo.
  • 🔥 News: 2024/5/24: We released the Int4 version of the model, which requires only 16GB of video memory for inference. You can also quantize on the fly by passing --quant 4.
  • 🔥 News: 2024/5/20: We released the next-generation model CogVLM2, which is based on Llama3-8B and is on par with (or better than) GPT-4V in most cases. Welcome to download!
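The CogVLM2-Video release above works by extracting keyframes from a clip of up to one minute. A minimal sketch of uniform keyframe index selection (the function name and the 24-frame budget are illustrative assumptions, not the repo's actual implementation):

```python
def sample_keyframe_indices(total_frames: int, num_keyframes: int = 24) -> list[int]:
    """Pick evenly spaced frame indices from a video clip.

    If the clip has fewer frames than the budget, every frame is used.
    """
    if total_frames <= num_keyframes:
        return list(range(total_frames))
    # Evenly spaced positions across the clip, sampled at bin centers.
    step = total_frames / num_keyframes
    return [int(step * i + step / 2) for i in range(num_keyframes)]

# A 60-second clip at 30 fps has 1800 frames; sample 24 of them.
indices = sample_keyframe_indices(1800, 24)
```

Each selected frame would then be encoded by the vision tower like a still image, which is how a single-image model can be extended to short videos.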

Model introduction

We are launching the new-generation CogVLM2 series and open-sourcing two models based on Meta-Llama-3-8B-Instruct. Compared with the previous generation of open-source CogVLM models, the CogVLM2 series brings the following improvements:

  1. Significant improvements on many benchmarks, such as TextVQA and DocVQA.
  2. Support for 8K content length.
  3. Support for image resolutions up to 1344 × 1344.
  4. An open-source version that supports both Chinese and English.
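Point 3 caps the input resolution at 1344 × 1344. A sketch of how an oversized image could be scaled down to fit that cap while preserving aspect ratio (the helper name and rounding behavior are assumptions, not the repo's actual preprocessing):

```python
MAX_SIDE = 1344  # maximum supported resolution per the list above

def fit_within_cap(width: int, height: int, cap: int = MAX_SIDE) -> tuple[int, int]:
    """Scale (width, height) down so neither side exceeds `cap`.

    Images already within the cap are returned unchanged.
    """
    longest = max(width, height)
    if longest <= cap:
        return width, height
    scale = cap / longest
    return round(width * scale), round(height * scale)

print(fit_within_cap(4032, 3024))  # a 12MP phone photo → (1344, 1008)
```

Scaling down before upload also keeps the number of vision tokens, and thus inference cost, bounded.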

You can see the details of the CogVLM2 family of open source models in the table below:

| Model Name | cogvlm2-llama3-chat-19B | cogvlm2-llama3-chinese-chat-19B | cogvlm2-video-llama3-chat | cogvlm2-video-llama3-base |
|---|---|---|---|---|
| Base Model | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct | Meta-Llama-3-8B-Instruct |
| Language | English | Chinese, English | English | English |
| Task | Image Understanding, Multi-turn Dialogue Model | Image Understanding, Multi-turn Dialogue Model | Video Understanding, Single-turn Dialogue Model | Video Understanding, Base Model, No Dialogue |
| Model Link | 🤗 Huggingface 🤖 ModelScope 💫 Wise Model | 🤗 Huggingface 🤖 ModelScope 💫 Wise Model | 🤗 Huggingface 🤖 ModelScope | 🤗 Huggingface 🤖 ModelScope |
| Experience Link | 📙 Official Page | 📙 Official Page 🤖 ModelScope | 📙 Official Page 🤖 ModelScope | / |
| Int4 Model | 🤗 Huggingface 🤖 ModelScope 💫 Wise Model | 🤗 Huggingface 🤖 ModelScope 💫 Wise Model | / | / |
| Text Length | 8K | 8K | | |
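The Int4 models in the table trade precision for memory. A back-of-the-envelope check (the parameter count is rounded from the "19B" model names; activation and KV-cache overhead are not modeled) shows why 4-bit weights fit comfortably in the 16GB of video memory quoted in the updates above, while FP16 weights would not:

```python
params = 19e9  # ~19B parameters, per the model names in the table

bytes_fp16 = params * 2    # 2 bytes per weight at 16-bit precision
bytes_int4 = params * 0.5  # 0.5 bytes per weight at 4-bit precision

gib = 1024 ** 3
print(f"FP16 weights: {bytes_fp16 / gib:.1f} GiB")  # ≈ 35.4 GiB
print(f"Int4 weights: {bytes_int4 / gib:.1f} GiB")  # ≈ 8.8 GiB
```

At 4 bits the weights alone take roughly 9 GiB, leaving room for activations and the KV cache on a 16GB card.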

View on GitHub
GitHub Stars: 2.4k
Category: Development
Updated: 8h ago
Forks: 163
Languages

Python

Security Score

100/100

Audited on Mar 30, 2026

No findings