InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o's performance.
InternVL Family: Closing the Gap to Commercial Multimodal Models with Open-Source Suites —— A Pioneering Open-Source Alternative to GPT-5
<div align="center"> <img width="500" alt="image" src="https://github.com/user-attachments/assets/930e6814-8a9f-43e1-a284-118a5732daa4"> <br> </div>

[🆕 Blog] [🤔 FAQs] [🗨️ Chat Demo] [📖 Document] [🌐 API] [🚀 Quick Start]
[🔥 InternVL3.5 Report] [📜 InternVL3.0 Report] [📜 InternVL2.5 MPO] [📜 InternVL2.5 Report]
[📜 Mini-InternVL Paper] [📜 InternVL2 Blog] [📜 InternVL 1.5 Paper] [📜 InternVL 1.0 Paper]
[📖 2.0 中文解读] [📖 1.5 中文解读] [📖 1.0 中文解读]
Switch to the Chinese version (切换至中文版)
<a href="https://trendshift.io/repositories/9803" target="_blank"><img src="https://trendshift.io/api/badge/repositories/9803" alt="OpenGVLab%2FInternVL | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a> <img height="55" alt="image" src="https://github.com/user-attachments/assets/bd62ab46-f0ea-40c6-ab10-7fde671716cc">

News 🚀🚀🚀
- 2025/08/30: 🔥 We open-source the training code of InternVL3_5-GPT-OSS-20B-A4B and CascadeRL, which consists of an offline RL stage and an online RL stage. The training data for these two stages (MMPR-v1.2 and MMPR-Tiny) are also open-sourced.
- 2025/08/26: 🚀 We introduce InternVL3.5, a new family of open-source multimodal models that significantly advances versatility, reasoning capability, and inference efficiency across the InternVL series. Our largest model, InternVL3.5-241B-A28B, attains state-of-the-art results among open-source MLLMs across general multimodal, reasoning, text, and agentic tasks. We also provide a 20B-A4B version (InternVL3_5-GPT-OSS-20B-A4B), built upon GPT-OSS-20B-A4B. Notably, we provide two model formats: the GitHub format, consistent with prior releases, and the HF format, aligned with the official `transformers` standard.
- 2025/04/17: We open-source the data construction pipelines and training scripts of MPO and VisualPRM for reference.
- 2025/04/11: We introduce InternVL3, an advanced multimodal large language model (MLLM) series with superior overall performance. InternVL3-78B achieves SoTA results in both perception and reasoning among open-source MLLMs. Its key designs include Variable Visual Position Encoding, Native Multimodal Pre-Training, Mixed Preference Optimization, and Multimodal Test-Time Scaling.
- 2025/03/13: We introduce VisualPRM, an advanced multimodal Process Reward Model (PRM) with 8B parameters, which improves the overall reasoning performance of InternVL2.5-8B and InternVL2.5-78B by 8.4 and 5.9 points, respectively. The training data for this model, VisualPRM400K, is also open-sourced. Please refer to our paper and project page for more details.
- 2024/12/20: We release InternVL2.5-MPO, finetuned with Mixed Preference Optimization on MMPR-v1.1. The resulting models outperform their counterparts without MPO by an average of 2 points across all model scales on the OpenCompass leaderboard. These models are available at HF link.
- 2024/12/17: InternVL2/2.5 is supported in PaddleMIX by the Paddle team.
- 2024/12/05: We release InternVL2.5, an advanced MLLM series with parameter coverage ranging from 1B to 78B. InternVL2_5-78B is the first open-source MLLM to achieve over 70% on the MMMU benchmark, matching the performance of leading closed-source commercial models like GPT-4o. These models are available at HF link.
- 2024/11/14: We introduce MMPR, a high-quality, large-scale multimodal reasoning preference dataset, and MPO, an effective preference optimization algorithm. The resulting model, InternVL2-8B-MPO, achieves an accuracy of 67.0 on MathVista. Please refer to our paper, project page, and document for more details.
- 2024/10/21: We release the Mini-InternVL series. These models achieve impressive performance with minimal size: the 4B model reaches 90% of the performance with just 5% of the model size. For more details, please check our project page and document.
- 2024/08/01: The ChartMimic team evaluated the InternVL2 series on their benchmark. InternVL2-26B and 76B achieved the top two results among open-source models, with InternVL2 76B surpassing GeminiProVision and showing results comparable to Claude-3-Opus.
- 2024/08/01: InternVL2-Pro achieved SOTA performance among open-source models on the CharXiv dataset, surpassing many closed-source models such as GPT-4V, Gemini 1.5 Flash, and Claude 3 Sonnet.
- 2024/07/24: The MLVU team evaluated InternVL-1.5 on their benchmark. It scored 50.4% on the multiple-choice task, ranking #1 among all open-source MLLMs, and 4.02 on the generative tasks.
- 2024/07/18: InternVL2-40B achieved SOTA performance among open-source models on the Video-MME dataset, scoring 61.2 with 16 input frames and 64.4 with 32 input frames. It significantly outperforms other open-source models and is the closest open-source model to GPT-4o mini.
- 2024/07/18: InternVL2-Pro achieved SOTA performance on the DocVQA and InfoVQA benchmarks.
- 2024/07/04: We release the InternVL2 series. InternVL2-Pro achieved 62.0% accuracy on the MMMU benchmark, matching the performance of leading closed-source commercial models like GPT-4o.
- 2024/06/19: We propose Needle In A Multimodal Haystack (MM-NIAH), the first benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
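The news above mentions MPO (Mixed Preference Optimization) only at a high level. As a rough illustration of the idea, here is a minimal sketch of a DPO-style pairwise preference term; note that the MPO paper combines a preference loss with additional quality and generation losses, and every function name and the `beta` value below are illustrative assumptions, not this repository's API.

```python
import math

def dpo_preference_loss(policy_chosen_lp: float, policy_rejected_lp: float,
                        ref_chosen_lp: float, ref_rejected_lp: float,
                        beta: float = 0.1) -> float:
    """Sketch of a DPO-style preference term over log-probabilities.

    Arguments are the summed log-probabilities of the chosen/rejected
    responses under the policy and a frozen reference model.
    """
    # Reward margin: how much more the policy prefers the chosen response
    # than the reference does, relative to the rejected response.
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # Negative log-sigmoid of the margin: small when the policy cleanly
    # prefers the chosen response, log(2) when it is indifferent.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Widening the policy's log-probability gap between chosen and rejected responses drives this term down, which is the training signal a preference dataset like MMPR provides.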

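The VisualPRM and Multimodal Test-Time Scaling items above describe scoring candidate reasoning traces with a process reward model. A minimal sketch of Best-of-N selection under that idea, where `prm_score` stands in for a hypothetical per-step PRM scorer and the mean-of-step-scores aggregation is an illustrative choice, not the paper's exact recipe:

```python
def best_of_n(candidates, prm_score):
    """Pick the candidate response whose reasoning steps a PRM rates highest.

    candidates: list of responses, each a list of reasoning-step strings.
    prm_score:  callable mapping a step to a correctness score in [0, 1]
                (a stand-in for a real process reward model).
    """
    def response_score(steps):
        # Aggregate per-step scores; the mean is one simple choice.
        return sum(prm_score(s) for s in steps) / len(steps)
    return max(candidates, key=response_score)
```

With N sampled responses from the base MLLM, the PRM acts as a verifier at inference time, trading extra compute for accuracy without retraining the generator.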