allenai / Mmc4 - MultimodalC4 is a multimodal extension of C4 that interleaves millions of images with text.
GAIR-NLP / Anole - Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
mlfoundations / MINT 1T - 🍃 MINT-1T: A one-trillion-token multimodal interleaved dataset.
OpenGVLab / OmniCorpus - [ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
zjysteven / Lmms Finetune - A minimal codebase for fine-tuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v, etc.
ThinkMorph / ThinkMorph - [ICLR 2026] The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
SII-WenjieLisjtu / CX Mind - CX-Mind: A Pioneering Multimodal Large Language Model for Interleaved Reasoning in Chest X-ray via Curriculum-Guided Reinforcement Learning
chenllliang / DreamEngine - Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think!
eric-ai-lab / DMLR - Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
yisuanwang / Idea23D - [COLING 2025] Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
HKUST-LongGroup / CoMM - [CVPR 2025 Highlight] Official repository for the CoMM dataset
rickyang1114 / Multimodal Deepresearcher - [AAAI 2026] Multimodal Deepresearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
SihengLi99 / TextBind - [2024-ACL] TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation
showlab / MovieSeq - [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
ByteDance-BandAI / LLM I - 🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code execution & editing
Lillianwei-h / MMIE - [ICLR'25 Oral] MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Yuan-ManX / ComfyUI Bagel - ComfyUI-Bagel is now available in ComfyUI. BAGEL is an open-source multimodal foundation model with 7B active parameters (14B total) trained on large-scale interleaved multimodal data.
ZJU4HealthCare / TumorChain - [ICLR 2026] Official repo for the paper "TumorChain: Interleaved Multimodal Chain-of-Thought Reasoning for Traceable Clinical Tumor Analysis"
ant-research / UniAD - [CVPR'25] Official implementation for the paper "Contextual AD Narration with Interleaved Multimodal Sequence"
finyorko / ARMOR - ARMOR: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy