vllm-project / Vllm Omni
A framework for efficient model inference with omni-modality models.

QwenLM / Qwen2.5 Omni
Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

ictnlp / LLaMA Omni
LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model built on Llama-3.1-8B-Instruct, aiming for GPT-4o-level speech capabilities.

VITA-MLLM / VITA
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
A real-time interactive Omni Avatar built on LiveKit, which lets you seamlessly integrate any open-source avatar components (real-time model, vision, voice, memory, search, etc.).
Ola-Omni / Ola
Ola: Pushing the Frontiers of Omni-Modal Language Model

VITA-MLLM / Freeze Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

CASIA-IVA-Lab / VALOR
[TPAMI 2024] Codes and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

CASIA-IVA-Lab / VAST
[NeurIPS 2023] Code and model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

opendilab / LightRFT
LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Framework

yunxiangfu2001 / SegMAN
[CVPR 2025] SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
MooreThreads / MooER
MooER: Moore-threads Open Omni model for speech-to-speech intERaction. MooER-omni includes a series of end-to-end speech interaction models with training and inference code, covering end-to-end speech interaction, end-to-end speech translation, and speech recognition, among other tasks.
SOTAMak1r / VINO Code
A Unified Visual Generator with Interleaved OmniModal Context

sgl-project / Sglang Omni
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

ddlBoJack / Omni Captioner
[ICLR 2026] Data pipeline, models, and benchmark for Omni-Captioner.

THU-BPM / Omni SafetyBench
Code for the paper "Omni-SafetyBench: A Benchmark for Safety Evaluation of Audio-Visual Large Language Models".

bytedance / OmniScient Model
This repo contains the code for the paper "Towards Open-Ended Visual Recognition with Large Language Model".

Zplusdragon / ReID5o ORBench
[NeurIPS 2025] ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model
meituan-longcat / UNO Bench
An omni-model benchmark with high quality and diversity that reveals the Compositional Law. Currently focused on Chinese scenarios, with partners actively sought to co-build English and multilingual versions.
maxencefaldor / Omni Epic
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025)