302 skills found
deepset-ai / Haystack - Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
deepseek-ai / Janus - Janus-Series: Unified Multimodal Understanding and Generation Models
open-mmlab / Mmagic - OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative AI (AIGC), easy-to-use APIs, an awesome model zoo, and diffusion models for text-to-image generation, image/video restoration and enhancement, etc.
VectorSpaceLab / OmniGen2 - OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
QwenLM / Qwen2.5 Omni - Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Tencent-Hunyuan / HunyuanImage 3.0 - HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
JIA-Lab-research / DreamOmni2 - This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'.
showlab / Show O - [ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
unum-cloud / UForm - Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Tencent-Hunyuan / HunyuanCustom - HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
yukkcat / Gemini Business2api - OpenAI-compatible API for Gemini Business with multi-account load balancing and multimodal capabilities (image/video generation, file parsing)
Tencent-Hunyuan / HunyuanVideo Foley - HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
JackAILab / ConsistentID - [TPAMI 2026] ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
TencentARC / SEED Story - SEED-Story: Multimodal Long Story Generation with Large Language Model
eric-ai-lab / MiniGPT 5 - Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
GAIR-NLP / Anole - Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
mlfoundations / Datacomp - DataComp: In search of the next generation of multimodal datasets
1038lab / ComfyUI QwenVL - ComfyUI-QwenVL custom node: Integrates the Qwen-VL series, including Qwen2.5-VL and the latest Qwen3-VL, with GGUF support for advanced multimodal AI in text generation, image understanding, and video analysis.
stepfun-ai / NextStep 1 - [🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun's Multimodal Intelligence team.
CY-CHENYUE / ComfyUI Janus Pro - ComfyUI nodes for Janus-Pro, a unified multimodal understanding and generation framework.