393 skills found · Page 1 of 14
bytedance / UI TARS Desktop: The Open-Source Multimodal AI Agent Stack, connecting cutting-edge AI models and agent infrastructure
deepset-ai / Haystack: Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
jina-ai / Serve: ☁️ Build multimodal AI applications with a cloud-native stack
NVIDIA-NeMo / NeMo: A scalable generative AI framework built for researchers and developers working on large language models, multimodal AI, and speech AI (automatic speech recognition and text-to-speech)
duixcom / Duix Avatar: 🚀 Truly open-source AI avatar (digital human) toolkit for offline video generation and digital-human cloning
pipecat-ai / Pipecat: Open-source framework for voice and multimodal conversational AI
lancedb / Lancedb: Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
gorse-io / Gorse: AI-powered open-source recommender system engine supporting classical and LLM rankers, plus multimodal content via embeddings
activeloopai / Deeplake: Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.
open-mmlab / Mmagic: OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation toolbox. Unlock the magic 🪄: generative AI (AIGC), easy-to-use APIs, an awesome model zoo, and diffusion models for text-to-image generation, image/video restoration and enhancement, etc.
lance-format / Lance: Open lakehouse format for multimodal AI. Convert from Parquet in two lines of code for 100x faster random access, vector indexing, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
facebookresearch / Mmf: A modular framework for vision and language multimodal research from Facebook AI Research (FAIR)
Eventual-Inc / Daft: High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale.
NVlabs / VILA: VILA is a family of state-of-the-art vision-language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
OpenGVLab / InternGPT: InternGPT (iGPT) is an open-source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com.
SkyworkAI / Skywork R1V: Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
SamurAIGPT / Generative Media Skills: Multimodal generative media skills for AI agents (Claude Code, Cursor, Gemini CLI). High-quality image, video, and audio generation powered by muapi.ai.
jonyzhang2023 / Awesome Embodied Vla Va Vln: A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
microsoft / Magma: [CVPR 2025] Magma: A Foundation Model for Multimodal AI Agents
qingchencloud / Clawpanel: 🦞 OpenClaw visual management panel with a built-in AI assistant (tool calling, image recognition, multimodal, i18n(11)), one-click install