156 skills found · Page 1 of 6
deepset-ai / Haystack: Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
lancedb / Lancedb: Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.
activeloopai / Deeplake: Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.
Alibaba-NLP / VRAG: Multimodal Retrieval-augmented Generation Framework Built by Tongyi Lab, Alibaba Group.
llm-lab-org / Multimodal RAG Survey: A Survey on Multimodal Retrieval-Augmented Generation
adithya-s-k / VARAG: Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine
henrydaum / Second Brain: Second Brain is a desktop application that acts as a personal knowledge base, using retrieval-augmented generation (RAG), multimodal AI models, and a hybrid lexical/semantic search algorithm to interact with local text files and images.
Alibaba-NLP / OmniSearch: Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
xid32 / NAACL 2025 TWM: We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With our TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.
jolibrain / Colette: Multimodal RAG to search and interact locally with technical documents of any kind
VectorSpaceLab / MegaPairs: [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval
Code-kunkun / LamRA: [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
Azure / Gpt Rag Ingestion: The GPT-RAG Data Ingestion service automates processing of diverse documents (PDFs, images, spreadsheets, transcripts, and SharePoint), readying them for Azure AI Search. It applies smart chunking, generates text and image embeddings, and enables rich, multimodal retrieval.
cap-ntu / Video To Retail Platform: An intelligent system based on multimodal learning for video, product, and ad analysis. It serves as a base for downstream applications such as product recommendation and video retrieval.
ocean-luna / HMRAG: [ACM MM2025] Official code of "HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation"
DataArcTech / RagVL: Official PyTorch Implementation of "MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training".
parsee-ai / Parsee Core: Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized in PDFs and HTML files. Extensive support for tabular data extraction and multimodal queries.
li-xiu-qi / Smartlmager: An image search engine and management system built on multimodal vector models and visual multimodal models, supporting several intelligent search methods, including precise text-to-text, text-to-image, and image-to-image retrieval.
Azure-Samples / Multimodal Rag Code Execution: Multimodal retrieval-augmented generation with code execution capabilities. Processes multiple complex documents containing images, tables, and charts to distill insights or generate new documents.
niluthpol / Multimodal Vtt: Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval