labring / FastGPT: FastGPT is a knowledge-based platform built on LLMs. It offers a comprehensive suite of out-of-the-box capabilities such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you develop and deploy complex question-answering systems without extensive setup or configuration.
tjmlabs / ColiVara: ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embeddings. ColiVara has state-of-the-art retrieval performance on both text and visual documents, using vision models instead of chunking and text processing. No OCR, no text extraction, no broken tables, and no missing images.
YehLi / Xmodaler: X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
dermotte / LIRE: Open source library for content-based image retrieval / visual information retrieval.
naver / Deep Image Retrieval: End-to-end learning of deep visual representations for image retrieval
Alibaba-NLP / ViDoRAG: [EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
khanhnamle1994 / Fashion Recommendation: A clothing retrieval and visual recommendation model for fashion images.
LinWeizheDragon / Retrieval Augmented Visual Question Answering: The official repository for Retrieval Augmented Visual Question Answering
ZhangYuanhan-AI / Visual Prompt Retrieval: [NeurIPS 2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
AnyLoc / Revisit Anything: Code release for Revisit Anything: Visual Place Recognition via Image Segment Retrieval (ECCV 2024)
gustavoeenriquez / MakerAi: The AI Operating System for Delphi. 100% native framework with RAG 2.0 for knowledge retrieval, autonomous agents with semantic memory, visual workflow orchestration, and universal LLM connector. Supports OpenAI, Claude, Gemini, Ollama, and more. Enterprise-grade AI for Delphi 10.3+
yalesong / Pvse: Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
nasa-jpl-memex / Image Space: Interactive image similarity and visual search and retrieval application
athrael-soju / Snappy: 🐊 Snappy's unique approach unifies vision-language late interaction with structured OCR for region-level knowledge retrieval. Like the project? Drop a star! ⭐
li-xiu-qi / Smartlmager: An image search engine and management system built on multimodal vector models and visual multimodal models, supporting several intelligent retrieval modes, including precise text-to-text, text-to-image, and image-to-image search.
danieljf24 / W2vv: Word2VisualVec: Predicting Visual Features from Text for Image and Video Caption Retrieval
caoyue10 / Cvpr17 Dvsq: The implementation of the CVPR 2017 paper "Deep Visual-Semantic Quantization for Efficient Image Retrieval"
Jiaxuan-Li / EVCap: [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
HawkinsT / Pathfinder.nvim: A Neovim plugin that enhances gf/gF/gx with look-ahead and smarter file, line/column number, and link resolution. Also provides visual targets for files/links, new motion commands, and link description retrieval.
gurkandemir / Bag Of Visual Words: Bag of visual words (BOVW) is commonly used in image classification. The concept is adapted from the bag-of-words (BOW) model used in information retrieval and NLP.