89 skills found · Page 1 of 3
Unstructured-IO / Unstructured: Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
tjmlabs / ColiVara: ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embeddings. ColiVara has state-of-the-art retrieval performance on both text and visual documents, using vision models instead of chunking and text processing. No OCR, no text extraction, no broken tables, and no missing images.
andrea9293 / MCP Documentation Server: Bridge the AI Knowledge Gap. ✨ Features: Document management • Gemini integration • AI-powered semantic search • File uploads • Smart chunking • Multilingual support • Zero-setup 🎯 Perfect for: New frameworks • API docs • Internal guides
Azure / Gpt Rag Ingestion: The GPT-RAG Data Ingestion service automates processing of diverse documents (PDFs, images, spreadsheets, transcripts, and SharePoint) and readies them for Azure AI Search. It applies smart chunking, generates text and image embeddings, and enables rich, multimodal retrieval.
jparkerweb / Semantic Chunking: 🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
curiousily / Ragbase: Completely local RAG. Chat with your PDF documents (with an open LLM) through a UI that uses LangChain, Streamlit, Ollama (Llama 3.1), Qdrant, and advanced methods like reranking and semantic chunking.
aws-samples / Layout Aware Document Processing And Retrieval Augmented Generation: Advanced, layout-aware document extraction and chunking techniques for retrieval augmented generation. Increases knowledge retrieval accuracy and provides control over how retrieved knowledge context is managed.
messkan / Rag Chunk: A Python CLI to test, benchmark, and find the best RAG chunking strategy for your Markdown documents.
ALucek / Chunking Strategies: An Overview of the Latest Document Chunking Research
drmingler / Smart Llm Loader: smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.
anirudhtopiwala / OpenSource Problems: This repository is a mixture of different problems I have solved and want to document. A major chunk of them are LeetCode solutions.
speedyk-005 / Chunklet Py: One library to split them all: Sentence, Code, Docs. Chunk smarter, not harder. Built for LLMs, RAG pipelines, and beyond.
GiovanniPasq / Chunky: Convert and validate your Markdown, then choose the best chunking strategy for your RAG pipeline.
lesteroliver911 / Contextual Doc Retrieval Opneai Reranker: Contextual Doc Retrieval is a Python-based system leveraging OpenAI GPT-4o and Cohere for re-ranking and query expansion, combined with BM25 for accurate document retrieval. It parses PDFs, chunks content contextually, and enhances search precision with AI-powered contextual understanding and re-ranking.
drittich / SemanticSlicer: 🧠✂️ A smart text chunker for LLM-ready documents.
balajivis / BrahmaSumm Community Edition: BrahmaSumm is an advanced document summarization and visualization tool designed to streamline document management, knowledge base creation, and chatbot enhancement. By leveraging chunking and clustering techniques, BrahmaSumm reduces the number of tokens sent to large language models (LLMs) by up to 99%.
zircote / Rlm Rs: Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code. Process documents 100x larger than context windows through intelligent chunking, SQLite persistence, and recursive sub-LLM orchestration.
TianyiShi2001 / Rmarkdown Vscode: This extension provides a few snippets and key bindings for common tasks in .Rmd documents, such as inserting code chunks and including images using knitr::include_graphics(). Additionally, it aims to provide some helper functions for Bookdown and Blogdown.
Ayyodeji / Langchain LLM PDF QA: This open-source project enables interaction with PDF documents. Powered by Langchain, Chainlit, Chroma, and OpenAI, the application offers natural language processing and retrieval augmented generation (RAG) capabilities. Upload a PDF, and the app decodes it, chunks it, and stores embeddings for QA.
wikit-ai / Chunknorris: ChunkNorris is a black belt in document chunking to feed your LLMs and RAG apps 🥋🔪