Results for "document-vector"

296 skills found · Page 7 of 10

rigvedrs / RAGIndex

LlamaIndex Powered RAG for PDF, TXT and DOCX files with Tesseract OCR support, Semantic chunking, Document citations with direct page display, Advanced Caching and Duplicate Detection with Redis Vector DB

universal

chatbot · langchain · llama-index · +7

Updated 1mo ago

Bhanuprakashrathood03 / Chat With Pdfs Groq Chatbot

A high-performance Q&A chatbot that uses the Groq API and Llama 3 for real-time, context-aware answers from multiple PDF documents. This project leverages LangChain for orchestration, FAISS for local vector search, and Streamlit for an interactive user interface. Ideal for efficient and private document analysis.

universal

Updated 1mo ago

SLEEPYBQ / Survey RAG

Survey-RAG is a tool for processing academic survey PDF documents and extracting information using large language models. This tool utilizes vector databases and Retrieval-Augmented Generation (RAG) to efficiently extract structured information from multiple PDF files.

universal

Updated 8mo ago

Jai-Agarwal-04 / Sentiment Analysis With Insights

Sentiment Analysis with Insights using NLP and Dash

This project shows sentiment analysis of text data using NLP and Dash. I used the Amazon reviews dataset to train the model and then scraped reviews from Etsy.com to test it.

Prerequisites: Python3, Amazon Dataset (3.6GB), Anaconda

How was this project made?

This project was built using Python3 to predict sentiment with machine learning, with an interactive dashboard for testing reviews. To start, I downloaded the dataset and extracted the JSON file. Next, I took out a portion of 792,000 reviews, equally distributed into chunks of 24,000 reviews, using pandas. The chunks were then combined into a single CSV file called balanced_reviews.csv. This balanced_reviews.csv served as the base for training my model and was filtered on the basis of review ratings greater than 3 and less than 3. The filtered data was then vectorized using the TF-IDF vectorizer. After training the model to 90% accuracy, reviews were scraped from Etsy.com to test the model. Finally, I built a dashboard in which we can check the sentiment of text entered by the user or of the reviews scraped from the website.

What is CountVectorizer?

CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector based on the frequency (count) of each word that occurs in the text. This is helpful when we have multiple texts and wish to convert each of them into a vector for further text analysis. CountVectorizer creates a matrix in which each unique word is represented by a column and each text sample is a row. The value of each cell is the count of that word in that particular text sample.

What is TF-IDF Vectorizer?

TF-IDF stands for Term Frequency - Inverse Document Frequency. It is a statistic that aims to capture how important a word is for a document while also taking into account its relation to the other documents in the same corpus. It does this by looking at how many times a word appears in a document while also paying attention to how many times the same word appears in other documents in the corpus. The rationale is the following. A word that appears frequently in a document is more relevant to that document, meaning there is a higher probability that the document is about that word. A word that appears frequently in many documents may prevent us from finding the right document in a collection: the word is relevant either for all documents or for none, so either way it will not help us filter out a single document or a small subset of documents from the whole set. TF-IDF is therefore a score applied to every word in every document in our dataset. For every word, the TF-IDF value increases with every appearance of the word in a document, but gradually decreases with every appearance in other documents.

What is Plotly Dash?

Dash is a productive Python framework for building web analytic applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It's particularly suited for anyone who works with data in Python. Dash apps are rendered in the web browser. You can deploy your apps to servers and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready. Dash is an open source library, released under the permissive MIT license. Plotly develops Dash and offers a platform for managing Dash apps in an enterprise environment.

What is Web Scraping?

Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web.

Running the project

Step 1: Download the dataset and extract the JSON data into your project folder. Make a folder named filtered_chunks and run the data_extraction.py file. This will extract data from the JSON file into equal-sized chunks and then combine them into a single CSV file called balanced_reviews.csv.

Step 2: Run the data_cleaning_preprocessing_and_vectorizing.py file. This will clean and filter the data. The filtered data will then be fed to the TF-IDF Vectorizer, the model will be pickled in a trained_model.pkl file, and the vocabulary of the trained model will be stored as vocab.pkl. Keep these two files in a folder named model_files.

Step 3: Run the etsy_review_scrapper.py file. Adjust the range of pages and products to be scraped, as a large range can take a very long time to process. A small amount of data is sufficient to check the accuracy of the model. The scraped data will be stored in a CSV file as well as a db file.

Step 4: Finally, run the app.py file, which starts the Dash server. We can then check how the model works either by typing a review or by selecting one of the preloaded scraped reviews.
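
The count-matrix and TF-IDF descriptions in this entry can be illustrated with a small, standard-library-only sketch. It is not the project's code and not scikit-learn's implementation: the function names `count_matrix` and `tf_idf_matrix`, the whitespace tokenizer, and the sample reviews are all assumptions made for illustration, and the weighting used is the plain textbook form, count × log(N / document frequency). scikit-learn's TfidfVectorizer applies a smoothed IDF and normalization by default, so its numbers will differ.

```python
import math
from collections import Counter


def count_matrix(texts):
    """Return (vocabulary, rows): one row of word counts per text.

    Hypothetical helper: lowercases and splits on whitespace only.
    """
    tokenized = [text.lower().split() for text in texts]
    # One column per unique word, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


def tf_idf_matrix(rows):
    """Weight each count by log(N / number of texts containing the word).

    Assumes every column is non-zero in at least one row, which holds
    for matrices produced by count_matrix.
    """
    n_docs = len(rows)
    n_terms = len(rows[0]) if rows else 0
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    return [
        [row[j] * math.log(n_docs / doc_freq[j]) for j in range(n_terms)]
        for row in rows
    ]


# Made-up sample reviews, for illustration only.
reviews = ["great product great price", "bad product", "great service"]
vocabulary, counts = count_matrix(reviews)
weights = tf_idf_matrix(counts)
```

In this sketch, "price" and "product" each occur once in the first review, but "price" receives the larger weight because it appears in fewer of the other reviews. A word that occurs in every text gets a weight of zero.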

Seeed-Projects / RAG Based On Jetson

This project implements RAG on Jetson and supports TXT and PDF document formats. It uses MLC for 4-bit quantization of the Llama2-7b model, ChromaDB as the vector database, and Llama_Index to connect these components. I hope you like this project.

universal

chromadb · jetson · llama-index · +2

Updated 1mo ago

wasp-lang / Ask The Documents

[MOVED] Ask The Documents (Embeddings / RAG / ChatGPT) with Wasp & PG Vector

universal

Updated 2mo ago

pprados / Langchain Rag

Manage multiple vectors for the same document.

universal

Updated 29d ago

Azure-Samples / App Service Rag Openai AI Search Dotnet

A Blazor Server app demonstrating Retrieval Augmented Generation (RAG) with Azure OpenAI and AI Search. Chat with your documents using hybrid search (vector + keyword + semantic ranking). Features managed identity security and one-command deployment via Azure Developer CLI.

universal

Updated 1mo ago

liquidcarbon / Affinity

Typed, annotated vectors for well-documented datasets

universal

Updated 2mo ago

Abdulraqib20 / Agentic RAG With Gemini 2.0 Flash

An intelligent RAG system powered by Google's Gemini 2.0 Flash Thinking, Qdrant vector storage, and Agno agent orchestration. Upload documents, process web pages, and get AI-assisted answers with advanced query rewriting and web search capabilities.

gemini cli

Updated 9mo ago

LOH-puzik / LegalEase AI

LegalEaseAI simplifies legal topics with a document analyzer, legal counsel chatbot, and lawyer fee estimator. Powered by large language models, a vectorized database, and Flask.

zed

Updated 1mo ago

eswar-7116 / Vector Db Demo

A minimal Node.js demo showcasing how to use ChromaDB as a local vector database. It stores a set of sample documents and performs a semantic search query using natural language. Perfect for understanding the basics of vector search and how embeddings can be used to find meaning-based matches in text.

universal

chromadb · demo · vector-database

Updated 2mo ago

lailikanabila / ANALISIS SENTIMEN PADA PERPINDAHAN IBUKOTA INDONESIA DENGAN ALGORITMA SUPPORT VECTOR MACHINE

Analysis of opinions expressed on Twitter regarding the relocation of Indonesia's capital city, using a combination of a Support Vector Machine (SVM) classifier, Term Frequency-Inverse Document Frequency (TF-IDF) feature selection, and Bag of Words, along with a lexicon-based approach for labeling data as positive or negative sentiment.

universal

Updated 2mo ago

juyeonnn / HEAVEN

Official Repository of "Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy"

universal

Updated 5d ago

manwar / SVG

Perl extension for generating Scalable Vector Graphics documents.

universal

Updated 1mo ago

nishiba / Scdv

SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

universal

Updated 2y ago

manan-paneri-99 / Vector Space Based Document Retrieval System

Retrieves the top 10 documents from the Wikipedia corpus for a free-text query entered by the user.

universal

document-retrieval · information-retrieval · vector-space-model

Updated 1y ago

Azure-Samples / App Service Rag Openai AI Search Python

A Python app demonstrating Retrieval Augmented Generation (RAG) with App Service, Azure OpenAI, and AI Search. Chat with your documents using hybrid search (vector + keyword + semantic ranking).

universal

Updated 1mo ago

GlobeletJS / Tile Painter

Canvas 2D rendering for vector maps, guided by a Mapbox style document

universal

Updated 10mo ago

trrt-good / 3d Math Java

A well-documented 3D math package in Java with vector, matrix, and quaternion methods

universal

3d · 3d-math-library · 3d-mathematical-functions · +9

Updated 1y ago