rigvedrs / RAGIndex: LlamaIndex Powered RAG for PDF, TXT and DOCX files with Tesseract OCR support, Semantic chunking, Document citations with direct page display, Advanced Caching and Duplicate Detection with Redis Vector DB
Bhanuprakashrathood03 / Chat With Pdfs Groq Chatbot: A high-performance Q&A chatbot that uses the Groq API and Llama 3 for real-time, context-aware answers from multiple PDF documents. This project leverages LangChain for orchestration, FAISS for local vector search, and Streamlit for an interactive user interface. Ideal for efficient and private document analysis.
SLEEPYBQ / Survey RAG: Survey-RAG is a tool for processing academic survey PDF documents and extracting information using large language models. This tool utilizes vector databases and Retrieval-Augmented Generation (RAG) to efficiently extract structured information from multiple PDF files.
Jai-Agarwal-04 / Sentiment Analysis With Insights: Sentiment Analysis with Insights using NLP and Dash

This project shows sentiment analysis of text data using NLP and Dash. I used the Amazon reviews dataset to train the model and then scraped reviews from Etsy.com to test it.

Prerequisites: Python3, Amazon Dataset (3.6GB), Anaconda

How was this project made?
This project was built using Python3 to predict sentiment with the help of Machine Learning, with an interactive dashboard for testing reviews. To start, I downloaded the dataset and extracted the JSON file. Next, I took out a portion of 792,000 reviews, equally distributed into chunks of 24,000 reviews, using pandas. The chunks were then combined into a single CSV file called balanced_reviews.csv. This balanced_reviews.csv served as the base for training my model and was filtered to keep only reviews rated greater than 3 or less than 3. This filtered data was then vectorized using the TF-IDF vectorizer. After training the model to 90% accuracy, reviews were scraped from Etsy.com to test the model. Finally, I built a dashboard in which we can check the sentiment of input given by the user or of the reviews scraped from the website.

What is CountVectorizer?
CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector based on the frequency (count) of each word that occurs in the text. This is helpful when we have multiple such texts and wish to convert each of them into a vector for further text analysis. CountVectorizer creates a matrix in which each unique word is represented by a column and each text sample is a row. The value of each cell is the count of that word in that particular text sample.

What is TF-IDF Vectorizer?
TF-IDF stands for Term Frequency - Inverse Document Frequency. It is a statistic that aims to better define how important a word is for a document, while also taking into account its relation to other documents from the same corpus. This is done by looking at how many times a word appears in a document while also paying attention to how many times the same word appears in other documents in the corpus. The rationale is twofold. First, a word that appears frequently in a document is more relevant to that document, meaning there is a higher probability that the document is about or related to that word. Second, a word that appears frequently in many documents may prevent us from finding the right document in a collection: the word is relevant either for all documents or for none, so it will not help us filter out a single document or a small subset of documents from the whole set. TF-IDF is therefore a score applied to every word in every document in our dataset. For every word, the TF-IDF value increases with every appearance of the word in a document, but decreases as the word appears in more of the other documents.

What is Plotly Dash?
Dash is a productive Python framework for building web analytic applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It's particularly suited for anyone who works with data in Python. Dash apps are rendered in the web browser. You can deploy your apps to servers and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready. Dash is an open source library, released under the permissive MIT license. Plotly develops Dash and offers a platform for managing Dash apps in an enterprise environment.

What is Web Scraping?
Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web.

Running the project
Step 1: Download the dataset and extract the JSON data in your project folder. Make a folder filtered_chunks and run the data_extraction.py file. This will extract data from the JSON file into equal-sized chunks and then combine them into a single CSV file called balanced_reviews.csv.
Step 2: Run the data_cleaning_preprocessing_and_vectorizing.py file. This will clean and filter the data. The filtered data will then be fed to the TF-IDF Vectorizer, the trained model will be pickled in a trained_model.pkl file, and the vocabulary of the trained model will be stored as vocab.pkl. Keep these two files in a folder named model_files.
Step 3: Run the etsy_review_scrapper.py file. Adjust the range of pages and products to be scraped, as scraping can take a long time. A small amount of data is sufficient to check the accuracy of the model. The scraped data will be stored in a CSV file as well as a db file.
Step 4: Finally, run the app.py file, which starts the Dash server. We can then check how the model works either by typing a review or by selecting one of the preloaded scraped reviews.
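The difference between the count matrix and the TF-IDF weighting described in this entry can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the textbook weighting tf × log(N / df), not scikit-learn's exact formula (its TfidfVectorizer applies smoothing and normalization by default), and the function names `count_matrix` and `tfidf_matrix` as well as the sample reviews are hypothetical, not taken from the project.

```python
import math
from collections import Counter


def count_matrix(texts):
    """Return (vocabulary, rows); rows[i][j] is the count of vocabulary[j] in texts[i]."""
    tokenized = [text.lower().split() for text in texts]
    # One column per unique word, in sorted order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


def tfidf_matrix(texts):
    """Weight each raw count by log(N / df), the textbook inverse document frequency."""
    vocabulary, rows = count_matrix(texts)
    n_docs = len(texts)
    # df[j]: number of texts that contain vocabulary[j] at least once.
    doc_freq = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocabulary))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = [[count * idf[j] for j, count in enumerate(row)] for row in rows]
    return vocabulary, weighted


# Hypothetical sample reviews, for illustration only.
reviews = ["great product great price", "bad product", "great service"]
vocab, counts = count_matrix(reviews)
_, weights = tfidf_matrix(reviews)

for word, count, weight in zip(vocab, counts[0], weights[0]):
    print(f"{word:8s} count={count} tfidf={weight:.3f}")
```

In the first sample review, "great" occurs twice but also appears in another review, while "price" occurs once and nowhere else, so "price" ends up with the higher TF-IDF weight even though its raw count is lower.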
Seeed-Projects / RAG Based On Jetson: This project implements RAG on Jetson and supports TXT and PDF document formats. It uses MLC for 4-bit quantization of the Llama2-7b model, ChromaDB as the vector database, and Llama_Index to connect these components. I hope you like this project.
wasp-lang / Ask The Documents: [MOVED] Ask The Documents (Embeddings / RAG / ChatGPT) with Wasp & PG Vector
pprados / Langchain Rag: Manage multiple vectors for the same document.
Azure-Samples / App Service Rag Openai AI Search Dotnet: A Blazor Server app demonstrating Retrieval Augmented Generation (RAG) with Azure OpenAI and AI Search. Chat with your documents using hybrid search (vector + keyword + semantic ranking). Features managed identity security and one-command deployment via Azure Developer CLI.
liquidcarbon / Affinity: Typed, annotated vectors for well-documented datasets
Abdulraqib20 / Agentic RAG With Gemini 2.0 Flash: An intelligent RAG system powered by Google's Gemini 2.0 Flash Thinking, Qdrant vector storage, and Agno agent orchestration. Upload documents, process web pages, and get AI-assisted answers with advanced query rewriting and web search capabilities.
LOH-puzik / LegalEase AI: LegalEaseAI simplifies legal topics with a document analyzer, legal counsel chatbot, and lawyer fee estimator. Powered by large language models, a vectorized database, and Flask.
eswar-7116 / Vector Db Demo: A minimal Node.js demo showcasing how to use ChromaDB as a local vector database. It stores a set of sample documents and performs a semantic search query using natural language. Perfect for understanding the basics of vector search and how embeddings can be used to find meaning-based matches in text.
lailikanabila / ANALISIS SENTIMEN PADA PERPINDAHAN IBUKOTA INDONESIA DENGAN ALGORITMA SUPPORT VECTOR MACHINE: Analysis of opinions expressed on Twitter regarding the relocation of Indonesia's capital city, using a combination of a Support Vector Machine (SVM) classifier, Term Frequency - Inverse Document Frequency (TF-IDF) feature selection, and Bag of Words, together with a lexicon-based approach for labeling data as positive or negative sentiment
juyeonnn / HEAVEN: Official Repository of "Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy"
manwar / SVG: Perl extension for generating Scalable Vector Graphics documents.
nishiba / Scdv: SCDV (Sparse Composite Document Vectors) using soft clustering over distributional representations
manan-paneri-99 / Vector Space Based Document Retrieval System: Retrieves the top 10 documents from the Wikipedia corpus for a user-entered free-text query
Azure-Samples / App Service Rag Openai AI Search Python: A Python app demonstrating Retrieval Augmented Generation (RAG) with App Service, Azure OpenAI, and AI Search. Chat with your documents using hybrid search (vector + keyword + semantic ranking).
GlobeletJS / Tile Painter: Canvas 2D rendering for vector maps, guided by a Mapbox style document
trrt-good / 3d Math Java: A well-documented 3D math package in Java with vector, matrix, and quaternion methods