adithya-s-k / Omniparse: Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Micke-K / IntuneManagement: Copy, export, import, delete, document and compare policies and profiles in Intune and Azure with PowerShell script and WPF UI. Import ADMX files and registry settings with ADMX ingestion. View and edit PowerShell script.
opensemanticsearch / Open Semantic Etl: Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database.
Azure / Gpt Rag Ingestion: The GPT-RAG Data Ingestion service automates processing of diverse documents (PDFs, images, spreadsheets, transcripts, and SharePoint), readying them for Azure AI Search. It applies smart chunking, generates text and image embeddings, and enables rich, multimodal retrieval.
newsdev / Stevedore: Search document dumps: ingest and explore in one extensible framework.
lixx21 / Legal Document Assistant: A Retrieval-Augmented Generation (RAG) application for querying legal documents. It uses PostgreSQL, Elasticsearch, and an LLM to provide summaries and suggestions based on user queries. Features data ingestion with Airflow, real-time monitoring with Grafana, and a Streamlit interface.
alephdata / Ingest File: Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
ametnes / Nesis: Your AI-powered enterprise knowledge partner. Designed for use at scale, ingesting large volumes of documents in formats such as PDF, DOCX, XLSX, PNG, JPG, TIFF, MP3, MP4, and JPEG. Integrates with S3, Windows shares, Google Drive, and more.
deepak223098 / Long Term Stock Price Growth Prediction Using NLP On 10 K Financial Reports: A 10-K financial report is a comprehensive report on financial performance that every publicly traded company must file annually with the U.S. Securities and Exchange Commission (SEC). It is even more detailed than a company's annual report. 10-K documents contain information about the business's operations, risk factors, selected financial data, the management's discussion and analysis (MD&A), and financial statements and supplementary data. I was expected to build an NLP pipeline that ingests the 10-K reports of various publicly traded companies, and a machine learning model that uses the Loughran-McDonald Master Dictionary to uncover hidden signals in the 10-K documents and predict a company's long-term stock performance. The dictionary contains words specifically curated for the context of financial reports.
sjafferali / Paperless Titles From AI: A paperless-ngx post-consume script that automatically generates meaningful titles for ingested documents using OpenAI or other LLM providers such as Ollama.
aws-solutions / Enhanced Document Understanding On Aws: Enhanced Document Understanding on AWS delivers an easy-to-use web application that ingests and analyzes documents, extracts content, identifies and redacts sensitive customer information, and creates search indexes from the analyzed data.
prrao87 / Db Hub Fastapi: Async bulk data ingestion and querying in various document, graph and vector databases via their Python clients
feiyu1104 / DUTMed: A multimodal medical QA system powered by a Neo4j knowledge graph and the Tongyi Qianwen LLM, supporting multi-hop reasoning, medical image segmentation and analysis, and automated document ingestion. Web UI and CLI interaction modes included.
Azure / Document Vector Pipeline: Pipeline for ingesting documents (like pdfs and docx) into a searchable Azure Database for vector and hybrid searching.
giusedroid / Serverless Embeddings Lancedb Bedrock: An example of a serverless document ingestion pipeline that automates the calculation of embeddings so that they can be used in a Retrieval-Augmented Generation application. This sample uses Amazon Bedrock to access the Amazon Titan Embedding model, together with LanceDB.
nsoft / Jesterj: Document Ingestion Framework for Search Systems
jamietso / Tabular Review: An AI-powered tabular review tool for legal professionals. Ingest unstructured documents, define dynamic extraction columns, and query your data with an integrated analyst chat.
open-politics / Open Politics Hq: Open source intelligence as infrastructure. Turn domain expertise into structured insights at scale: flexibly ingest content, define analytical frameworks in natural language, and visualize patterns across documents, geography, and time. For researchers, journalists, NGOs, and everyone else who needs a new headquarters for the digital age.
Azure-Samples / Rag As A Service With Vision: This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.
eventuallyconsultant / Codegenr: Fast code generator based on Handlebars templates. It ingests Swagger/OpenAPI and other JSON/YAML documents with $refs, or GraphQL schemas, and outputs whatever you template.