google / Langextract: A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
jsvine / Pdfplumber: Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
kreuzberg-dev / Kreuzberg: A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 91+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.
deanmalmgren / Textract: extract text from any document. no muss. no fuss.
webpack-contrib / Extract Text Webpack Plugin: [DEPRECATED] Please use https://github.com/webpack-contrib/mini-css-extract-plugin instead. Extracts text from a bundle into a separate file.
attardi / Wikiextractor: A tool for extracting plain text from Wikipedia dumps.
snipsco / Snips Nlu: Snips Python library to extract meaning from text.
apache / Tika: The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
urchade / GLiNER: Generalist and Lightweight Model for Named Entity Recognition (extract any entity types from texts), NAACL 2024.
NVIDIA / NeMo Retriever: NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts, and images that you can use in downstream generative applications.
Artikash / Textractor: Extracts text from video games and visual novels. Highly extensible.
UglyToad / PdfPig: Read and extract text and other content from PDFs in C# (port of PDFBox).
robertknight / Ocrs: Rust library and CLI tool for OCR (extracting text from images).
microlinkhq / Browserless: The headless Chrome/Chromium driver on top of Puppeteer. Take screenshots, generate PDFs, and extract text and HTML with a production-ready API.
dbashford / Textract: node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
JonathanLink / PDFLayoutTextStripper: Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
scito / Extract Otp Secrets: Extract one time password (OTP) secrets from QR codes exported by two-factor authentication (2FA) apps such as "Google Authenticator". The exported QR codes from authentication apps can be captured by camera, read from images, or read from text files. The secrets can be exported to JSON or CSV, or printed as QR codes to console.
dmmiller612 / Bert Extractive Summarizer: Easy-to-use extractive text summarization with BERT.
SySS-Research / Seth: Perform a MitM attack and extract cleartext credentials from RDP connections.
0x09AL / RdpThief: Extracting cleartext passwords from mstsc.exe using API hooking.