Tikaondotnet
Use the Java Tika text extraction library on the .NET platform
Tika on .NET
This project is a simple wrapper around the robust Tika text extraction Java library. It produces two NuGet packages:
- TikaOnDotNet - A straight IKVM-hosted port of the Java Tika project.
- TikaOnDotNet.TextExtractor - Use Tika to extract text from rich documents.
Getting Started
The best way to get started is to:
- Add a NuGet dependency on TikaOnDotNet.TextExtractor.
- Instantiate a new TextExtractor object and call one of the Extract methods.
Usage
// using TikaOnDotNet.TextExtraction;
var textExtractor = new TextExtractor();
var wordDocContents = textExtractor.Extract(@".\path\to\my favorite word.docx");
var webPageContents = textExtractor.Extract(new Uri("https://google.com"));
Take a look at our tests for more usage examples.
How To Contribute
Have an idea to make this project better? Great! Start out by taking a look at our Contributing Guide.
Having A Problem?
Search the Issues first, as your problem may be a common one. If you don't find your problem there, please create an issue. Contributors here will chime in when they can.


