19 skills found
chrismattmann / Tika Python: Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
apache / Tika Docker: Convenience Docker images for Apache Tika Server
LogicalSpark / Docker Tikaserver: Apache Tika Server as a Docker Image
nasa-jpl-memex / GeoParser: Extract and Visualize location from any file
tspannhw / OpenSourceComputerVision: Open Source Computer Vision with TensorFlow, MiniFi, Apache NiFi, OpenCV, Apache Tika and Python. For processing images from IoT devices like Raspberry Pis, NVidia Jetson TX1, NanoPi Duos and more, which are equipped with attached cameras or external USB webcams, we use Python to interface via OpenCV and PiCamera. From there we run image processing at the edge on these IoT devices using OpenCV and TensorFlow to determine attributes and image analytics. Apache MiniFi coordinates running these Python scripts and decides when and what to send from that analysis and the image to a remote Apache NiFi server for additional processing. The Apache NiFi cluster routes the images to one processing path and the JSON-encoded metadata to another flow. The JSON data (with its schema referenced from a central Schema Registry) is routed using Record Processing and SQL; this data is enriched and augmented before conversion to AVRO to be sent via Apache Kafka to SAM. Streaming Analytics Manager then does deeper processing on this stream and others, including weather and Twitter, to determine what should be done with this data. References: https://community.hortonworks.com/articles/103863/using-an-asus-tinkerboard-with-tensorflow-and-pyth.html https://community.hortonworks.com/articles/118132/minifi-capturing-converting-tensorflow-inception-t.html https://github.com/tspannhw/rpi-noir-screen https://community.hortonworks.com/articles/77988/ingest-remote-camera-images-from-raspberry-pi-via.html https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html https://community.hortonworks.com/articles/58265/analyzing-images-in-hdf-20-using-tensorflow.html
sergio11 / Document Search Engine Architecture: 📄🚀 Unleash a powerful Document Search Engine with Apache NiFi for lightning-fast, comprehensive text indexing and search.
LexPredict / Tika Server: Apache Tika Server with Tesseract 4 Docker Setup
stumpylog / Tika Client: A modern Python REST client for Apache Tika server
gselva / Simple Tika Server: Apache Tika as an HTTP service; PUT files and get the metadata as JSON
rse / Tika Server: Apache Tika Server as a Background Service in Node.js
mattflax / Dropwizard Tika Server: A DropWizard wrapper around Apache Tika.
fraponyo94 / Text Extraction Scanned Pdf: Text extraction from scanned PDF documents in Java
DFKI / Leechcrawler: Incremental crawling capabilities for Apache Tika. Crawl content out of e.g. file systems, http(s) sources (web crawling), imap(s) servers or your own arbitrary data sources. LeechCrawler offers additional Tika parsers providing these crawling capabilities.
niyazed / Wordify: If you are too lazy to read the whole document then generate wordart and keywords.
mkalus / Tika Page Extractor: Tika per-page PDF extractor server returning content as JSON.
opensemanticsearch / Tika Server.deb: Apache Tika Server as Debian GNU/Linux and Ubuntu Linux package
abhilesh / Apache Tika Arm: Docker images to run Apache Tika server on armhf and arm64 systems
maxcom / Tikaserver Ex: JAX-RS Server for Apache Tika
opensemanticsearch / Tesseract Ocr Cache: Tesseract OCR wrapper for Apache Tika and/or Open Semantic ETL that caches the OCR results, so Tika-Server or Open Semantic ETL does not have to reprocess slow and expensive OCR on the same images again