Crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers that collect, parse, and manipulate data from the web or a filesystem and store it in data repositories such as search engines.
Norconex Crawlers
Norconex web and filesystem crawlers are full-featured crawlers (or spiders) that can manipulate and store collected data in a repository of your choice (e.g., a search engine). They are flexible, powerful, easy to extend, and portable. They can be run from the command line with file-based configuration on any OS, or embedded into Java applications using well-documented APIs.
Visit the website for binary downloads and documentation: https://opensource.norconex.com/crawlers/
Are you on the right branch?
This branch holds version 4 code, which is still in development.
For the latest stable release of Norconex Web Crawler, use the version 3 branch.
UPCOMING: Crawler V4 Stack
As of Feb 24, 2024, the default main branch holds code for the upcoming version 4 crawler stack. It is now a mono-repo containing all Norconex crawler-related projects previously maintained in separate repos. All projects in this mono-repo will now be released simultaneously and share the same version number.
Until v4 is officially released, this branch should not be considered stable.
Projects
| Folder                          | Artifact Id                       |
| ------------------------------- | --------------------------------- |
| crawler/core/                   | nx-crawler-core                   |
| crawler/fs/                     | nx-crawler-fs                     |
| crawler/web/                    | nx-crawler-web                    |
| importer/                       | nx-importer                       |
| committer/amazoncloudsearch/    | nx-committer-amazoncloudsearch    |
| committer/apachekafka/          | nx-committer-apachekafka          |
| committer/azurecognitivesearch/ | nx-committer-azurecognitivesearch |
| committer/core/                 | nx-committer-core                 |
| committer/idol/                 | nx-committer-idol                 |
| committer/elasticsearch/        | nx-committer-elasticsearch        |
| committer/neo4j/                | nx-committer-neo4j                |
| committer/solr/                 | nx-committer-solr                 |
| committer/sql/                  | nx-committer-sql                  |
All projects in this repository share the same Maven group id:
com.norconex.crawler
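Because every project shares this group id and the same version number, a Maven dependency on any module should differ only in its artifact id. The fragment below is a minimal sketch for the web crawler module, assuming the artifact is published under these coordinates. The `norconex.crawler.version` property is a hypothetical placeholder that you would define in your own POM; no version 4 release number is implied here.

```xml
<!-- Sketch only: group id and artifact id are taken from this README. -->
<!-- The version property is a placeholder (assumption), not a published version. -->
<dependency>
  <groupId>com.norconex.crawler</groupId>
  <artifactId>nx-crawler-web</artifactId>
  <version>${norconex.crawler.version}</version>
</dependency>
```

To depend on another module, swap in its artifact id (for example `nx-committer-solr`) and keep the same group id and version property.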