= Anemone
Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.
See http://anemone.rubyforge.org for more information.
== Features
- Multi-threaded design for high performance
- Tracks 301 HTTP redirects
- Built-in BFS algorithm for determining page depth
- Allows exclusion of URLs based on regular expressions
- Choose the links to follow on each page with focus_crawl()
- HTTPS support
- Records response time for each page
- CLI program can list all pages in a domain, calculate page depths, and more
- Obeys robots.txt
- In-memory or persistent storage of pages during crawl, using TokyoCabinet, SQLite3, MongoDB, or Redis
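Several of the features above (BFS page depth, regex-based URL exclusion, and choosing which links to follow) can be illustrated with a small, self-contained sketch. This is not Anemone's implementation and does not use its API: the <tt>LINKS</tt> graph, the <tt>page_depths</tt> method, and its <tt>skip:</tt> parameter are all hypothetical names invented for this example, and an in-memory Hash stands in for pages fetched over HTTP.

```ruby
# Hypothetical in-memory link graph standing in for fetched pages:
# each key is a URL path, each value lists the links found on that page.
LINKS = {
  '/'            => ['/about', '/blog', '/admin/login'],
  '/about'       => ['/', '/team'],
  '/blog'        => ['/blog/post-1', '/about'],
  '/blog/post-1' => ['/team'],
  '/team'        => [],
  '/admin/login' => ['/admin/secret']
}.freeze

# Breadth-first traversal starting at +root+; returns a Hash of url => depth.
# +skip+ is a list of regular expressions; matching URLs are never visited.
# An optional block receives (url, links) and returns the links to follow,
# loosely mirroring the idea behind focus_crawl.
def page_depths(graph, root, skip: [])
  depths = { root => 0 }
  queue = [root]
  until queue.empty?
    url = queue.shift
    links = graph.fetch(url, [])
    links = yield(url, links) if block_given?
    links.each do |link|
      next if depths.key?(link)                        # already discovered
      next if skip.any? { |pattern| link =~ pattern }  # excluded by regex
      depths[link] = depths[url] + 1                   # first visit is the shortest path
      queue << link
    end
  end
  depths
end

# Crawl everything except URLs under /admin.
depths = page_depths(LINKS, '/', skip: [%r{\A/admin}])

# Follow only the first link on each page.
focused = page_depths(LINKS, '/') { |_url, links| links.first(1) }
```

Because a breadth-first traversal reaches every page by a shortest path first, the depth recorded on first discovery is the page's minimum click distance from the root.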
== Examples
See the scripts under the <tt>lib/anemone/cli</tt> directory for examples of several useful Anemone tasks.
== Requirements
- nokogiri
- robots
== Development
To test and develop this gem, additional requirements are:
- rspec
- fakeweb
- tokyocabinet
- kyotocabinet-ruby
- mongo
- redis
- sqlite3
You will need to have KyotoCabinet, {Tokyo Cabinet}[http://fallabs.com/tokyocabinet/], {MongoDB}[http://www.mongodb.org/], and {Redis}[http://code.google.com/p/redis/] installed on your system and running.
