Nutchpy
For interacting with Nutch via Python
Introduction
Nutchpy is a Python library for working with Apache Nutch. In particular, it provides readers for existing data structures in the Nutch ecosystem, such as Sequence Files, LinkDb, and Nodes. A small examples directory shows how Nutchpy can be used to interact with some of these data structures.
Install
To build nutchpy from source, run the following commands in your terminal:
git clone https://github.com/ContinuumIO/nutchpy.git
conda install -c blaze apache-maven
cd nutchpy; python setup.py install;
Alternatively, you can download nutchpy from binstar with conda:
conda install -c blaze nutchpy
Running
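import nutchpy

# Full path to the Nutch data to read
node_path = "<FULL-PATH>/data"
seq_reader = nutchpy.sequence_reader

# Print the first 10 records
print(seq_reader.head(10, node_path))
# Print a slice of records from position 10 to 20
print(seq_reader.slice(10, 20, node_path))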
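For data too large to print in one call, the `slice` call above can be used to walk through the records page by page. The following is a minimal sketch, not part of nutchpy: `iter_records` is a hypothetical helper, and it assumes that `slice(start, stop, path)` returns an iterable holding at most `stop - start` records and returns fewer (or none) once the data is exhausted.

```python
def iter_records(reader, path, page_size=100, limit=None):
    """Yield records from `path` one page at a time via reader.slice().

    Hypothetical helper: `reader` is any object with a
    slice(start, stop, path) method, e.g. nutchpy.sequence_reader.
    Assumes slice() returns at most (stop - start) records.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0
    while limit is None or start < limit:
        stop = start + page_size
        if limit is not None:
            stop = min(stop, limit)  # never read past the requested limit
        page = list(reader.slice(start, stop, path))
        if not page:
            break  # nothing returned: assume end of data
        for record in page:
            yield record
        if len(page) < stop - start:
            break  # short page: assume the data is exhausted
        start = stop
```

With nutchpy installed, this could be called as `for record in iter_records(nutchpy.sequence_reader, node_path, page_size=10): print(record)`. Passing the reader in as an argument keeps the helper independent of which reader is used.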
Run Requirements
- JDK 1.6+
- python
- py4j
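The run requirements above can be checked from Python before importing nutchpy. This is a rough sketch under stated assumptions: `check_run_requirements` is a hypothetical helper, it only detects whether a `java` executable is on the `PATH` (it does not verify that it is a JDK or that the version is 1.6+), and it only checks that `py4j` is importable.

```python
import importlib.util
import shutil


def check_run_requirements():
    """Return a dict mapping each run requirement to True/False.

    Hypothetical helper: presence checks only, no version checks.
    """
    return {
        # A `java` executable on PATH (may be a JRE rather than a JDK)
        "java": shutil.which("java") is not None,
        # The py4j package can be located by the import system
        "py4j": importlib.util.find_spec("py4j") is not None,
    }


if __name__ == "__main__":
    for name, found in check_run_requirements().items():
        print(f"{name}: {'found' if found else 'missing'}")
```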
Build Requirements
- python
- apache-maven (conda install -c blaze apache-maven)
