
Nutchpy

A Python library for interacting with Apache Nutch


Introduction

Nutchpy is a Python library for working with Apache Nutch. In particular, it provides readers for existing data structures in the Nutch ecosystem, such as Sequence Files, LinkDb, and Nodes. A small examples directory shows how Nutchpy can be used to interact with some of these data structures.

Install

To build nutchpy from source, run the following commands in your terminal:

  git clone https://github.com/ContinuumIO/nutchpy.git
  conda install -c blaze apache-maven
  cd nutchpy; python setup.py install;

Alternatively, you can install a prebuilt nutchpy package from Binstar with conda:

  conda install -c blaze nutchpy

Running

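import nutchpy

# Replace <FULL-PATH> with the absolute path to your Nutch data
node_path = "<FULL-PATH>/data"
seq_reader = nutchpy.sequence_reader

# Print the first 10 records
print(seq_reader.head(10, node_path))
# Print the records between positions 10 and 20
print(seq_reader.slice(10, 20, node_path))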
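
For data too large to print in one call, the `slice` call shown above could be wrapped in a small batching helper. The following is a minimal sketch and not part of nutchpy: `iter_batches` and `FakeReader` are hypothetical names introduced here for illustration. It assumes that `slice(start, stop, path)` returns a sequence of records for the half-open range `[start, stop)` and an empty sequence once `start` is past the last record. This README does not confirm either assumption, so check the behavior against your own data first.

```python
def iter_batches(reader, path, batch_size=100):
    """Yield successive lists of records from `path` via reader.slice().

    Assumption (unverified): reader.slice(start, stop, path) covers the
    half-open range [start, stop) and returns an empty sequence once
    `start` is beyond the last record.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = 0
    while True:
        batch = list(reader.slice(start, start + batch_size, path))
        if not batch:
            return
        yield batch
        start += batch_size


class FakeReader:
    """Hypothetical stand-in for nutchpy.sequence_reader (no JVM needed)."""

    def __init__(self, records):
        self._records = list(records)

    def slice(self, start, stop, path):
        # `path` is ignored here; the fake serves records from memory.
        return self._records[start:stop]


if __name__ == "__main__":
    reader = FakeReader(range(7))
    for batch in iter_batches(reader, "<FULL-PATH>/data", batch_size=3):
        print(batch)
```

With the real library, `nutchpy.sequence_reader` would be passed as `reader` in place of `FakeReader`, provided the assumptions above hold.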

Run Requirements

  • JDK 1.6+
  • Python
  • py4j
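
The run requirements listed above can be checked before importing nutchpy. The snippet below is a rough sketch using only the Python standard library. `check_run_requirements` is a hypothetical helper name, and the check only confirms that a `java` executable is on the `PATH` and that the Python modules can be found. It does not verify that the JDK is version 1.6 or later.

```python
import importlib.util
import shutil


def check_run_requirements(modules=("py4j", "nutchpy")):
    """Return a dict mapping each requirement name to True or False."""
    # A `java` executable on PATH is used as a proxy for an installed JDK.
    status = {"java": shutil.which("java") is not None}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_run_requirements().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any entry reported as missing points to the corresponding item in the list above.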

Build Requirements

  • Python
  • apache-maven (conda install -c blaze apache-maven)
