neuPrintHTTP

Implements a connectomics REST interface that leverages the neuPrint data model. neuPrintHTTP can be run in a user-authenticated mode or without any authentication. Note that the authenticated mode (which requires more configuration and setup) is needed for use with the neuPrintExplorer web application. The unauthenticated mode is the ideal way to access neuPrint data programmatically.

Dependencies

Since neuPrintHTTP is written in Go, you will need to download and install Go before you can build and run it. The Go build tools are opinionated about the file structure and location of Go projects, but by default they autogenerate the required folders when you go get a project.

Installing

Go must be installed (version 1.16+). neuPrintHTTP supports both file-based logging and logging to Apache Kafka. For basic installation:

Option 1: Clone and build (recommended)

# Clone the repository
git clone https://github.com/connectome-neuprint/neuPrintHTTP.git
cd neuPrintHTTP

# Build the application
go build

# Or install it to your GOPATH's bin directory
go install

Option 2: Direct install (requires Go modules)

# Install the latest version
go install github.com/connectome-neuprint/neuPrintHTTP@latest

To run tests:

% go test ./...

To test a specific package:

% go test ./api/...

neuPrintHTTP uses a Python script to support cell type analysis. To use this script, install scipy, scikit-learn, and pandas, and make sure to run neuPrintHTTP from the top directory, where the Python script is located.

Data Access Endpoints

Standard JSON Endpoint

The default endpoint for custom queries is /api/custom/custom, which returns results in JSON format:

curl -X POST "http://localhost:11000/api/custom/custom" \
  -H "Content-Type: application/json" \
  -d '{"cypher": "MATCH (n) RETURN n LIMIT 1", "dataset": "hemibrain"}'
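
The same request can be assembled from Python using only the standard library. The sketch below is illustrative: the helper name build_custom_query_request is hypothetical, and the bearer-token header mirrors the Python Arrow example in this README and should only be needed when authentication is turned on.

```python
import json
import urllib.request


def build_custom_query_request(base_url, cypher, dataset, token=None):
    """Build (but do not send) a POST request for /api/custom/custom.

    Hypothetical helper for illustration; not part of neuPrintHTTP.
    """
    body = json.dumps({"cypher": cypher, "dataset": dataset}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:  # only needed when authentication is turned on
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/custom/custom",
        data=body,
        headers=headers,
        method="POST",
    )


req = build_custom_query_request(
    "http://localhost:11000", "MATCH (n) RETURN n LIMIT 1", "hemibrain"
)
print(req.get_method(), req.full_url)

# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network traffic happens.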

The response is JSON with the following structure (the column names and values shown here are illustrative):

{
  "columns": ["name", "size"],
  "data": [["t4", 323131], ["mi1", 232323]]
}

Where:

columns: Array of column names from your query
data: Array of rows, each row containing values that correspond to the columns
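
Because each row lines up positionally with columns, pairing the two yields one record per row. A minimal sketch using the example response above; rows_to_records is a hypothetical helper name, not part of any neuPrint client.

```python
import json


def rows_to_records(response):
    """Turn a {"columns": [...], "data": [[...], ...]} payload into dicts."""
    columns = response["columns"]
    return [dict(zip(columns, row)) for row in response["data"]]


payload = json.loads(
    '{"columns": ["name", "size"], "data": [["t4", 323131], ["mi1", 232323]]}'
)
for record in rows_to_records(payload):
    print(record)
# {'name': 't4', 'size': 323131}
# {'name': 'mi1', 'size': 232323}
```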

Apache Arrow Support

neuPrintHTTP supports returning query results in Apache Arrow format via the /api/custom/arrow HTTP endpoint. This provides several advantages:

Efficient binary serialization with low overhead
Preservation of data types
Native integration with data science tools
Optimized memory layout for analytical workloads

neuPrintHTTP uses Arrow v18 for all Arrow-related functionality, including both the HTTP IPC stream format and the preliminary Flight implementation.

Using the Arrow Endpoint

To retrieve data in Arrow format, send a POST request to /api/custom/arrow with the same JSON body format as the regular custom endpoint:

curl -X POST "http://localhost:11000/api/custom/arrow" \
  -H "Content-Type: application/json" \
  -d '{"cypher": "MATCH (n) RETURN n LIMIT 1", "dataset": "hemibrain"}' \
  --output data.arrow

The response will be in Arrow IPC stream format with content type application/vnd.apache.arrow.stream. This is a standard way to transfer Arrow data over HTTP without requiring gRPC or Arrow Flight.

You can parse this with Arrow libraries available in multiple languages:

# Python example - Standard HTTP with Arrow IPC format (No Flight required)
import os
import pyarrow as pa
import requests

# Get token from environment variable. Token can be found in neuPrintExplorer settings.
# Only necessary if authentication is turned on.
token = os.environ.get("NEUPRINT_APPLICATION_CREDENTIALS")

# Add the token to the headers only when one is set,
# so that no "Bearer None" header is sent to an unauthenticated server
headers = {"Content-Type": "application/json"}
if token:
    headers["Authorization"] = f"Bearer {token}"

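resp = requests.post('http://localhost:11000/api/custom/arrow',
                     headers=headers,
                     json={"cypher": "MATCH (n) RETURN n LIMIT 1",
                           "dataset": "hemibrain"})
# Fail early on HTTP errors instead of trying to parse an error body as Arrow
resp.raise_for_status()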

# Parse the Arrow IPC stream from the HTTP response
reader = pa.ipc.open_stream(pa.py_buffer(resp.content))
table = reader.read_all()

# Convert to pandas DataFrame
df = table.to_pandas()

# For Neo4j node objects (which are represented as Arrow Maps)
# we need a helper function to convert Arrow MapValue objects to Python dictionaries
def convert_mapvalue_to_dict(val):
    if hasattr(pa.lib, 'MapValue') and isinstance(val, pa.lib.MapValue):
        return {k.as_py(): v.as_py() for k, v in val.items()}
    return val

# Process Map columns in the DataFrame
for col in df.columns:
    if hasattr(pa.lib, 'MapValue'):
        # Use the MapValue approach if available
        df[col] = df[col].map(lambda x: convert_mapvalue_to_dict(x) if x is not None else None)

print(df)

// JavaScript example with Arrow JS (run as an ES module)
import * as arrow from 'apache-arrow';
const response = await fetch('http://localhost:11000/api/custom/arrow', {
  method: 'POST',
  headers: {'Content-Type': 'application/json'},
  body: JSON.stringify({
    cypher: "MATCH (n) RETURN n LIMIT 1",
    dataset: "hemibrain"
  })
});

// Get the binary data
const arrayBuffer = await response.arrayBuffer();
// Parse the Arrow IPC stream
const table = await arrow.tableFromIPC(arrayBuffer);
console.log(table.toString());

Developers

If modifying the source code and updating the swagger inline comments, update the documentation with:

% go generate

Using Apache Kafka for Logging

To use Kafka for logging, install librdkafka and build neuPrintHTTP with the kafka build tag.

See installation instructions for librdkafka.

And then:

% go install -tags kafka

Installing without Kafka Support

If the build fails because librdkafka is missing and you don't need to send log messages to a Kafka server, try this build instead:

% go get -tags nokafka github.com/connectome-neuprint/neuPrintHTTP

Running

% neuPrintHTTP -port |PORTNUM| config.json

The config file should contain information on the backend datastore that satisfies the connectomics REST API and the location of a file containing a list of authorized users. To test HTTPS locally and generate the necessary certificates, run:

% go run $GOROOT/src/crypto/tls/generate_cert.go --host localhost

Command Line Options

Usage: neuprintHTTP [OPTIONS] CONFIG.json
  -port int
        port to start server (default 11000)
  -arrow-flight-port int
        port for Arrow Flight gRPC server (default 11001)
  -disable-arrow
        disable Arrow format support (enabled by default)
  -public_read
        allow all users read access
  -pid-file string
        file for pid
  -verbose
        verbose mode

Configuration

The server is configured using a JSON file. The configuration specifies database connections, authentication options, and other server settings.

Apache Arrow Configuration

The Arrow support in neuPrintHTTP includes:

Arrow IPC HTTP endpoint: Available at /api/custom/arrow on the main HTTP port
Arrow Flight gRPC server: Runs on a separate port (default: 11001)

To change the Arrow Flight port:

# Start with custom Flight port
neuPrintHTTP -arrow-flight-port 12345 config.json

To disable Arrow support entirely:

# Disable all Arrow functionality
neuPrintHTTP -disable-arrow config.json

Sample Configuration

A sample configuration file can be found in config-examples/config.json in this repo:

{
    "engine": "neuPrint-bolt",
    "engine-config": {
        "server": "<NEO4-SERVER>:7687",
        "user": "neo4j",
        "password": "<PASSWORD>"
    },
    "datatypes": {
        "skeletons": [
            {
                "instance": "<UNIQUE NAME>",
                "engine": "dvidkv",
                "engine-config": {
                    "dataset": "hemibrain",
                    "server": "http://<DVIDADDR>",
                    "branch": "<UUID>",
                    "instance": "segmentation_skeletons"
                }
            }
        ]
    },
    "disable-auth": true,
    "swagger-docs": "<NEUPRINT_HTTP_LOCATION>/swaggerdocs",
    "log-file": "log.json"
}
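
Before starting the server it can help to check that a config file parses and that the sample's placeholders have been filled in. The sketch below is a hypothetical pre-flight check, not something neuPrintHTTP itself provides. Treating engine and engine-config as required is an assumption drawn from the sample above, since this README does not state which keys the server treats as mandatory.

```python
import json

# Assumed from the sample config; the server's actual requirements may differ.
ASSUMED_REQUIRED_KEYS = ("engine", "engine-config")


def check_config(text):
    """Parse config JSON and return a list of problems (empty if none found)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    problems = [
        f"missing key: {key}" for key in ASSUMED_REQUIRED_KEYS if key not in config
    ]
    # Placeholders such as <PASSWORD> from the sample must be replaced.
    engine_config = config.get("engine-config")
    if isinstance(engine_config, dict):
        for key, value in engine_config.items():
            if isinstance(value, str) and "<" in value and ">" in value:
                problems.append(f"engine-config.{key} still contains a placeholder")
    return problems


sample = json.dumps({
    "engine": "neuPrint-bolt",
    "engine-config": {
        "server": "<NEO4-SERVER>:7687",
        "user": "neo4j",
        "password": "<PASSWORD>",
    },
})
for problem in check_config(sample):
    print(problem)
# engine-config.server still contains a placeholder
# engine-config.password still contains a placeholder
```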

DatasetGateway (DSG) Authentication

When authentication is enabled ("disable-auth": false), neuPrintHTTP delegates all authentication and per-dataset authorization to a DatasetGateway instance. Users authenticate via DSG API keys (dsg_token), and per-dataset access is checked against DSG's permissions.

Add these fields to your config to enable DSG auth:

{
    "disable-auth": false,
    "dsg-url": "https://dsg.janelia.org",
    "dsg-cache-ttl": 300,
    "dsg-service-name": "neuprint",
    "dataset-map": {
        "vnc": "VNC",
        "manc": "MANC"
    },
    "ssl-cert": "/path/to/cert.pem",
    "ssl-key": "/path/to/key.pem",
    "hostname": "neuprint.example.com"
}

| Field | Required | Description |
|-------|----------|-------------|
| dsg-url | Yes (when auth enabled) | Base URL of the DatasetGateway service |
| dsg-cache-ttl | No | Seconds to cache DSG user lookups (default: 300) |
| dsg-service-name | No | Service name for TOS checks (default: "neuprint") |
| dataset-map | No | Maps neuprint DB names to DSG dataset slugs when they differ (e.g., neuprint uses lowercase "vnc" but DSG uses "VNC") |
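
The documented defaults and the dataset-map lookup can be summarized in a short sketch. This illustrates the behavior described for the fields above and is not the server's Go implementation. The function names are hypothetical, and passing unmapped dataset names through unchanged is an assumption based on the map being needed only when names differ.

```python
def resolve_dsg_settings(config):
    """Return the DSG settings with the documented defaults filled in."""
    # dsg-url is documented as required when authentication is enabled.
    if config.get("disable-auth") is False and not config.get("dsg-url"):
        raise ValueError("dsg-url is required when authentication is enabled")
    return {
        "dsg-url": config.get("dsg-url"),
        "dsg-cache-ttl": config.get("dsg-cache-ttl", 300),
        "dsg-service-name": config.get("dsg-service-name", "neuprint"),
        "dataset-map": config.get("dataset-map", {}),
    }


def dsg_dataset_slug(settings, dataset):
    """Map a neuprint DB name to its DSG slug; unmapped names pass through (assumed)."""
    return settings["dataset-map"].get(dataset, dataset)


settings = resolve_dsg_settings({
    "disable-auth": False,
    "dsg-url": "https://dsg.janelia.org",
    "dataset-map": {"vnc": "VNC", "manc": "MANC"},
})
print(settings["dsg-cache-ttl"], settings["dsg-service-name"])
# 300 neuprint
print(dsg_dataset_slug(settings, "vnc"), dsg_dataset_slug(settings, "hemibrain"))
# VNC hemibrain
```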

Note that the Bolt (optimized Neo4j protocol) engine neuPrint-bolt is recommended, while the older neuPrint-neo4j engine is deprecated. See below.

Neo4j Bolt Driver

neuPrintHTTP now supports the Neo4j Bolt protocol driver.
