Herringbone
Tools for working with parquet, impala, and hive
Install / Use
/learn @stripe-archive/HerringboneREADME
Herringbone
Herringbone is deprecated and is no longer being actively maintained.
Herringbone is a suite of tools for working with parquet files on hdfs, and with impala and hive.
The available commands are:
flatten: transform a directory of parquet files with a nested structure into a directory of parquet files with a flat schema that can be loaded into impala or hive (neither of which support nested schemas). Default output directory is /path/to/input/directory-flat.
$ herringbone flatten -i /path/to/input/directory [-o /path/to/non/default/output/directory]
load: load a directory of parquet files (which must have a flat schema) into impala or hive (defaulting to impala). Use the --nocompute-stats option for faster loading into impala (but probably slower querying later on!)
$ herringbone load [--hive] [-u] [--nocompute-stats] -d db_name -t table -p /path/to/parquet/directory
tsv: transform a directory of parquet files into a directory of tsv files (which you can concat properly later with hadoop fs -getmerge /path/to/tsvs). Default output directory is /path/to/input/directory-tsv.
$ herringbone tsv -i /path/to/input/directory [-o /path/to/non/default/output/directory]
compact: transform a directory of parquet files into a directory of fewer larger parquet files. Default output directory is /path/to/input/directory-compact.
$ herringbone compact -i /path/to/input/directory [-o /path/to/non/default/output/directory]
See herringbone COMMAND --help for more information on a specific command.
Building
You'll need thrift 0.9.1 on your path.
$ git clone github.com/stripe/herringbone
$ cd herringbone
$ mvn package
Authors
Related Skills
node-connect
346.4kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
107.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
346.4kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
346.4kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
