Jarrow
Lightweight Java Feather format I/O library
JARROW
Overview
Jarrow is a lightweight Java implementation for I/O of data stored in formats related to Apache Arrow. Currently it supports only the Arrow-related Feather format, but in future it may grow support for the Arrow IPC File format or other evolutions of Feather. Or it may not.
Comparison with the Apache Java Arrow Implementation
Why write this when there's already a Java implementation of Feather I/O provided by Apache? I wanted something without all those dependencies, and for which I had full control over the data access. I'm using it to provide Feather table I/O handlers in STIL/TOPCAT.
This library probably does less clever stuff than the Apache one but it's much more compact and has no external dependencies.
Building
If you want the library, the best thing is just to pick up the pre-built jarrow.jar file from the release.
However, if you want to build it from source, there's a makefile. It may need editing since some targets contain references to directories you don't have. But basically to build the library you just need to run javac on all the Java source files.
The source code is Java 1.6 compatible, and the distributed jarrow.jar file contains Java 1.6-compatible classes.
Implementation Status
Only Feather files are currently supported. All Feather files can be read, but the following column types are not fully supported on input:
- CATEGORY: I haven't come across any Feather files with category column types, and it's not clear to me how to interpret the Feather format documentation for this type, so it's not supported.
- UINT64: There's no Java primitive or primitive-wrapper type that can represent unsigned 64-bit integers, so it is not supported.
- TIMESTAMP, DATE, TIME: These values can be read, but the type-specific metadata/unit information is not currently available.
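To illustrate the UINT64 limitation in the list above: a Java long holds the same 64 bits as an unsigned value, but interprets the top bit as a sign. The following is only a sketch of how a caller could widen such bits to a BigInteger if they obtained them by some other route; it is not something Jarrow does, and the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of Jarrow: reinterprets the 64 bits
// of a signed long as an unsigned integer value.
final class Uint64Sketch {

    static BigInteger toUnsigned(long bits) {
        BigInteger value = BigInteger.valueOf(bits);
        // Negative longs correspond to unsigned values >= 2^63, so add 2^64.
        return bits >= 0 ? value : value.add(BigInteger.ONE.shiftLeft(64));
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned(42L));  // prints 42
        System.out.println(toUnsigned(-1L));  // prints 18446744073709551615
    }
}
```

The cost of this approach is one object per value, which is presumably part of why a compact library would prefer to leave the type unsupported.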
The reading is implemented using memory mapping (MappedByteBuffers).
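As a rough illustration of what memory-mapped reading looks like with the standard java.nio classes, here is a minimal sketch that maps a file read-only and checks for the four-byte "FEA1" magic that Feather version 1 files begin with. It is not Jarrow's actual code; the class and method names are hypothetical, and a single mapping like this only works for files smaller than 2 GB.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical sketch, not Jarrow's implementation.
final class MappedMagicCheck {

    /** Maps the whole file read-only; the mapping stays valid after the file is closed. */
    static MappedByteBuffer mapReadOnly(File file) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel channel = raf.getChannel();
            // FileChannel.map rejects sizes above Integer.MAX_VALUE.
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Arrow-family formats store multi-byte numbers little-endian.
            buf.order(ByteOrder.LITTLE_ENDIAN);
            return buf;
        } finally {
            raf.close();
        }
    }

    /** Tests whether the first four bytes are the ASCII characters "FEA1". */
    static boolean startsWithFeatherMagic(MappedByteBuffer buf) {
        return buf.capacity() >= 4
            && buf.get(0) == 'F' && buf.get(1) == 'E'
            && buf.get(2) == 'A' && buf.get(3) == '1';
    }

    public static void main(String[] args) throws IOException {
        MappedByteBuffer buf = mapReadOnly(new File(args[0]));
        System.out.println(startsWithFeatherMagic(buf)
                           ? "looks like Feather" : "not Feather");
    }
}
```

Because the operating system pages the data in on demand, values can be fetched with absolute reads such as buf.getInt(offset) without first copying the file into the Java heap.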
The LARGE_UTF8 and LARGE_BINARY types defined in the Arrow but not in the Feather version of the flatbuffers metadata file are supported.
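For context, in Arrow the LARGE_ variants differ from plain UTF8 and BINARY in using 64-bit rather than 32-bit offsets into the byte data. The sketch below shows that general idea, assuming a buffer of n+1 little-endian 64-bit offsets and a separate buffer of concatenated UTF-8 bytes. The buffer arrangement and all names here are assumptions for illustration and do not describe how Jarrow lays out or reads these columns.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

// Hypothetical sketch of 64-bit-offset string decoding, not Jarrow's code.
final class LargeUtf8Sketch {

    private static final Charset UTF8 = Charset.forName("UTF-8");

    /**
     * Returns string i, where offsets holds little-endian int64 values
     * and string i occupies bytes [offsets[i], offsets[i+1]) of data.
     */
    static String readString(ByteBuffer offsets, ByteBuffer data, int i) {
        long start = offsets.getLong(8 * i);
        long end = offsets.getLong(8 * (i + 1));
        // A single ByteBuffer cannot exceed 2 GB, so these casts are
        // safe only for positions that fit in an int.
        byte[] bytes = new byte[(int) (end - start)];
        ByteBuffer view = data.duplicate();  // leave the caller's position alone
        view.position((int) start);
        view.get(bytes);
        return new String(bytes, UTF8);
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.wrap("featherarrow".getBytes(UTF8));
        ByteBuffer offsets = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        offsets.putLong(0, 0L).putLong(8, 7L).putLong(16, 12L);
        System.out.println(readString(offsets, data, 0));  // prints feather
        System.out.println(readString(offsets, data, 1));  // prints arrow
    }
}
```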
Implementation notes
The flatbuffers Java source files are generated by running the flatc compiler from Google FlatBuffers version 1.11.0 on the Arrow version of feather.fbs. I subsequently moved the generated source files into a different Java package to avoid possible namespace clashes with external code that may use a different version of flatbuffers.
Usage
Comprehensive documentation is provided in the javadocs.
The classes in the package uk.ac.bristol.star.feather form the usable parts of the I/O library. The classes in the uk.ac.bristol.star.fbs.* packages are flatbuffer support files that you shouldn't need to use.
To read a table, use the FeatherTable.fromFile(File) method; there are examples in FeatherTable.main.
To write a table, use FeatherWriter.write(OutputStream); this requires you to implement some FeatherColumnWriter objects in some way appropriate to the data structures in which your table data resides; there are examples in FeatherWriter.main.
Support and future development
I don't know whether anybody else will want to use this package. If you do, and if you are interested in features that are not currently present, please contact me (@mbtaylor).
Licence
This library includes Google FlatBuffers code which is licenced under the Apache 2.0 licence. I'm prepared to offer any licence to the original parts of this project that suits you and that's legally possible. For now, I assert that it's licenced under the LGPL. Unless somebody tells me I'm not allowed to do that.
History
- Version 1.0 (27 Feb 2020): Initial release
