Dataframe

Another dataframe library for Java, inspired by Tablesaw, built on nio buffers

To add a dependency on dataframe using Maven, use the following:

<dependency>
  <groupId>tech.bitey</groupId>
  <artifactId>dataframe</artifactId>
  <version>1.2.11</version>
</dependency>

Requires Java 17 or higher. The last version supporting Java 11 was 1.1.7.

What's different about this dataframe library?

It is geared towards Java backend developers who need to ship tabular data around, rather than towards data science. This is not Pandas for Java.
Data is stored in ByteBuffers, so the dataframes can read/write to Channels with minimal overhead (save to files, send over network).
Optimized for space. For example, booleans take one bit each, DateTimes take one long (with microsecond precision).
Nulls are stored in a separate bitset (also backed by a ByteBuffer), which takes up two bits per element of the Column. No extra space is used if all values are non-null.
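The space claims above can be illustrated with plain JDK classes. The following is a minimal sketch, not the library's actual storage code: the class `PackedBooleans` and its methods are hypothetical names, and encoding a date-time as microseconds since 1970-01-01 is only one assumed way to fit a timestamp into a single long.

```java
import java.nio.ByteBuffer;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper for illustration; not part of the dataframe library. */
final class PackedBooleans {

    private static final LocalDateTime EPOCH = LocalDateTime.of(1970, 1, 1, 0, 0);

    /** Packs one boolean per bit, eight values per byte. */
    static ByteBuffer pack(boolean[] values) {
        ByteBuffer buf = ByteBuffer.allocate((values.length + 7) / 8);
        for (int i = 0; i < values.length; i++) {
            if (values[i]) {
                int byteIndex = i >>> 3;
                buf.put(byteIndex, (byte) (buf.get(byteIndex) | (1 << (i & 7))));
            }
        }
        return buf;
    }

    /** Reads the bit stored for the given element index. */
    static boolean get(ByteBuffer buf, int index) {
        return (buf.get(index >>> 3) & (1 << (index & 7))) != 0;
    }

    /** Assumed encoding: microseconds since 1970-01-01T00:00 in a single long. */
    static long toMicros(LocalDateTime dateTime) {
        return ChronoUnit.MICROS.between(EPOCH, dateTime);
    }

    /** Inverse of toMicros. */
    static LocalDateTime fromMicros(long micros) {
        return EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        boolean[] values = {true, false, true, true, false, false, false, true, true, false};
        ByteBuffer packed = pack(values);
        System.out.println(values.length + " booleans stored in " + packed.capacity() + " bytes");

        LocalDateTime now = LocalDateTime.of(2024, 1, 15, 10, 30, 0, 123_456_000);
        long encoded = toMicros(now);
        System.out.println(now + " round-trips: " + now.equals(fromMicros(encoded)));
    }
}
```

Because the packed values already live in a ByteBuffer, they can be handed to a Channel without an intermediate serialization step, which is the property the list above describes.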

Features

Supports the most common types: String, int, long, short, byte, boolean, double, float, Date, DateTime, and BigDecimal; as well as Time, UUID, Instant, and InputStream.
Column and DataFrame are immutable. Columns can be created from collections, arrays, streams, or with builders.
Read/write to File or Channel with minimal overhead. Supports memory-mapping from a file.
Read/write CSV files
Read from ResultSet, write with PreparedStatement
Backing ByteBuffers can be allocated on or off the heap. This is controlled by a global system property: -Dtech.bitey.allocateDirect=true for direct (off-heap) buffers, or false (the default) for heap buffers.
Column implements List. DataFrame implements List<Row>. If the DataFrame has a key column it can be viewed as a NavigableMap<T, Row>
Basic filtering, joining, grouping
No additional dependencies
Extensive testing
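To make the buffer-related features concrete, here is a small sketch using only JDK classes. It shows how a heap/direct toggle, a Channel write, and a memory-mapped read fit together in general. It does not use the dataframe API: `BufferIoSketch` and its methods are hypothetical, and reading the `tech.bitey.allocateDirect` property with `Boolean.getBoolean` is an assumption about how such a switch could be consulted, not a description of the library's internals.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not part of the dataframe library. */
final class BufferIoSketch {

    /** Allocates a direct buffer if the system property is "true", else a heap buffer. */
    static ByteBuffer allocate(int capacity) {
        boolean direct = Boolean.getBoolean("tech.bitey.allocateDirect"); // false when unset
        return direct ? ByteBuffer.allocateDirect(capacity) : ByteBuffer.allocate(capacity);
    }

    /** Writes the buffer's remaining bytes to the file without moving its position. */
    static void write(Path file, ByteBuffer data) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer view = data.duplicate(); // independent position and limit
            while (view.hasRemaining()) {
                channel.write(view);
            }
        }
    }

    /** Memory-maps the whole file read-only; the mapping stays valid after the channel closes. */
    static ByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer ints = allocate(3 * Integer.BYTES);
        ints.putInt(10).putInt(20).putInt(30).flip();

        Path file = Files.createTempFile("column", ".bin");
        file.toFile().deleteOnExit();

        write(file, ints);
        ByteBuffer mapped = map(file);
        System.out.println("second value read from the mapped file: " + mapped.getInt(Integer.BYTES));
    }
}
```

Running with `-Dtech.bitey.allocateDirect=true` switches the sketch to a direct buffer; the write and map steps are unchanged either way.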

Sample Use Cases

Great as a ResultSet cache. Have an expensive query that needs to run every time your app starts and it's slowing down your development? Cache it locally on disk in a DataFrame! Because DataFrame can be viewed as a ResultSet, you can plug it into existing code with minimal changes. Or cache it in a service and pull it over the network (reads/writes directly to Channel).
Great for generating Excel reports via POI. Stage the data in a DataFrame first, then write POI code against the DataFrame. This separates concerns and is easier than writing POI directly against a ResultSet.
