Dataframe

Another dataframe library for Java, inspired by Tablesaw and built on NIO buffers.

License: Apache 2.0

To add a dependency on dataframe using Maven, use the following:

<dependency>
  <groupId>tech.bitey</groupId>
  <artifactId>dataframe</artifactId>
  <version>1.2.11</version>
</dependency>

Requires Java 17 or higher. The last version supporting Java 11 was 1.1.7.

Sample Usages

Release Notes

What's different about this dataframe library?

  • It is geared towards Java backend developers who need to ship tabular data around, rather than towards data science. This is not Pandas for Java.
  • Data is stored in ByteBuffers, so dataframes can be read from and written to Channels with minimal overhead (saved to files, sent over the network).
  • Optimized for space. For example, booleans take one bit each, and DateTimes take one long (with microsecond precision).
  • Nulls are tracked in a separate bitset (also backed by a ByteBuffer), which takes two bits per element of the Column. No extra space is used if all values are non-null.
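
To make the space and Channel points above concrete, here is a minimal sketch that uses only the JDK, not this library. It packs booleans one bit each into a ByteBuffer and round-trips the buffer through a FileChannel. The class name `BitPackSketch`, its methods, and the bit order (least significant bit first) are assumptions made for illustration; the library's actual layout may differ.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only: not part of the dataframe library.
public class BitPackSketch {

    // Pack booleans one bit each, least significant bit first within each byte.
    public static ByteBuffer pack(boolean[] values) {
        ByteBuffer buf = ByteBuffer.allocate((values.length + 7) / 8);
        for (int i = 0; i < values.length; i++) {
            if (values[i]) {
                int byteIndex = i >>> 3;
                // Absolute get/put leave the buffer position untouched.
                buf.put(byteIndex, (byte) (buf.get(byteIndex) | (1 << (i & 7))));
            }
        }
        return buf; // position 0, limit == capacity, ready to be written
    }

    // Read back 'count' booleans from a buffer whose data starts at index 0.
    public static boolean[] unpack(ByteBuffer buf, int count) {
        boolean[] values = new boolean[count];
        for (int i = 0; i < count; i++) {
            values[i] = (buf.get(i >>> 3) & (1 << (i & 7))) != 0;
        }
        return values;
    }

    // Write the packed bits to a temporary file through a channel, then read them back.
    public static boolean[] roundTrip(boolean[] values) throws IOException {
        Path file = Files.createTempFile("bits", ".bin");
        try {
            try (FileChannel out = FileChannel.open(file, StandardOpenOption.WRITE)) {
                ByteBuffer packed = pack(values);
                while (packed.hasRemaining()) {
                    out.write(packed);
                }
            }
            try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
                ByteBuffer read = ByteBuffer.allocate((int) in.size());
                while (read.hasRemaining() && in.read(read) != -1) {
                    // keep reading until the buffer is full or the file ends
                }
                read.flip();
                return unpack(read, values.length);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] values = {true, false, true, true, false, false, true, false, true};
        System.out.println(values.length + " booleans packed into "
                + pack(values).capacity() + " bytes");
        System.out.println("round trip equal: "
                + Arrays.equals(values, roundTrip(values)));
    }
}
```

Because the packed bytes are already in a ByteBuffer, writing them out needs no per-element serialization step, which is the kind of overhead the bullets above describe avoiding.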

Features

  • Supports the most common types: String, int, long, short, byte, boolean, double, float, Date, DateTime, and BigDecimal; as well as Time, UUID, Instant, and InputStream.
  • Column and DataFrame are immutable. Columns can be created from collections, arrays, streams, or with builders.
  • Read/write to File or Channel with minimal overhead. Supports memory-mapping from a file.
  • Read/write CSV files.
  • Read from ResultSet, write with PreparedStatement.
  • Backing ByteBuffers can be allocated on heap or off heap (direct). This is controlled by a global system property, -Dtech.bitey.allocateDirect=true or false, which defaults to false (on heap).
  • Column implements List. DataFrame implements List<Row>. If the DataFrame has a key column, it can also be viewed as a NavigableMap<T, Row>.
  • Basic filtering, joining, grouping.
  • No additional dependencies.
  • Extensive testing.
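
The heap versus direct choice and the memory-mapping feature listed above both rest on standard NIO calls. The sketch below shows those calls using only the JDK. The class `BufferAllocationSketch` and its methods are hypothetical, and reading the `tech.bitey.allocateDirect` property with `Boolean.getBoolean` is an assumption about how such a switch could be consulted, not a description of the library's internals.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only: not part of the dataframe library.
public class BufferAllocationSketch {

    // Choose heap or direct allocation from a system property (false when unset).
    public static ByteBuffer allocate(int capacity) {
        boolean direct = Boolean.getBoolean("tech.bitey.allocateDirect");
        return direct ? ByteBuffer.allocateDirect(capacity) : ByteBuffer.allocate(capacity);
    }

    // Write ints to a temporary file through a channel.
    public static Path writeInts(int... values) throws IOException {
        Path file = Files.createTempFile("ints", ".bin");
        file.toFile().deleteOnExit();
        ByteBuffer buf = allocate(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
        return file;
    }

    // Memory-map the file read-only and sum the ints without copying them into an array.
    public static long sumMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (mapped.remaining() >= Integer.BYTES) {
                sum += mapped.getInt();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("direct buffer: " + allocate(16).isDirect());
        Path file = writeInts(1, 2, 3, 4);
        System.out.println("sum of mapped ints: " + sumMapped(file));
    }
}
```

Running it with `-Dtech.bitey.allocateDirect=true` switches `allocate` to direct buffers; the file contents and the mapped read are the same either way.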

Sample Use Cases

  • Great as a ResultSet cache. Have an expensive query that needs to run every time your app starts and it's slowing down your development? Cache it locally on disk in a DataFrame! Because DataFrame can be viewed as a ResultSet, you can plug it into existing code with minimal changes. Or cache it in a service and pull it over the network (reads/writes directly to Channel).
  • Great for generating Excel reports via POI. Stage the data in a DataFrame first, then write POI code against the DataFrame. This separates concerns and is easier than writing POI directly against a ResultSet.
