Poppy
A dataframe library for Java
Poppy is a dataframe library for Java that provides common SQL operations (e.g. select, from, where, group by, order by, distinct) for processing data in Java.
Unlike other dataframe libraries, which keep all the data in memory, Poppy processes data in a streaming manner. In that respect it is closer to the Java 8 Stream library, but relational.
Here is a simple example. We have a Student class:
public class Student {
    private int studentId;
    private String name;
    private int grade;
    private int room;
    private int height;
    private int weight;
    ...
}
In SQL, we have a query like this
select
    grade,
    room,
    avg(weight) as weight,
    avg(height) as height
from Student
group by grade, room
order by grade, room
Here is the Poppy version:
List<Student> students = ...;
DataFrame
    .from(students, Student.class)
    .groupby("grade", "room")
    .aggregate(
        avgLong("weight").as("weight"),
        avgLong("height").as("height"))
    .sort("grade", "room")
    .print();
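For comparison, the same query can be expressed with the plain Java 8 Stream API, which is the non-relational counterpart mentioned earlier. This is only a sketch using the JDK, not Poppy code: the reduced Student stand-in, the class name StreamGroupByExample, and the helper methods are illustrative assumptions, and the sample rows are made up for the demonstration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class StreamGroupByExample {

    // Hypothetical stand-in for the Student class above, reduced to the queried fields.
    static class Student {
        private final int grade, room, height, weight;

        Student(int grade, int room, int height, int weight) {
            this.grade = grade;
            this.room = room;
            this.height = height;
            this.weight = weight;
        }

        int getGrade() { return grade; }
        int getRoom() { return room; }
        int getHeight() { return height; }
        int getWeight() { return weight; }
    }

    // Averages for one (grade, room) group: {avg(weight), avg(height)}.
    static double[] averages(List<Student> group) {
        return new double[] {
            group.stream().mapToInt(Student::getWeight).average().orElse(0),
            group.stream().mapToInt(Student::getHeight).average().orElse(0)
        };
    }

    // "group by grade, room"; the TreeMaps keep keys sorted, covering "order by grade, room".
    static TreeMap<Integer, TreeMap<Integer, double[]>> averagesByGradeAndRoom(List<Student> students) {
        return students.stream().collect(
            Collectors.groupingBy(Student::getGrade, TreeMap::new,
                Collectors.groupingBy(Student::getRoom, TreeMap::new,
                    Collectors.collectingAndThen(Collectors.toList(),
                        StreamGroupByExample::averages))));
    }

    // Made-up sample rows: (grade, room, height, weight).
    static List<Student> sampleStudents() {
        return Arrays.asList(
            new Student(1, 1, 150, 40),
            new Student(1, 1, 160, 50),
            new Student(1, 2, 170, 60),
            new Student(2, 1, 180, 70));
    }

    public static void main(String[] args) {
        // Print one line per group: grade, room, avg(weight), avg(height).
        averagesByGradeAndRoom(sampleStudents()).forEach((grade, rooms) ->
            rooms.forEach((room, avg) ->
                System.out.println(grade + "\t" + room + "\t" + avg[0] + "\t" + avg[1])));
    }
}
```

The nested collectors and the hand-written averages helper do the work that groupby, aggregate, and sort express declaratively in the Poppy version.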
Getting Started
Requirement
Java 8 or higher
Dependency
Poppy's package is hosted on the JCenter repository.
Maven
<dependency>
    <groupId>io.tenmax</groupId>
    <artifactId>poppy</artifactId>
    <version>0.1.8</version>
    <type>pom</type>
</dependency>
Gradle
compile 'io.tenmax:poppy:0.1.8'
Features
- Supports the most common SQL operations, e.g. select, from, where, group by, order by, distinct.
- Supports the most common SQL aggregation functions, e.g. avg(), sum(), count(), min(), max().
- Custom aggregation functions, defined with java.util.stream.Collector.
- Partition support. A partition is the unit of parallelism; multiple partitions let you process data concurrently.
- Multi-threaded support. For CPU-bound jobs, it uses all your CPU resources for better performance; for IO-bound jobs, it reduces waiting time and takes advantage of better concurrency.
- Suitable for both batch and streaming scenarios.
- Lightweight. Compared to the Spark DataFrame API, it is much lighter to embed in your application.
- Stream-based design. Unlike joinery, which keeps the whole dataset in memory, Poppy's streaming behaviour lets it process huge volumes of data with limited memory.
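The custom-aggregation feature above builds on java.util.stream.Collector. As a rough sketch of what such a collector can look like, the example below computes the range (max minus min) of a group of integers using only the JDK. The class name RangeCollectorExample and the range() method are illustrative assumptions, and the collector is exercised through a plain Java Stream because this README does not show how a collector is passed to Poppy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

class RangeCollectorExample {

    // Collector that reduces Integers to (max - min); an empty input yields 0.
    static Collector<Integer, int[], Integer> range() {
        return Collector.of(
            // Supplier: fresh accumulator holding {min, max}.
            () -> new int[] {Integer.MAX_VALUE, Integer.MIN_VALUE},
            // Accumulator: fold one value into the running min and max.
            (acc, v) -> {
                acc[0] = Math.min(acc[0], v);
                acc[1] = Math.max(acc[1], v);
            },
            // Combiner: merge two partial results (used when input is processed in parallel).
            (a, b) -> {
                a[0] = Math.min(a[0], b[0]);
                a[1] = Math.max(a[1], b[1]);
                return a;
            },
            // Finisher: min > max means nothing was accumulated.
            acc -> acc[0] > acc[1] ? 0 : acc[1] - acc[0]);
    }

    public static void main(String[] args) {
        List<Integer> heights = Arrays.asList(150, 172, 165);
        System.out.println(heights.stream().collect(range()));
        System.out.println(heights.parallelStream().collect(range()));
    }
}
```

The combiner is the part that lets partial results computed on separate chunks of data be merged; in the JDK it is used by parallel streams, and it is only an assumption that Poppy's partitions use it the same way.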
Documentation
Contribution
Please fork this project and send a pull request. Any comments would be appreciated!
