# Centurion <img src="logo.png" height="50">
<img src="https://img.shields.io/maven-central/v/com.sksamuel.centurion/centurion-avro.svg?label=latest%20release"/>
<img src="https://img.shields.io/maven-metadata/v?metadataUrl=https%3A%2F%2Fcentral.sonatype.com%2Frepository%2Fmaven-snapshots%2Fcom%2Fsksamuel%2Fcenturion%2Fcenturion-avro%2Fmaven-metadata.xml&strategy=highestVersion&label=maven-snapshot">
Centurion is a high-performance Kotlin toolkit for working with columnar and streaming data formats in a type-safe, idiomatic way. Built on top of proven Apache libraries, it provides zero-copy serialization, automatic code generation, and seamless integration with modern JVM applications.
## Why Centurion?
- Type-safe by design: Leverage Kotlin's type system with compile-time guarantees and automatic null safety
- Zero-copy performance: Optimized encoders/decoders with reflection caching and pooled resources
- Schema evolution made easy: First-class support for forward/backward compatible schema changes
- Batteries included: Support for 40+ types out of the box, including temporal types, BigDecimal, and collections
- Production ready: Built on Apache Avro and Parquet - battle-tested formats used at scale
See changelog for release notes.
## Features
- Type-safe schema definitions: Define schemas using Kotlin's type system with compile-time safety
- Multiple format support: Seamlessly work with Avro and Parquet formats
- High-performance Serde API: Zero-copy serialization with automatic compression support
- Schema evolution: Forward and backward compatible schema changes for Avro
- Code generation: Generate data classes and optimized encoders/decoders from Avro schemas
- Redis integration: Built-in Lettuce codecs for caching Avro data
- Streaming operations: Efficient streaming readers and writers for large datasets
- Kotlin-first design: Idiomatic APIs with null safety, data classes, and extension functions
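The schema-evolution bullet above can be pictured in plain Kotlin, independent of any Centurion or Avro API (all names below are illustrative, not library code): backward compatibility means a new reader supplies defaults for fields that old data lacks.

```kotlin
// Conceptual sketch only: a v2 reader consumes records written by a v1
// writer by falling back to a default for the field the old data lacks.
data class UserV2(val id: Long, val name: String, val email: String)

fun decodeUser(record: Map<String, Any?>): UserV2 = UserV2(
    id = record["id"] as Long,
    name = record["name"] as String,
    // "email" was added in v2; old records omit it entirely
    email = record["email"] as? String ?: "unknown@example.com"
)

val v1Record = mapOf("id" to 1L, "name" to "Alice") // written by old code
println(decodeUser(v1Record)) // email falls back to the default
```

In real Avro schemas the same effect is achieved by declaring a default value for the new field; the sketch just makes the mechanism visible.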
## Getting Started

Add Centurion to your build depending on which formats you need:

```kotlin
// For Avro support
implementation("com.sksamuel.centurion:centurion-avro:<version>")

// For Parquet support
implementation("com.sksamuel.centurion:centurion-parquet:<version>")
```
### Quick Start

Here's a complete example to get you started:

```kotlin
import com.sksamuel.centurion.avro.io.serde.BinarySerde
import java.math.BigDecimal

// Define your domain model
data class Product(
    val id: Long,
    val name: String,
    val price: BigDecimal,
    val inStock: Boolean,
    val tags: List<String>
)

// Create a serde (serializer/deserializer)
val serde = BinarySerde<Product>()

// Your data
val product = Product(
    id = 12345L,
    name = "Kotlin in Action",
    price = BigDecimal("39.99"),
    inStock = true,
    tags = listOf("books", "programming", "kotlin")
)

// Serialize to bytes
val bytes = serde.serialize(product)

// Deserialize back to an object
val restored = serde.deserialize(bytes)
println(restored) // Product(id=12345, name=Kotlin in Action, ...)
```
## Avro Operations

### Writing Avro Data

```kotlin
import com.sksamuel.centurion.Schema
import com.sksamuel.centurion.Struct
import com.sksamuel.centurion.avro.io.BinaryWriter
import com.sksamuel.centurion.avro.encoders.ReflectionRecordEncoder
import com.sksamuel.centurion.avro.schemas.toAvroSchema
import org.apache.avro.io.EncoderFactory
import java.io.FileOutputStream

// Define your schema
val schema = Schema.Struct(
    Schema.Field("id", Schema.Int64),
    Schema.Field("name", Schema.Strings),
    Schema.Field("timestamp", Schema.TimestampMillis)
)

// Create some data
val records = listOf(
    Struct(schema, 1L, "Alice", System.currentTimeMillis()),
    Struct(schema, 2L, "Bob", System.currentTimeMillis()),
    Struct(schema, 3L, "Charlie", System.currentTimeMillis())
)

// Write to Avro binary format
FileOutputStream("users.avro").use { output ->
    val avroSchema = schema.toAvroSchema()
    val writer = BinaryWriter(
        schema = avroSchema,
        out = output,
        ef = EncoderFactory.get(),
        encoder = ReflectionRecordEncoder(avroSchema, Struct::class),
        reuse = null
    )
    records.forEach { writer.write(it) }
    writer.close()
}
```
### Reading Avro Data

```kotlin
import com.sksamuel.centurion.avro.io.BinaryReader
import com.sksamuel.centurion.avro.decoders.ReflectionRecordDecoder
import org.apache.avro.io.DecoderFactory
import java.io.FileInputStream

// Read from Avro binary format
FileInputStream("users.avro").use { input ->
    val avroSchema = schema.toAvroSchema()
    val reader = BinaryReader(
        schema = avroSchema,
        input = input,
        factory = DecoderFactory.get(),
        decoder = ReflectionRecordDecoder(avroSchema, Struct::class),
        reuse = null
    )
    // BinaryReader reads one record per file
    val struct = reader.read()
    println("User: ${struct["name"]}, ID: ${struct["id"]}")
}
```
## Parquet Operations

### Writing Parquet Data

```kotlin
import com.sksamuel.centurion.parquet.Parquet
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Define schema and data
val schema = Schema.Struct(
    Schema.Field("product_id", Schema.Strings),
    Schema.Field("quantity", Schema.Int32),
    Schema.Field("price", Schema.Decimal(Schema.Precision(10), Schema.Scale(2)))
)

val data = listOf(
    Struct(schema, "PROD-001", 10, java.math.BigDecimal("29.99")),
    Struct(schema, "PROD-002", 5, java.math.BigDecimal("15.50")),
    Struct(schema, "PROD-003", 20, java.math.BigDecimal("8.75"))
)

// Write to Parquet
val path = Path("sales.parquet")
val conf = Configuration()
val writer = Parquet.writer(path, schema, conf)
data.forEach { writer.write(it) }
writer.close()
```
### Reading Parquet Data

```kotlin
import com.sksamuel.centurion.parquet.Parquet

// Read from Parquet
val path = Path("sales.parquet")
val conf = Configuration()
val reader = Parquet.reader(path, conf)

var struct = reader.read()
while (struct != null) {
    println("Product: ${struct["product_id"]}, Qty: ${struct["quantity"]}")
    struct = reader.read()
}
reader.close()

// Count records efficiently
val recordCount = Parquet.count(listOf(path), conf)
println("Total records: $recordCount")
```
## Schema Conversion

Convert between different format schemas:

```kotlin
import com.sksamuel.centurion.avro.schemas.toAvroSchema
import com.sksamuel.centurion.parquet.schemas.ToParquetSchema

// Convert a Centurion schema to an Avro schema
val centurionSchema = Schema.Struct(
    Schema.Field("name", Schema.Strings),
    Schema.Field("age", Schema.Int32)
)
val avroSchema = centurionSchema.toAvroSchema()

// Convert to a Parquet schema
val parquetSchema = ToParquetSchema.toParquetType(centurionSchema)
```
## Advanced Types

### Working with Complex Types

```kotlin
// Array/list schema
val numbersSchema = Schema.Array(Schema.Int32)

// Map schema (string keys, string values)
val metadataSchema = Schema.Map(Schema.Strings)

// Nested struct
val addressSchema = Schema.Struct(
    Schema.Field("street", Schema.Strings),
    Schema.Field("city", Schema.Strings),
    Schema.Field("zipcode", Schema.Strings)
)

val personSchema = Schema.Struct(
    Schema.Field("name", Schema.Strings),
    Schema.Field("address", addressSchema),
    Schema.Field("phone_numbers", Schema.Array(Schema.Strings))
)
```
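If you work with data classes rather than hand-built Structs, the nested schema above would mirror ordinary nested Kotlin classes. This is a sketch only: the snake_case-to-camelCase field mapping shown (`phone_numbers` to `phoneNumbers`) is an assumption you should verify against your encoder configuration.

```kotlin
// Illustrative data classes mirroring personSchema above.
data class Address(val street: String, val city: String, val zipcode: String)

data class Person(
    val name: String,
    val address: Address,
    val phoneNumbers: List<String> // assumed to map to "phone_numbers"
)

val person = Person(
    name = "Alice",
    address = Address("1 Main St", "Springfield", "12345"),
    phoneNumbers = listOf("555-0100", "555-0101")
)
println(person.address.city) // Springfield
```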
### Temporal Types

```kotlin
// Timestamp types
val eventSchema = Schema.Struct(
    Schema.Field("event_name", Schema.Strings),
    Schema.Field("timestamp_millis", Schema.TimestampMillis),
    Schema.Field("timestamp_micros", Schema.TimestampMicros)
)

// Create a struct with temporal data
val event = Struct(
    eventSchema,
    "user_login",
    System.currentTimeMillis(),
    System.currentTimeMillis() * 1000 // epoch microseconds
)
```
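The micros value above is simply the millis value times 1000. `java.time` makes the two granularities explicit, using only the JDK (no Centurion APIs involved):

```kotlin
import java.time.Instant
import java.time.temporal.ChronoUnit

val instant = Instant.parse("2024-01-15T12:00:00Z")

// Epoch milliseconds, as stored in a TimestampMillis field
val millis = instant.toEpochMilli() // 1705320000000

// Epoch microseconds, as stored in a TimestampMicros field
val micros = ChronoUnit.MICROS.between(Instant.EPOCH, instant) // 1705320000000000
```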
### Decimal Precision

```kotlin
// High-precision decimal for financial data
val transactionSchema = Schema.Struct(
    Schema.Field("transaction_id", Schema.Strings),
    Schema.Field(
        "amount",
        Schema.Decimal(
            Schema.Precision(18), // 18 total digits
            Schema.Scale(4)       // 4 decimal places
        )
    )
)

val transaction = Struct(
    transactionSchema,
    "TXN-123456",
    java.math.BigDecimal("1234.5678")
)
```
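Precision counts total significant digits and scale counts fractional digits. The JDK's `BigDecimal` exposes both, so you can sanity-check values against the declared schema before writing. A stdlib-only sketch; how strictly the scale must match the schema is format-dependent, so verify against your writer:

```kotlin
import java.math.BigDecimal
import java.math.RoundingMode

val amount = BigDecimal("1234.5678")
println(amount.precision()) // 8 significant digits (within the declared 18)
println(amount.scale())     // 4 fractional digits, matching Schema.Scale(4)

// A value with extra fractional digits is typically rescaled before writing
val rescaled = BigDecimal("0.123456").setScale(4, RoundingMode.HALF_UP)
println(rescaled) // 0.1235
```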
## Supported Types
Centurion provides built-in encoders and decoders for a comprehensive set of types:
### Avro Type Support
| Type | Encoder/Decoder | Notes |
|------|-----------------|-------|
| **Primitives** | | |
| Byte, Short, Int, Long | ✓ | Direct mapping to Avro types |
| Float, Double | ✓ | IEEE 754 floating point |
| Boolean | ✓ | |
| String | ✓ | UTF-8 encoded, optimized with globalUseJavaString |
| **Temporal Types** | | |
| Instant | ✓ | TimestampMillis/TimestampMicros logical types |
| LocalDateTime | ✓ | LocalTimestampMillis/LocalTimestampMicros |
| LocalTime | ✓ | TimeMillis/TimeMicros logical types |
| OffsetDateTime | ✓ | Converted to Instant |
| **Numeric Types** | | |
| BigDecimal | ✓ | Bytes/Fixed/String encodings with scale |
| UUID | ✓ | String or fixed byte encoding |
| **Collections** | | |
| List<T>, Set<T> | ✓ | Generic support for any element type |
| Array<T> | ✓ | Native array support |
| LongArray, IntArray | ✓ | Optimized primitive arrays |
| Map<String, T> | ✓ | String keys required by Avro |
| **Binary** | | |
| ByteArray | ✓ | Direct bytes type |
| ByteBuffer | ✓ | Zero-copy when possible |
| Enums | ✓ | Kotlin enum classes |
| Nullable Types | ✓ | Full Kotlin null-safety support |
| Data Classes | ✓ | Via reflection or code generation |
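Nullable-type support builds on the fact that Kotlin retains nullability in its runtime type model. In standard Avro, an optional value is modelled as a union with null; the information an encoder needs to make that choice is already available from the stdlib alone (this is not Centurion code):

```kotlin
import kotlin.reflect.typeOf

// Kotlin keeps nullability at runtime, which is what allows a String?
// property to be encoded as a ["null", "string"] union in Avro.
println(typeOf<String>().isMarkedNullable)  // false
println(typeOf<String?>().isMarkedNullable) // true
```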
## High-Performance Serde API

The Serde (serializer/deserializer) API provides a convenient way to convert between Kotlin objects and Avro binary data, as shown in the Quick Start above.
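As a rough sketch of the pattern itself, rather than Centurion's implementation, a serde simply pairs the two directions of the conversion behind one interface. The JDK-serialization body below is purely illustrative; Centurion's serdes produce compact Avro binary instead.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

// The serde pattern: one object owns both directions of the conversion.
interface Serde<T> {
    fun serialize(value: T): ByteArray
    fun deserialize(bytes: ByteArray): T
}

// Illustrative implementation using JDK serialization, not Centurion's.
class JavaSerde<T : Serializable> : Serde<T> {
    override fun serialize(value: T): ByteArray =
        ByteArrayOutputStream().use { baos ->
            ObjectOutputStream(baos).use { it.writeObject(value) }
            baos.toByteArray()
        }

    @Suppress("UNCHECKED_CAST")
    override fun deserialize(bytes: ByteArray): T =
        ObjectInputStream(ByteArrayInputStream(bytes)).use { it.readObject() as T }
}

val roundTripped = JavaSerde<String>().let { it.deserialize(it.serialize("hello")) }
println(roundTripped) // hello
```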
