
Centurion <img src="logo.png" height="50">

master <img src="https://img.shields.io/maven-central/v/com.sksamuel.centurion/centurion-avro.svg?label=latest%20release"/> <img src="https://img.shields.io/maven-metadata/v?metadataUrl=https%3A%2F%2Fcentral.sonatype.com%2Frepository%2Fmaven-snapshots%2Fcom%2Fsksamuel%2Fcenturion%2Fcenturion-avro%2Fmaven-metadata.xml&strategy=highestVersion&label=maven-snapshot"> License

Centurion is a high-performance Kotlin toolkit for working with columnar and streaming data formats in a type-safe, idiomatic way. Built on top of proven Apache libraries, it provides zero-copy serialization, automatic code generation, and seamless integration with modern JVM applications.

Why Centurion?

  • Type-safe by design: Leverage Kotlin's type system with compile-time guarantees and automatic null safety
  • Zero-copy performance: Optimized encoders/decoders with reflection caching and pooled resources
  • Schema evolution made easy: First-class support for forward/backward compatible schema changes
  • Batteries included: Support for 40+ types out of the box including temporal types, BigDecimal, collections
  • Production ready: Built on Apache Avro and Parquet - battle-tested formats used at scale

See changelog for release notes.

Features

  • Type-safe schema definitions: Define schemas using Kotlin's type system with compile-time safety
  • Multiple format support: Seamlessly work with Avro and Parquet formats
  • High-performance Serde API: Zero-copy serialization with automatic compression support
  • Schema evolution: Forward and backward compatible schema changes for Avro
  • Code generation: Generate data classes and optimized encoders/decoders from Avro schemas
  • Redis integration: Built-in Lettuce codecs for caching Avro data
  • Streaming operations: Efficient streaming readers and writers for large datasets
  • Kotlin-first design: Idiomatic APIs with null safety, data classes, and extension functions

Getting Started

Add Centurion to your build depending on which formats you need:

// For Avro support
implementation("com.sksamuel.centurion:centurion-avro:<version>")

// For Parquet support
implementation("com.sksamuel.centurion:centurion-parquet:<version>")

Quick Start

Here's a complete example to get you started:

import com.sksamuel.centurion.avro.io.serde.BinarySerde
import java.math.BigDecimal

// Define your domain model
data class Product(
    val id: Long,
    val name: String,
    val price: BigDecimal,
    val inStock: Boolean,
    val tags: List<String>
)

// Create a serde (serializer/deserializer)
val serde = BinarySerde<Product>()

// Your data
val product = Product(
    id = 12345L,
    name = "Kotlin in Action",
    price = BigDecimal("39.99"),
    inStock = true,
    tags = listOf("books", "programming", "kotlin")
)

// Serialize to bytes
val bytes = serde.serialize(product)

// Deserialize back to object
val restored = serde.deserialize(bytes)
println(restored) // Product(id=12345, name=Kotlin in Action, ...)

Avro Operations

Writing Avro Data

import com.sksamuel.centurion.Schema
import com.sksamuel.centurion.Struct
import com.sksamuel.centurion.avro.io.BinaryWriter
import com.sksamuel.centurion.avro.encoders.ReflectionRecordEncoder
import com.sksamuel.centurion.avro.schemas.toAvroSchema
import org.apache.avro.io.EncoderFactory
import java.io.FileOutputStream

// Define your schema
val schema = Schema.Struct(
  Schema.Field("id", Schema.Int64),
  Schema.Field("name", Schema.Strings),
  Schema.Field("timestamp", Schema.TimestampMillis)
)

// Create some data
val records = listOf(
  Struct(schema, 1L, "Alice", System.currentTimeMillis()),
  Struct(schema, 2L, "Bob", System.currentTimeMillis()),
  Struct(schema, 3L, "Charlie", System.currentTimeMillis())
)

// Write to Avro binary format
FileOutputStream("users.avro").use { output ->
  val avroSchema = schema.toAvroSchema()
  val writer = BinaryWriter(
    schema = avroSchema,
    out = output,
    ef = EncoderFactory.get(),
    encoder = ReflectionRecordEncoder(avroSchema, Struct::class),
    reuse = null
  )
  records.forEach { writer.write(it) }
  writer.close()
}

Reading Avro Data

import com.sksamuel.centurion.avro.io.BinaryReader
import com.sksamuel.centurion.avro.decoders.ReflectionRecordDecoder
import org.apache.avro.io.DecoderFactory
import java.io.FileInputStream

// Read from Avro binary format  
FileInputStream("users.avro").use { input ->
  val avroSchema = schema.toAvroSchema()
  val reader = BinaryReader(
    schema = avroSchema,
    input = input,
    factory = DecoderFactory.get(),
    decoder = ReflectionRecordDecoder(avroSchema, Struct::class),
    reuse = null
  )
  // BinaryReader reads one record per file
  val struct = reader.read()
  println("User: ${struct["name"]}, ID: ${struct["id"]}")
}
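Because BinaryWriter and BinaryReader operate on plain streams, the same calls can round-trip a record entirely in memory, which is handy in tests. The sketch below is an assumption-laden convenience, not library documentation: it reuses the constructor parameters shown in the file-based examples above, swapping the file streams for JDK byte streams, and assumes the same `schema` value as in the writing example.

```kotlin
import com.sksamuel.centurion.Struct
import com.sksamuel.centurion.avro.decoders.ReflectionRecordDecoder
import com.sksamuel.centurion.avro.encoders.ReflectionRecordEncoder
import com.sksamuel.centurion.avro.io.BinaryReader
import com.sksamuel.centurion.avro.io.BinaryWriter
import com.sksamuel.centurion.avro.schemas.toAvroSchema
import org.apache.avro.io.DecoderFactory
import org.apache.avro.io.EncoderFactory
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream

// Write one record to an in-memory buffer, using the same parameters
// as the file-based writing example above
val avroSchema = schema.toAvroSchema()
val buffer = ByteArrayOutputStream()
val writer = BinaryWriter(
  schema = avroSchema,
  out = buffer,
  ef = EncoderFactory.get(),
  encoder = ReflectionRecordEncoder(avroSchema, Struct::class),
  reuse = null
)
writer.write(Struct(schema, 1L, "Alice", System.currentTimeMillis()))
writer.close()

// Decode it straight back from the buffered bytes
val reader = BinaryReader(
  schema = avroSchema,
  input = ByteArrayInputStream(buffer.toByteArray()),
  factory = DecoderFactory.get(),
  decoder = ReflectionRecordDecoder(avroSchema, Struct::class),
  reuse = null
)
val struct = reader.read()
println("Round-tripped: ${struct["name"]}")
```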

Parquet Operations

Writing Parquet Data

import com.sksamuel.centurion.parquet.Parquet
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Define schema and data
val schema = Schema.Struct(
  Schema.Field("product_id", Schema.Strings),
  Schema.Field("quantity", Schema.Int32),
  Schema.Field("price", Schema.Decimal(Schema.Precision(10), Schema.Scale(2)))
)

val data = listOf(
  Struct(schema, "PROD-001", 10, java.math.BigDecimal("29.99")),
  Struct(schema, "PROD-002", 5, java.math.BigDecimal("15.50")),
  Struct(schema, "PROD-003", 20, java.math.BigDecimal("8.75"))
)

// Write to Parquet
val path = Path("sales.parquet")
val conf = Configuration()
val writer = Parquet.writer(path, schema, conf)

data.forEach { struct ->
  writer.write(struct)
}
writer.close()

Reading Parquet Data

import com.sksamuel.centurion.parquet.Parquet

// Read from Parquet
val path = Path("sales.parquet")
val conf = Configuration()
val reader = Parquet.reader(path, conf)

var struct = reader.read()
while (struct != null) {
  println("Product: ${struct["product_id"]}, Qty: ${struct["quantity"]}")
  struct = reader.read()
}
reader.close()

// Count records efficiently
val recordCount = Parquet.count(listOf(path), conf)
println("Total records: $recordCount")
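The read loop above can be wrapped in a small helper that drains a reader into a list and always closes it. This is a hypothetical convenience function built on the `Parquet.reader` API shown here, not part of the library:

```kotlin
import com.sksamuel.centurion.Struct
import com.sksamuel.centurion.parquet.Parquet
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper: read every record from a Parquet file into a list,
// closing the reader even if decoding fails partway through
fun readAllStructs(path: Path, conf: Configuration): List<Struct> {
  val reader = Parquet.reader(path, conf)
  try {
    val structs = mutableListOf<Struct>()
    var struct = reader.read()
    while (struct != null) {
      structs.add(struct)
      struct = reader.read()
    }
    return structs
  } finally {
    reader.close()
  }
}

val sales = readAllStructs(Path("sales.parquet"), Configuration())
println("Read ${sales.size} records")
```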

Schema Conversion

Convert between different format schemas:

import com.sksamuel.centurion.avro.schemas.toAvroSchema
import com.sksamuel.centurion.parquet.schemas.ToParquetSchema

// Convert Centurion schema to Avro schema
val centurionSchema = Schema.Struct(
  Schema.Field("name", Schema.Strings),
  Schema.Field("age", Schema.Int32)
)

val avroSchema = centurionSchema.toAvroSchema()

// Convert to Parquet schema
val parquetSchema = ToParquetSchema.toParquetType(centurionSchema)
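The converted `avroSchema` is an ordinary `org.apache.avro.Schema`, so Avro's own API can render it as JSON for inspection (Avro's `Schema.toString(pretty)`):

```kotlin
// Print the generated Avro schema as pretty-printed JSON,
// e.g. to verify field names and logical types
println(avroSchema.toString(true))
```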

Advanced Types

Working with Complex Types

// Array/List schema
val numbersSchema = Schema.Array(Schema.Int32)

// Map schema
val metadataSchema = Schema.Map(Schema.Strings) // String keys, String values

// Nested struct
val addressSchema = Schema.Struct(
  Schema.Field("street", Schema.Strings),
  Schema.Field("city", Schema.Strings),
  Schema.Field("zipcode", Schema.Strings)
)

val personSchema = Schema.Struct(
  Schema.Field("name", Schema.Strings),
  Schema.Field("address", addressSchema),
  Schema.Field("phone_numbers", Schema.Array(Schema.Strings))
)
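Values for nested schemas are built the same way as flat ones: the inner Struct is passed as the field value, in schema field order, following the same `Struct(schema, values...)` pattern used in the earlier examples. A sketch using the schemas defined above:

```kotlin
// The address field takes a Struct built from addressSchema;
// the phone_numbers field takes an ordinary Kotlin list
val address = Struct(addressSchema, "1 Main St", "Springfield", "12345")
val person = Struct(
  personSchema,
  "Alice",
  address,
  listOf("555-0100", "555-0199")
)
```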

Temporal Types

// Timestamp types
val eventSchema = Schema.Struct(
  Schema.Field("event_name", Schema.Strings),
  Schema.Field("timestamp_millis", Schema.TimestampMillis),
  Schema.Field("timestamp_micros", Schema.TimestampMicros)
)

// Create struct with temporal data
val event = Struct(
  eventSchema,
  "user_login",
  System.currentTimeMillis(),
  System.currentTimeMillis() * 1000
)

Decimal Precision

// High-precision decimal for financial data
val transactionSchema = Schema.Struct(
  Schema.Field("transaction_id", Schema.Strings),
  Schema.Field("amount", Schema.Decimal(
    Schema.Precision(18), // 18 total digits
    Schema.Scale(4)       // 4 decimal places
  ))
)

val transaction = Struct(
  transactionSchema,
  "TXN-123456",
  java.math.BigDecimal("1234.5678")
)
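Avro decimal encodings carry a fixed scale, so a value generally needs to match the schema's declared scale before it is written; "1234.5678" above already has scale 4, but other inputs may need normalizing first. A plain-JDK sketch, independent of Centurion:

```kotlin
import java.math.BigDecimal
import java.math.RoundingMode

// Bring an arbitrary BigDecimal to the schema's declared scale of 4
val raw = BigDecimal("1234.56")
val normalized = raw.setScale(4, RoundingMode.HALF_UP)
println(normalized) // 1234.5600
```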

Supported Types

Centurion provides built-in encoders and decoders for a comprehensive set of types:

Avro Type Support

| Type | Encoder/Decoder | Notes |
|------|-----------------|-------|
| Primitives | | |
| Byte, Short, Int, Long | ✓ | Direct mapping to Avro types |
| Float, Double | ✓ | IEEE 754 floating point |
| Boolean | ✓ | |
| String | ✓ | UTF-8 encoded, optimized with globalUseJavaString |
| Temporal Types | | |
| Instant | ✓ | TimestampMillis/TimestampMicros logical types |
| LocalDateTime | ✓ | LocalTimestampMillis/LocalTimestampMicros |
| LocalTime | ✓ | TimeMillis/TimeMicros logical types |
| OffsetDateTime | ✓ | Converted to Instant |
| Numeric Types | | |
| BigDecimal | ✓ | Bytes/Fixed/String encodings with scale |
| UUID | ✓ | String or fixed byte encoding |
| Collections | | |
| List<T>, Set<T> | ✓ | Generic support for any element type |
| Array<T> | ✓ | Native array support |
| LongArray, IntArray | ✓ | Optimized primitive arrays |
| Map<String, T> | ✓ | String keys required by Avro |
| Binary | | |
| ByteArray | ✓ | Direct bytes type |
| ByteBuffer | ✓ | Zero-copy when possible |
| Enums | ✓ | Kotlin enum classes |
| Nullable Types | ✓ | Full Kotlin null-safety support |
| Data Classes | ✓ | Via reflection or code generation |

High-Performance Serde API

The Serde (serializer/deserializer) API provides a convenient way to convert between Kotlin objects and Avro-encoded bytes without managing writers and readers directly, as shown in the Quick Start example above.
