Darwin

Avro Schema Evolution made easy

General
Installation
Usage
Configuration
- General
- HBase
- PostgreSql
- REST
- Confluent

Overview

Darwin is a repository of Avro schemas that maintains all the schema versions used during your application's lifetime. Its main goal is to provide easy and transparent access to the Avro data in your storage, independently of schema evolution. Darwin is portable and doesn't require an application server. To store its data, you can choose among multiple storage managers (HBase, Postgres), each of which is plugged in by importing the desired connector.

Artifacts

Darwin artifacts are published for Scala 2.10, 2.11, 2.12 and 2.13 (from version 1.0.12). From version 1.0.2 Darwin is available from Maven Central, so there is no need to configure additional repositories in your project.

To access Darwin's core functionality, add the core dependency to your project:

core

sbt

libraryDependencies += "it.agilelab" %% "darwin-core" % "1.2.1-SNAPSHOT"

maven

<dependency>
  <groupId>it.agilelab</groupId>
  <artifactId>darwin-core_2.11</artifactId>
  <version>1.2.1-SNAPSHOT</version>
</dependency>

HBase connector

Then add the connector of your choice, either HBase:

sbt

libraryDependencies += "it.agilelab" %% "darwin-hbase-connector" % "1.2.1-SNAPSHOT"

maven

<dependency>
  <groupId>it.agilelab</groupId>
  <artifactId>darwin-hbase-connector_2.11</artifactId>
  <version>1.2.1-SNAPSHOT</version>
</dependency>

Postgresql connector

Or PostgreSql:

sbt

libraryDependencies += "it.agilelab" %% "darwin-postgres-connector" % "1.2.1-SNAPSHOT"

maven

<dependency>
  <groupId>it.agilelab</groupId>
  <artifactId>darwin-postgres-connector_2.11</artifactId>
  <version>1.2.1-SNAPSHOT</version>
</dependency>

Rest Connector

Or REST:

sbt

libraryDependencies += "it.agilelab" %% "darwin-rest-connector" % "1.2.1-SNAPSHOT"

maven

<dependency>
  <groupId>it.agilelab</groupId>
  <artifactId>darwin-rest-connector_2.11</artifactId>
  <version>1.2.1-SNAPSHOT</version>
</dependency>

Rest server

To use the REST connector, implement the required endpoints or use the reference implementation provided by the rest-server module.

Mock connector

Or Mock (only for test scenarios):

sbt

libraryDependencies += "it.agilelab" %% "darwin-mock-connector" % "1.2.1-SNAPSHOT"

maven

<dependency>
  <groupId>it.agilelab</groupId>
  <artifactId>darwin-mock-connector_2.11</artifactId>
  <version>1.2.1-SNAPSHOT</version>
</dependency>

Confluent schema registry Connector

Darwin can be used as a facade over Confluent Schema Registry.

sbt

libraryDependencies += "it.agilelab" %% "darwin-confluent-connector" % "1.2.1-SNAPSHOT"

maven

<dependency>
  <groupId>it.agilelab</groupId>
  <artifactId>darwin-confluent-connector_2.11</artifactId>
  <version>1.2.1-SNAPSHOT</version>
</dependency>

Background

In systems that store Avro-encoded objects, a problem arises when the structure of those objects evolves. Avro cannot then read the old data using the schema extracted from the current version of the object: each Avro-encoded object must be stored along with the schema it was written with. To address this problem, Avro defines the Single-Object Encoding specification:

Single-object encoding

In some situations a single Avro serialized object is to be stored for a longer period of time. In the period after a schema change this persistance system will contain records that have been written with different schemas. So the need arises to know which schema was used to write a record to support schema evolution correctly. In most cases the schema itself is too large to include in the message, so this binary wrapper format supports the use case more effectively.

Darwin complies with this specification and provides utility methods that can generate a Single-Object encoded byte array from an Avro byte array, and extract an Avro byte array (along with its schema) from a Single-Object encoded one.
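
To make the wrapper format concrete, here is a minimal sketch of the header layout described by the Avro specification: a two-byte marker (0xC3 0x01), followed by the 8-byte little-endian fingerprint of the writer schema, followed by the Avro binary payload. The object and method names below are hypothetical and are not Darwin's API; the sketch also assumes the schema fingerprint has already been computed elsewhere.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper, not part of Darwin: illustrates the header layout only.
object SingleObjectSketch {
  // Two-byte marker defined by the Avro specification.
  private val Marker: Array[Byte] = Array(0xC3.toByte, 0x01.toByte)
  private val IdLength = 8                          // 64-bit schema fingerprint
  private val HeaderLength = Marker.length + IdLength

  // Prepend marker and little-endian fingerprint to an Avro binary payload.
  def encode(fingerprint: Long, avroPayload: Array[Byte]): Array[Byte] = {
    val buffer = ByteBuffer
      .allocate(HeaderLength + avroPayload.length)
      .order(ByteOrder.LITTLE_ENDIAN)
    buffer.put(Marker)
    buffer.putLong(fingerprint)
    buffer.put(avroPayload)
    buffer.array()
  }

  // True when the array is long enough and starts with the marker.
  def isSingleObjectEncoded(bytes: Array[Byte]): Boolean =
    bytes.length >= HeaderLength && bytes(0) == Marker(0) && bytes(1) == Marker(1)

  // Read the fingerprint that follows the marker, if the header is present.
  def extractId(bytes: Array[Byte]): Option[Long] =
    if (!isSingleObjectEncoded(bytes)) None
    else Some(
      ByteBuffer.wrap(bytes, Marker.length, IdLength)
        .order(ByteOrder.LITTLE_ENDIAN)
        .getLong()
    )

  // Strip marker and fingerprint, leaving the plain Avro payload.
  def dropHeader(bytes: Array[Byte]): Option[Array[Byte]] =
    if (!isSingleObjectEncoded(bytes)) None
    else Some(bytes.drop(HeaderLength))
}
```

The header adds a fixed 10 bytes per record, which is why it is preferred over embedding the full schema in every message.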

Architecture

[Figure: Darwin architecture schema]

Darwin maintains a repository of all the known schemas in the configured storage, and can access these data in three configurable ways:

Eager Cached

Darwin loads all schemas once from the selected storage and uses them to fill an internal cache that serves all subsequent queries. The only other access to the storage happens when the registerAll method is invoked, which updates both the cache and the storage with the new schemas. Once the cache is loaded, all getId and getSchema invocations perform lookups only in the cache.

Lazy Cached

Darwin behaves as in the Eager Cached scenario, but each cache miss is also looked up in the storage. If the data is found in the storage, the cache is updated with the fetched data.

Lazy

Darwin performs all lookups directly on the storage: there is no application-level cache.
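
The difference between the three modes can be sketched as follows. This is an illustration under stated assumptions, not Darwin's implementation: the Storage trait, the Lookup trait and the three classes are hypothetical names, and schemas are represented as plain strings to keep the example self-contained.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical sketch, not Darwin's code: schemas are plain strings here.
object LookupStrategiesSketch {
  // Stand-in for a storage connector.
  trait Storage {
    def loadAll(): Map[Long, String]
    def find(id: Long): Option[String]
  }

  trait Lookup {
    def getSchema(id: Long): Option[String]
  }

  // Eager cached: load everything once, then answer only from the cache.
  final class EagerCached(storage: Storage) extends Lookup {
    private val cache = TrieMap.empty[Long, String]
    cache ++= storage.loadAll()
    def getSchema(id: Long): Option[String] = cache.get(id)
  }

  // Lazy cached: same initial load, but a miss falls back to the storage
  // and a successful fetch is added to the cache.
  final class LazyCached(storage: Storage) extends Lookup {
    private val cache = TrieMap.empty[Long, String]
    cache ++= storage.loadAll()
    def getSchema(id: Long): Option[String] =
      cache.get(id).orElse {
        val fetched = storage.find(id)
        fetched.foreach(schema => cache.put(id, schema))
        fetched
      }
  }

  // Lazy: no cache, every lookup goes to the storage.
  final class Lazy(storage: Storage) extends Lookup {
    def getSchema(id: Long): Option[String] = storage.find(id)
  }
}
```

The trade-off is between storage traffic and freshness: the eager variant never sees schemas written by other processes after startup, the lazy cached variant discovers them on a miss, and the uncached variant always reflects the current storage content at the cost of one storage access per lookup.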

Darwin interaction

Darwin can be used to easily read and write data encoded in Avro Single-Object format using the generateAvroSingleObjectEncoded and retrieveSchemaAndAvroPayload methods of an AvroSchemaManager instance (they rely on the getId and getSchema methods discussed before). These methods allow your application to convert an encoded Avro byte array into a single-object encoded one, and to extract the schema and payload from a single-object encoded record that was written previously. If you need the single-object encoding utilities without creating an AvroSchemaManager instance, the utility object AvroSingleObjectEncodingUtils exposes some general-purpose functionality, such as:

- check if a byte array is single-object encoded
- create a single-object encoded byte array from payload and schema ID
- extract the schema ID from a single-object encoded byte array
- remove the header (schema ID included) of a single-object encoded byte array

[Figure: Darwin interaction]

JVM compatibility

Darwin is cross-published for different Scala versions (2.10, 2.11, 2.12, 2.13). Depending on the Scala version, it targets different JVM versions.

Please refer to the following compatibility matrix:

| Scala version | JVM version |
|---------------|-------------|
| 2.10          | 1.7         |
| 2.11          | 1.7         |
| 2.12          | 1.8         |
| 2.13          | 1.8         |

Installation

To use Darwin in your application, simply add it as a dependency along with one of the available connectors. Darwin can automatically load the defined connector, and it can be used directly to register and retrieve Avro schemas.

Usage

Darwin's main functionality is exposed by the AvroSchemaManager, which can be used to store and retrieve the known Avro schemas. There are two main ways to get an instance of AvroSchemaManager:

- You can create an instance of AvroSchemaManager directly, passing a Connector as constructor argument; the available implementations of AvroSchemaManager are the ones introduced in the Architecture chapter: CachedEagerAvroSchemaManager, CachedLazyAvroSchemaManager and LazyAvroSchemaManager.
- You can obtain an instance of AvroSchemaManager using the AvroSchemaManagerFactory: for each configuration passed as input to the initialize method, a new instance is created. The instance can be retrieved later using the getInstance method.

For details on how to define the Typesafe configuration needed to create an AvroSchemaManager instance (or a Connector instance directly), see the Configuration section for the storage you chose.

Once you have created an instance of AvroSchemaManager, the application should first register all its known Avro schemas by invoking the registerAll method:

  val manager: AvroSchemaManager = AvroSchemaManagerFactory.initialize(config)
  val schemas: Seq[Schema] = ??? // obtain all the schemas used by the application
  val registered: Seq[(Long, Schema)] = manager.registerAll(schemas)

There are various ways to generate the Avro schema for your classes. If you are using standard Java POJOs:

  val schema: Schema = ReflectData.get().getSchema(classOf[MyClass])

If your application uses the avro4s library you can instead obtain the schemas through the AvroSchema typeclass implicitly generated by avro4s, e.g.:

  val schema: Schema = AvroSchema[MyClass]

Once you have registered all the schemas used by your application, you can use them directly through the AvroSchemaManager instance: it exposes functionality to retrieve the schema from an ID and vice versa.

  val id: Long = manager.getId(schema)
  val schema: Schema = manager.getSchema(id)

As said previously,
