Lancaster
Apache Avro library for Clojure and ClojureScript
- Installation
- About
- Examples
- Creating Schema Objects
- Data Types
- Names and Namespaces
- API Documentation
- Developing
- License
Installation
Using Leiningen / Clojars:
About
Lancaster is an Apache Avro library for Clojure and ClojureScript. It aims to be fully compliant with the Avro Specification. It is assumed that the reader of this documentation is familiar with Avro and Avro terminology. If this is your first exposure to Avro, please read the Avro Overview and the Avro Specification before proceeding.
Lancaster provides for:
- Easy schema creation
- Serialization of arbitrarily-complex data structures to a byte array
- Deserialization from a byte array, including schema resolution
Lancaster does not support:
- Avro protocols
- Avro logical types
- Avro container files (may be supported in the future).
Project Name
The Avro Lancaster was an airplane manufactured by Avro Aircraft.
Examples
Here is an introductory example of using Lancaster to define a schema, serialize data, and then deserialize it.
(require '[deercreeklabs.lancaster :as l])

(l/def-record-schema person-schema
  [:name l/string-schema]
  [:age l/int-schema])

(def alice
  {:name "Alice"
   :age 40})

(def encoded (l/serialize person-schema alice))

(l/deserialize person-schema person-schema encoded)
;; {:name "Alice" :age 40}
Here is a more complex example using nested schemas:
(require '[deercreeklabs.lancaster :as l])

(l/def-enum-schema hand-schema
  :left :right)

(l/def-record-schema person-schema
  [:name l/string-schema]
  [:age l/int-schema]
  [:dominant-hand hand-schema]
  [:favorite-integers (l/array-schema l/int-schema)]
  [:favorite-color l/string-schema])

(def alice
  {:name "Alice"
   :age 40
   :favorite-integers [12 59]
   :dominant-hand :left})
;; :favorite-color is missing. Record fields are optional by default.

(def encoded (l/serialize person-schema alice))

(l/deserialize person-schema person-schema encoded)
;; {:name "Alice", :age 40, :dominant-hand :left, :favorite-integers [12 59], :favorite-color nil}
Creating Schema Objects
Lancaster schema objects are required for serialization and deserialization. These can be created in two ways:
- Using an existing Avro schema in JSON format. To do this, use the json->schema function. This is best if you are working with externally defined schemas from another system or language.
- Using Lancaster schema functions and/or macros. This is best if you want to define Avro schemas using Clojure/ClojureScript. Lancaster lets you concisely create and combine schemas in arbitrarily complex ways, as explained below.
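As an illustration of the first approach, here is a minimal sketch of building a schema object from Avro JSON. It assumes that json->schema accepts the schema as a JSON string; the record definition and the names person-json and person-schema are made up for this example.

```clojure
(require '[deercreeklabs.lancaster :as l])

;; Hypothetical Avro record definition, as it might arrive from another system.
(def person-json
  (str "{\"type\": \"record\", \"name\": \"Person\", \"fields\": ["
       "{\"name\": \"name\", \"type\": \"string\"}, "
       "{\"name\": \"age\", \"type\": \"int\"}]}"))

;; ASSUMPTION: json->schema takes the schema as a JSON string.
(def person-schema (l/json->schema person-json))

;; The resulting object is used like any other Lancaster schema.
(def encoded (l/serialize person-schema {:name "Alice" :age 40}))
(l/deserialize person-schema person-schema encoded)
```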
Primitive Schemas
Lancaster provides predefined schema objects for all the Avro primitives.
The following vars are defined in the deercreeklabs.lancaster namespace:
- null-schema: Represents an Avro null
- boolean-schema: Represents an Avro boolean
- int-schema: Represents an Avro int
- long-schema: Represents an Avro long
- float-schema: Represents an Avro float
- double-schema: Represents an Avro double
- bytes-schema: Represents an Avro bytes
- string-schema: Represents an Avro string
- string-set-schema: Represents an Avro map schema w/ null values; treated as a Clojure(Script) set.
These schema objects can be used directly or combined into complex schemas.
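For instance, a primitive schema can be passed straight to serialize and deserialize without defining anything else. The sketch below uses only the calls shown in the Examples section; the var names are illustrative.

```clojure
(require '[deercreeklabs.lancaster :as l])

;; Round-trip a plain string with the predefined string schema.
(def encoded-greeting (l/serialize l/string-schema "hello"))
(l/deserialize l/string-schema l/string-schema encoded-greeting)

;; string-set-schema expects a set whose elements are all strings.
(def encoded-tags (l/serialize l/string-set-schema #{"red" "green"}))
(l/deserialize l/string-set-schema l/string-set-schema encoded-tags)
```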
Complex Schemas
Most non-trivial Lancaster use cases will involve complex Avro schemas. The easiest and most concise way to create complex schemas is by using the Schema Creation Macros. For situations where macros do not work well, the Schema Creation Functions are also available.
Schema Creation Macros
- def-array-schema: Defines a var w/ an array schema.
- def-enum-schema: Defines a var w/ an enum schema.
- def-fixed-schema: Defines a var w/ a fixed schema.
- def-map-schema: Defines a var w/ a map schema. Keys must be strings.
- def-record-schema: Defines a var w/ a record schema.
- def-union-schema: Defines a var w/ a union schema.
- def-maybe-schema: Defines a var w/ a nillable schema.
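A short sketch of how these macros might be combined follows. The def-enum-schema and def-record-schema forms match the Examples section. The argument shapes shown for def-array-schema and def-maybe-schema (var name followed by a single schema) are assumptions, so check the API documentation before relying on them. All schema names here are hypothetical.

```clojure
(require '[deercreeklabs.lancaster :as l])

;; Same forms as in the Examples section.
(l/def-enum-schema suit-schema
  :clubs :diamonds :hearts :spades)

(l/def-record-schema card-schema
  [:suit suit-schema]
  [:rank l/int-schema])

;; ASSUMPTION: def-array-schema takes the var name and the items schema.
(l/def-array-schema cards-schema
  card-schema)

;; ASSUMPTION: def-maybe-schema takes the var name and the schema to make nillable.
(l/def-maybe-schema maybe-rank-schema
  l/int-schema)

;; Array data can be any sequential collection; record data is a map with keyword keys.
(def encoded (l/serialize cards-schema [{:suit :hearts :rank 12}
                                        {:suit :spades :rank 1}]))
(l/deserialize cards-schema cards-schema encoded)
```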
Schema Creation Functions
- array-schema: Creates an array schema.
- enum-schema: Creates an enum schema.
- fixed-schema: Creates a fixed schema.
- map-schema: Creates a map schema. Keys must be strings.
- record-schema: Creates a record schema.
- union-schema: Creates a union schema.
- maybe: Creates a nillable schema.
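The functions are handy when a schema is built inline or at runtime rather than bound with a def- macro. In this sketch, array-schema is used as in the Examples section; the single-argument call to maybe is an assumption, and the var names are illustrative.

```clojure
(require '[deercreeklabs.lancaster :as l])

;; Functions return schema objects, so they nest freely.
(def int-matrix-schema
  (l/array-schema (l/array-schema l/int-schema)))

;; ASSUMPTION: maybe takes one schema and returns a nillable version of it.
(def maybe-string-schema
  (l/maybe l/string-schema))

(def encoded (l/serialize int-matrix-schema [[1 2] [3 4]]))
(l/deserialize int-matrix-schema int-matrix-schema encoded)
```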
Operations on Schema Objects
All of these functions take a Lancaster schema object as the first argument:
- serialize: Serializes data to a byte array.
- deserialize: Deserializes data from a byte array, using separate reader and writer schemas. This is the recommended deserialization function.
- deserialize-same: Deserializes data from a byte array, using the same reader and writer schema. This is not recommended, as it does not allow for schema resolution / evolution.
- edn: Returns the EDN representation of the schema.
- json: Returns the JSON representation of the schema.
- pcf: Returns a JSON string containing the Parsing Canonical Form of the schema.
- fingerprint64: Returns the 64-bit Rabin fingerprint of the Parsing Canonical Form of the schema.
- fingerprint128: Returns the 128-bit MD5 Digest of the Parsing Canonical Form of the schema.
- fingerprint256: Returns the 256-bit SHA-256 Hash of the Parsing Canonical Form of the schema.
- schema?: Is the argument a Lancaster schema?
- default-data: Returns default data that conforms to the schema.
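The difference between deserialize and deserialize-same can be sketched with Avro's int-to-long promotion, which the Avro Specification lists among its schema resolution rules. The two-argument call to deserialize-same (schema, then byte array) is an assumption based on the description above.

```clojure
(require '[deercreeklabs.lancaster :as l])

;; Data written with an int schema...
(def encoded (l/serialize l/int-schema 42))

;; ...can be read with a long reader schema, because Avro schema
;; resolution allows an int to be promoted to a long.
(l/deserialize l/long-schema l/int-schema encoded)

;; ASSUMPTION: deserialize-same takes the schema and the byte array.
;; No resolution happens here; the one schema acts as reader and writer.
(l/deserialize-same l/int-schema encoded)

;; Inspection functions take the schema as their first argument.
(l/schema? l/int-schema)
(l/pcf l/int-schema)
(l/fingerprint64 l/int-schema)
```

Passing the writer's schema explicitly is what makes it possible for reader and writer to evolve independently, which is why deserialize is the recommended function.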
Data Types
Serialization
When serializing data, Lancaster accepts the following Clojure(Script) types for the given Avro type:
Avro Type | Acceptable Clojure / ClojureScript Types
--------- | -------------------------
null | nil
boolean | boolean
int | int, java.lang.Integer, long (if in integer range), java.lang.Long (if in integer range), js/Number (if in integer range)
long | long, java.lang.Long
float | float, java.lang.Float, double (if in float range), java.lang.Double (if in float range), js/Number (if in float range)
double | double, java.lang.Double, js/Number
bytes | byte-array, java.lang.String, js/Int8Array, js/String
string | byte-array, java.lang.String, js/Int8Array, js/String
fixed | byte-array, js/Int8Array. Byte array length must equal the size declared in the creation of the Lancaster fixed schema.
enum | Simple (non-namespaced) keyword
array | Any data that passes (sequential? data)
map | Any data that passes (map? data), if all keys are strings. Clojure(Script) records DO NOT qualify, since their keys are keywords, not strings.
map (w/ null values schema) | If the values schema of the map schema is null, the schema is interpreted as representing a Clojure(Script) set, and the data must be a set of strings. Only strings can be elements of this set.
record | Any data that passes (map? data), if all keys are Clojure(Script) simple (non-namespaced) keywords. Clojure(Script) records DO qualify, since their keys are keywords.
union | Any data that matches one of the member schemas declared in the creation of the Lancaster union schema. Note that there are some restrictions on what schemas may be in a union schema, as explained in Notes About Union Data Types below.
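The map and record rows are easy to confuse, so here is a small sketch contrasting them. It assumes that map-schema takes a single argument, the schema of the map's values; point-schema and the other names are hypothetical.

```clojure
(require '[deercreeklabs.lancaster :as l])

;; ASSUMPTION: map-schema takes the values schema as its only argument.
(def ages-schema (l/map-schema l/int-schema))

;; Avro maps: keys must be strings.
(def encoded-ages (l/serialize ages-schema {"Alice" 40 "Bob" 37}))

;; Avro records: keys must be simple (non-namespaced) keywords.
(l/def-record-schema point-schema
  [:x l/int-schema]
  [:y l/int-schema])

(def encoded-point (l/serialize point-schema {:x 1 :y 2}))

;; Clojure integer literals are longs; they are acceptable for an
;; Avro int as long as the value is in integer range.
(def encoded-int (l/serialize l/int-schema 2147483647))
```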
Deserialization
When deserializing data, Lancaster returns the following Clojure or ClojureScript types for the given Avro type:
Avro Type | Clojure Type | ClojureScript Type
--------- | ------------ | ------------------
null | nil | nil
boolean | boolean | boolean
int | java.lang.Integer | js/Number
long | java.lang.Long | goog.Long
float | java.lang.Float | js/Number
double | java.lang.Double | js/Number
bytes | byte-array | js/Int8Array
string | java.lang.String | js/String
fixed | byte-array | js/Int8Array
enum | keyword | keyword
array | vector | vector
map | hash-map | hash-map
map (w/ null values schema) | set (w/ string elements) | set (w/ string elements)
record | hash-map | hash-map
union | Data that matches one of the member sche
