Sick
Deduplicated indexed binary storage for JSON
SICK: Streams of Independent Constant Keys
SICK is a representation of JSON-like structures.
This repository provides Efficient Binary Aggregate (EBA) - a deduplicated binary storage format for JSON based on the SICK representation. We provide implementations for Scala, C# and JavaScript.
Sister project: UEBA, a tagless binary encoding.
What EBA enables
Current implementation:
- Store JSON-like data in efficient indexed binary form - Access nested data without deserializing the entire structure
- Avoid reading whole JSON files - Access only the data you need with lazy loading
- Deduplicate storage - Store multiple JSON-like structures with automatic deduplication of common values
Future potential:
The SICK representation also enables efficient streaming of JSON data, including perfect streaming parsers and efficient delta updates. We currently do not provide streaming abstractions, as it is challenging to design a solution that fits all use cases. Contributions are welcome.
Tradeoffs
Encoding is more complex than traditional JSON serialization, but reading becomes significantly faster and more memory-efficient.
Implementation Status
| Feature | Scala 🟣 | C# 🔵 | JS (ScalaJS) 🟡 |
|---------------------------|----------|-------|-----------------|
| EBA Encoder 💾 | ✅ | ✅ | ✅ |
| EBA Decoder 📥 | ✅ | ✅ | ✅ |
| EBA Encoder AST 🌳 | Circe | JSON.Net | JS Objects |
| EBA Decoder AST 🌿 | Circe | Custom | JS Objects |
| Cursors 🧭 | ⚠️ | ✅ | ❌ |
| Path Queries 🔍 | ❌ | ✅ | ❌ |
| Stream Encoder 🌊 | ❌ | ❌ | ❌ |
| Stream Decoder 🌀 | ❌ | ❌ | ❌ |

The current Scala API for reading SICK structures is less mature than the C# one: only basic abstractions are provided. Contributions are welcome.
Limitations
Current implementation constraints:
- Maximum object size: 65,534 keys per object
- Key order: Object key order is not preserved (as per JSON RFC)
- Maximum array elements: 2³² (4,294,967,296) elements
- Maximum unique values per type: 2³² (4,294,967,296) unique values
These limits can be lifted by using more bytes for offsets and counts, though real-world applications rarely approach these limits. Large structures can be split into smaller chunks at the client side.
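As a rough illustration of client-side splitting, the sketch below cuts an oversized field list into groups that each stay within the per-object key limit. `ChunkSketch` and `chunkFields` are hypothetical names, not part of the SICK API, and how the resulting chunks are stored (for example as an array of smaller objects) is left to the application.

```scala
object ChunkSketch {
  // Per-object key limit quoted in the list above.
  val MaxKeysPerObject: Int = 65534

  // Splits a field list into consecutive groups of at most `maxKeys` entries.
  // Hypothetical helper: the library itself does not provide this.
  def chunkFields[K, V](
    fields: Seq[(K, V)],
    maxKeys: Int = MaxKeysPerObject
  ): List[Seq[(K, V)]] =
    fields.grouped(maxKeys).toList

  def main(args: Array[String]): Unit = {
    val fields = (1 to 70000).map(i => s"key$i" -> i)
    val chunks = chunkFields(fields)
    println(chunks.map(_.size)) // sizes of the resulting chunks
  }
}
```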
Project Status
- Battle-tested - Covered by comprehensive test suites including cross-implementation correctness tests (C# ↔ Scala)
- Production-ready - Powers proprietary applications on mobile devices and browsers, including apps with hundreds of thousands of daily active users
- Open source adoption - No known open source users as of October 2025
- Platform support - Additional platform implementations welcome (Python, Rust, Go, etc.)
Performance
SICK excels in scenarios with:
- Large JSON files - Direct indexed reads are much faster than full JSON parse
- Repetitive structure - Deduplication significantly reduces storage
- Memory constraints - Incremental reading uses constant memory
- File size limits - Encoded output is usually much more compact than JSON
Tradeoffs:
- Write overhead - Encoding is significantly slower than JSON serialization. It can be made faster by partially turning off deduplication.
- Random access - Best for selective field access, not full traversal
A bit of theory and ideas
The Problem with JSON
JSON has a Type-2 grammar and requires a pushdown automaton to parse it. This makes it impossible to implement an efficient streaming parser for JSON.
Consider a deeply nested hierarchy of JSON objects: you cannot finish parsing the top-level object until you've processed the entire file.
JSON is frequently used to store and transfer large amounts of data, and these transfers tend to grow over time. A typical JSON config file for a large enterprise product is a good example.
The non-streaming nature of almost all JSON parsers requires substantial work every time you deserialize a large chunk of JSON data:
- Read it from disk
- Parse it in memory into an AST representation
- Map the raw JSON tree to object instances
Even if you use token streams and know the type of your object ahead of time, you still must deal with the Type-2 grammar.
This can be very inefficient, causing unnecessary delays, pauses, CPU activity spikes, and memory consumption spikes.
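The nesting problem can be seen with a small scanner that tracks bracket depth only. This is an illustrative sketch, not a JSON parser: a depth counter stands in for the stack that a real pushdown parser keeps, and `NestingSketch` and `topLevelEnd` are hypothetical names. It reports the offset at which the top-level value closes, which for a single nested document is its very last character.

```scala
object NestingSketch {
  // Returns the offset at which the top-level object or array closes.
  // Brackets inside string literals are skipped; nothing else is validated.
  def topLevelEnd(json: String): Option[Int] = {
    var depth = 0
    var inString = false
    var escaped = false
    var i = 0
    while (i < json.length) {
      val c = json.charAt(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"'       => inString = true
        case '{' | '[' => depth += 1
        case '}' | ']' =>
          depth -= 1
          if (depth == 0) return Some(i)
        case _         => ()
      }
      i += 1
    }
    None // input ended before the top-level value was closed
  }

  def main(args: Array[String]): Unit = {
    val doc = """{"a": {"b": [1, 2]}, "c": "}"}"""
    // The top-level object is only known to be complete at the final character.
    println(topLevelEnd(doc) == Some(doc.length - 1))
  }
}
```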
The SICK Solution
SICK transforms hierarchical JSON into a flat, deduplicated table of values with references, enabling:
- Indexed access - Jump directly to the data you need
- Deduplication - Share common values across multiple structures
- Streaming capability - Process data in constant memory
- Fast queries - Path-based access without full deserialization
Example Transformation
Given this JSON:
[
{"some key": "some value"},
{"some key": "some value"},
{"some value": "some key"}
]
SICK creates this flattened table:
| Type | Index | Value | Is Root |
| ------ | ----- | ------------------------------ | --------------- |
| string | 0 | "some key" | No |
| string | 1 | "some value" | No |
| object | 0 | [string:0, string:1] | No |
| object | 1 | [string:1, string:0] | No |
| array | 0 | [object:0, object:0, object:1] | Yes (file.json) |
Notice how duplicate values are stored once and referenced multiple times, and how the structure is completely flat.
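The flattening step can be sketched in a few lines. This is a minimal illustration of the idea, not the library's encoder: the tiny `J` tree, `Ref`, `Table` and `Tables` are hypothetical types defined only to keep the example self-contained, and only strings, arrays and objects are handled. Unlike the real format, this sketch treats objects with the same fields in a different order as distinct.

```scala
import scala.collection.mutable

object FlattenSketch {
  // Minimal JSON-like tree (hypothetical, for this sketch only).
  sealed trait J
  final case class JStr(value: String) extends J
  final case class JArr(items: List[J]) extends J
  final case class JObj(fields: List[(String, J)]) extends J

  // A reference into one of the per-type tables, printed as e.g. "string:0".
  final case class Ref(kind: String, index: Int) {
    override def toString: String = s"$kind:$index"
  }

  // One table per type; a value seen before gets its existing index back.
  final class Table[V](kind: String) {
    private val indexOf = mutable.LinkedHashMap.empty[V, Int]
    def add(v: V): Ref = Ref(kind, indexOf.getOrElseUpdate(v, indexOf.size))
    def rows: List[(Ref, V)] =
      indexOf.toList.map { case (v, i) => (Ref(kind, i), v) }
  }

  final class Tables {
    val strings = new Table[String]("string")
    val objects = new Table[List[(Ref, Ref)]]("object")
    val arrays  = new Table[List[Ref]]("array")

    // Children are flattened first, so each row only refers to existing rows.
    def flatten(j: J): Ref = j match {
      case JStr(s)   => strings.add(s)
      case JArr(xs)  => arrays.add(xs.map(flatten))
      case JObj(kvs) =>
        objects.add(kvs.map { case (k, v) => (strings.add(k), flatten(v)) })
    }
  }

  // The JSON document from the example above.
  val doc: J = JArr(List(
    JObj(List("some key" -> JStr("some value"))),
    JObj(List("some key" -> JStr("some value"))),
    JObj(List("some value" -> JStr("some key")))
  ))

  def main(args: Array[String]): Unit = {
    val t = new Tables
    val root = t.flatten(doc)
    t.strings.rows.foreach { case (ref, v) => println(s"$ref = $v") }
    t.objects.rows.foreach { case (ref, v) => println(s"$ref = $v") }
    t.arrays.rows.foreach { case (ref, v) => println(s"$ref = $v") }
    println(s"root = $root")
  }
}
```

Running it on the example document yields two string rows, two object rows and one array row, matching the table: the second object reuses `object:0` instead of creating a new row.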
Streaming
This representation enables many capabilities. For example, we can stream the table:
string:0 = "some key"
string:1 = "some value"
object:0.size = 1
object:0[string:0] = string:1
object:1.size = 1
object:1[string:1] = string:0
array:0.size = 3
array:0[0] = object:0
array:0[1] = object:0
array:0[2] = object:1
string:2 = "file.json"
root:0=array:0,string:2
While this particular encoding is inefficient, it's streamable. Moreover, we can add removal messages to support arbitrary updates:
array:0[0] = object:1
array:0[1] = remove
Important property: When a stream does not contain removal entries, it can be safely reordered. This eliminates many cases where full accumulation is required.
Depending on the use case, we can process entries as they arrive and discard them immediately. For example, if we need to sum all fields named "amount" across all objects and we have a reference for that name, we can maintain a single accumulator variable and discard everything else as we receive it.
Not all accumulation can be eliminated, though - the receiver may still need to buffer entries until they can be sorted out.
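The "amount" example can be sketched as a fold over a stream of entries. The entry types below are hypothetical and only loosely modelled on the notation above. The sketch also assumes that numeric field values arrive inline in the field entry; if they arrived as references to a separate number table, the receiver would need to buffer or index them, which is the residual accumulation mentioned above.

```scala
object StreamSumSketch {
  // Hypothetical stream entries, loosely following the notation above.
  sealed trait Entry
  final case class StringDef(index: Int, value: String) extends Entry
  // Assumption: numeric values are carried inline in the field entry.
  final case class NumField(obj: Int, key: Int, value: Long) extends Entry
  final case class RefField(obj: Int, key: Int, target: String) extends Entry

  // Consumes entries one at a time; only the running total is retained.
  def sumField(entries: Iterator[Entry], keyRef: Int): Long =
    entries.foldLeft(0L) {
      case (acc, NumField(_, k, v)) if k == keyRef => acc + v
      case (acc, _)                                => acc // discarded
    }

  def main(args: Array[String]): Unit = {
    val stream: Iterator[Entry] = Iterator(
      StringDef(0, "amount"),
      NumField(0, 0, 10L),
      RefField(0, 1, "string:3"),
      NumField(1, 0, 32L)
    )
    // keyRef = 0 is the known reference for the name "amount".
    println(sumField(stream, keyRef = 0)) // 42
  }
}
```

Because the entries contain no removals, the fold gives the same total in any arrival order, which is the reordering property described above.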
Quick Start
Scala
Add to your build.sbt:
libraryDependencies += "io.7mind.izumi" %% "json-sick" % "<Check for latest version>"
Basic encoding and decoding:
//> using scala "2.13"
//> using dep "io.circe::circe-core:0.14.13"
//> using dep "io.circe::circe-jawn:0.14.13"
//> using dep "io.7mind.izumi::json-sick:latest.integration"
import io.circe._
import io.circe.jawn.parse
import izumi.sick.SICK
import izumi.sick.eba.writer.EBAWriter
import izumi.sick.eba.reader.{EagerEBAReader, IncrementalEBAReader}
import izumi.sick.eba.reader.incremental.IncrementalJValue._
import izumi.sick.model.{SICKWriterParameters, TableWriteStrategy}
import izumi.sick.sickcirce.CirceTraverser._
import java.nio.file.{Files, Paths}
object SickExample {
  def main(args: Array[String]): Unit = {
    // Parse JSON string
    val jsonString = """{"name": "Alice", "age": 30, "city": "NYC"}"""
    val json = parse(jsonString).toTry.get

    // Encode to SICK binary format
    val eba = SICK.packJson(
      json = json,
      name = "user.json",
      dedup = true, // Enable deduplication
      dedupPrimitives = true, // Deduplicate primitive values too
      avoidBigDecimals = false // Use BigDecimals for precision
    )

    // Write to bytes
    val (bytes, info) = EBAWriter.writeBytes(
      eba.index,
      SICKWriterParameters(TableWriteStrategy.SinglePassInMemory)
    )

    // Save to file
    val bytesArray = bytes.toArrayUnsafe()
    Files.write(Paths.get("user.sick"), bytesArray)

    // Read back from bytes (eager loading)
    val structure = EagerEBAReader.readEBABytes(bytesArray)

    // Find and reconstruct the root
    val rootEntry = structure.findRoot("user.json").get
    val reconstructed = structure.reconstruct(rootEntry.ref)
    println(reconstructed) // Back to original JSON

    // Or use incremental reader for effic
