Rafka
A High-Performance Distributed Message Broker Built in Rust
Rafka is an experimental distributed asynchronous message broker inspired by Apache Kafka. Built in Rust on Tokio's async runtime, it pairs a peer-to-peer mesh architecture with an in-memory database backed by a write-ahead log, targeting low-latency message processing and horizontal scalability.
🚀 Key Features
- High-Performance Async Architecture: Built on Tokio for maximum concurrency and throughput
- gRPC Communication: Modern protocol buffers for efficient inter-service communication
- Partitioned Message Processing: Hash-based partitioning for horizontal scalability
- Disk-based Persistence: Write-Ahead Log (WAL) for message durability
- Consumer Groups: Load-balanced message consumption with partition assignment
- Replication: Multi-replica partitions with ISR tracking for high availability
- Log Compaction: Multiple strategies (KeepLatest, TimeWindow, Hybrid) for storage optimization
- Transactions: Two-Phase Commit (2PC) with idempotent producer support
- Comprehensive Monitoring: Health checks, heartbeat tracking, and circuit breakers
- Real-time Metrics: Prometheus-compatible metrics export with latency histograms
- Stream Processing: Kafka Streams-like API for message transformation and aggregation
- Offset Tracking: Consumer offset management for reliable message delivery
- Retention Policies: Configurable message retention based on age and size
- Modular Design: Clean separation of concerns across multiple crates
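The hash-based partitioning listed above fits in a few lines. The sketch below is illustrative only, using the standard library's `DefaultHasher`; Rafka's actual hash function and API are assumptions here, not taken from the codebase:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a message key to a partition index. `total_partitions` is assumed
/// to match the broker's --total-partitions setting.
fn partition_for_key(key: &str, total_partitions: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % total_partitions
}

fn main() {
    let p = partition_for_key("test-key", 3);
    // The same key always lands on the same partition,
    // which is what makes per-key ordering possible.
    assert_eq!(p, partition_for_key("test-key", 3));
    assert!(p < 3);
    println!("key 'test-key' -> partition {}", p);
}
```

Because routing depends only on the key, any broker can compute a message's home partition without coordination.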
🆚 Rafka vs Apache Kafka Feature Comparison
| Feature | Apache Kafka | Rafka (Current) | Status |
|---------|--------------|-----------------|--------|
| Storage | Disk-based (Persistent) | Disk-based WAL (Persistent) | ✅ Implemented |
| Architecture | Leader/Follower (Zookeeper/KRaft) | P2P Mesh / Distributed | 🔄 Different Approach |
| Consumption Model | Consumer Groups (Load Balancing) | Consumer Groups + Pub/Sub | ✅ Implemented |
| Replication | Multi-replica with ISR | Multi-replica with ISR | ✅ Implemented |
| Message Safety | WAL (Write Ahead Log) | WAL (Write Ahead Log) | ✅ Implemented |
| Transactions | Exactly-once semantics | 2PC with Idempotent Producers | ✅ Implemented |
| Compaction | Log Compaction | Log Compaction (Multiple Strategies) | ✅ Implemented |
| Ecosystem | Connect, Streams, Schema Registry | Core Broker only | ❌ Missing |
🔍 Feature Implementation Status
✅ Implemented Features
- Disk-based Persistence (WAL): Rafka now implements a Write-Ahead Log (WAL) for message durability. Messages are persisted to disk and survive broker restarts.
- Consumer Groups: Rafka supports consumer groups with load balancing. Multiple consumers can share the load of a topic, with each partition being consumed by only one member of the group. Both Range and RoundRobin partition assignment strategies are supported.
- Replication & High Availability: Rafka implements multi-replica partitions with In-Sync Replica (ISR) tracking and leader election for high availability.
- Log Compaction: Rafka supports log compaction with multiple strategies (KeepLatest, TimeWindow, Hybrid) to optimize storage by keeping only the latest value for a key.
- Transactions: Rafka implements atomic writes across multiple partitions/topics using Two-Phase Commit (2PC) protocol with idempotent producer support.
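The Range and RoundRobin assignment strategies mentioned above can be modeled in a few lines. This is an illustrative sketch, not Rafka's actual code; the function names and return shape are assumptions:

```rust
/// RoundRobin: deal partitions out one at a time across members.
fn assign_round_robin(partitions: usize, members: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); members];
    for p in 0..partitions {
        out[p % members].push(p);
    }
    out
}

/// Range: give each member a contiguous block; the first
/// `partitions % members` members receive one extra partition.
fn assign_range(partitions: usize, members: usize) -> Vec<Vec<usize>> {
    let mut out = vec![Vec::new(); members];
    let base = partitions / members;
    let extra = partitions % members;
    let mut next = 0;
    for (m, slots) in out.iter_mut().enumerate() {
        let take = base + usize::from(m < extra);
        for p in next..next + take {
            slots.push(p);
        }
        next += take;
    }
    out
}

fn main() {
    // 5 partitions shared by 2 consumers: every partition is
    // consumed by exactly one member of the group.
    assert_eq!(assign_range(5, 2), vec![vec![0, 1, 2], vec![3, 4]]);
    assert_eq!(assign_round_robin(5, 2), vec![vec![0, 2, 4], vec![1, 3]]);
    println!("each partition assigned to exactly one group member");
}
```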
❌ Missing Features
- Ecosystem Tools: Unlike Apache Kafka, Rafka currently lacks ecosystem tools such as Kafka Connect (for data integration) and Schema Registry (for schema management), and its Kafka Streams-like API covers only basic stream processing. These would need to be developed separately to provide a complete data streaming platform.
🏗️ Architecture Overview
System Architecture Diagram
```mermaid
graph TB
    subgraph "Client Layer"
        P[Producer]
        C[Consumer]
    end

    subgraph "Broker Cluster"
        B1[Broker 1<br/>Partition 0]
        B2[Broker 2<br/>Partition 1]
        B3[Broker 3<br/>Partition 2]
    end

    subgraph "Storage Layer"
        S1[In-Memory DB<br/>Partition 0]
        S2[In-Memory DB<br/>Partition 1]
        S3[In-Memory DB<br/>Partition 2]
    end

    P -->|gRPC Publish| B1
    P -->|gRPC Publish| B2
    P -->|gRPC Publish| B3
    B1 -->|Store Messages| S1
    B2 -->|Store Messages| S2
    B3 -->|Store Messages| S3
    C -->|gRPC Consume| B1
    C -->|gRPC Consume| B2
    C -->|gRPC Consume| B3
    B1 -->|Broadcast Stream| C
    B2 -->|Broadcast Stream| C
    B3 -->|Broadcast Stream| C
```
Message Flow Sequence Diagram
```mermaid
sequenceDiagram
    participant P as Producer
    participant B as Broker
    participant S as Storage
    participant C as Consumer

    P->>B: PublishRequest(topic, key, payload)
    B->>B: Hash key for partition
    B->>B: Check partition ownership
    B->>S: Store message with offset
    S-->>B: Return offset
    B->>B: Broadcast to subscribers
    B-->>P: PublishResponse(message_id, offset)

    C->>B: ConsumeRequest(topic)
    B->>B: Create broadcast stream
    B-->>C: ConsumeResponse stream

    loop Message Processing
        B->>C: ConsumeResponse(message)
        C->>B: AcknowledgeRequest(message_id)
        C->>B: UpdateOffsetRequest(offset)
    end
```
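The publish path in the sequence diagram can be condensed into a toy model: hash the key, verify partition ownership, append at the next offset. All names below are illustrative assumptions, not Rafka's real API:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy broker illustrating the publish path: each broker owns one
/// partition and only appends messages whose key hashes to it.
struct Broker {
    partition: u64,
    total_partitions: u64,
    log: HashMap<String, Vec<(u64, String)>>, // topic -> (offset, payload)
}

fn hash_key(key: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    h.finish()
}

impl Broker {
    fn publish(&mut self, topic: &str, key: &str, payload: &str) -> Result<u64, String> {
        // "Check partition ownership" step from the diagram.
        let target = hash_key(key) % self.total_partitions;
        if target != self.partition {
            return Err(format!("partition {} is owned by another broker", target));
        }
        // "Store message with offset": offsets are sequential per topic.
        let entries = self.log.entry(topic.to_string()).or_default();
        let offset = entries.len() as u64;
        entries.push((offset, payload.to_string()));
        Ok(offset) // the real broker would now broadcast to subscribers
    }
}

fn main() {
    let owner = hash_key("k") % 3;
    let mut broker = Broker { partition: owner, total_partitions: 3, log: HashMap::new() };
    assert_eq!(broker.publish("greetings", "k", "hello"), Ok(0));
    assert_eq!(broker.publish("greetings", "k", "world"), Ok(1));
    println!("offsets assigned sequentially per topic");
}
```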
📁 Project Structure
```
rafka/
├── Cargo.toml                  # Workspace manifest
├── config/
│   └── config.yml              # Configuration file
├── scripts/                    # Demo and utility scripts
│   ├── helloworld.sh           # Basic producer-consumer demo
│   ├── partitioned_demo.sh     # Multi-broker partitioning demo
│   ├── retention_demo.sh       # Message retention demo
│   ├── offset_tracking_demo.sh # Consumer offset tracking demo
│   └── kill.sh                 # Process cleanup script
├── src/
│   └── bin/                    # Executable binaries
│       ├── start_broker.rs     # Broker server
│       ├── start_producer.rs   # Producer client
│       ├── start_consumer.rs   # Consumer client
│       └── check_metrics.rs    # Metrics monitoring
├── crates/                     # Core library crates
│   ├── core/                   # Core types and gRPC definitions
│   │   ├── src/
│   │   │   ├── lib.rs
│   │   │   ├── message.rs      # Message structures
│   │   │   └── proto/
│   │   │       └── rafka.proto # gRPC service definitions
│   │   └── build.rs            # Protocol buffer compilation
│   ├── broker/                 # Broker implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── broker.rs       # Core broker logic
│   ├── producer/               # Producer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── producer.rs     # Producer client
│   ├── consumer/               # Consumer implementation
│   │   └── src/
│   │       ├── lib.rs
│   │       └── consumer.rs     # Consumer client
│   └── storage/                # Storage engine
│       └── src/
│           ├── lib.rs
│           └── db.rs           # In-memory database
├── docs/
│   └── getting_started.md      # Getting started guide
├── tasks/
│   └── Roadmap.md              # Development roadmap
├── Dockerfile                  # Container configuration
└── LICENSE                     # MIT License
```
🚀 Quick Start
Prerequisites
- Rust: Latest stable version (1.70+)
- Cargo: Comes with Rust installation
- Protocol Buffers: For gRPC compilation
Installation
- Clone the repository:

```sh
git clone https://github.com/yourusername/rafka.git
cd rafka
```

- Build the project:

```sh
cargo build --release
```

- Run the basic demo:

```sh
./scripts/helloworld.sh
```
Manual Setup
- Start a broker:

```sh
cargo run --bin start_broker -- --port 50051 --partition 0 --total-partitions 3
```

- Start a consumer:

```sh
cargo run --bin start_consumer -- --port 50051
```

- Send messages:

```sh
cargo run --bin start_producer -- --message "Hello, Rafka!" --key "test-key"
```
🔧 Configuration
Broker Configuration
The broker can be configured via command-line arguments:
```sh
cargo run --bin start_broker -- \
  --port 50051 \
  --partition 0 \
  --total-partitions 3 \
  --retention-seconds 604800
```
Available Options:
- `--port`: Broker listening port (default: 50051)
- `--partition`: Partition ID for this broker (default: 0)
- `--total-partitions`: Total number of partitions (default: 1)
- `--retention-seconds`: Message retention time in seconds (default: 7 days)
Configuration File
Edit config/config.yml for persistent settings:
```yaml
server:
  host: "127.0.0.1"
  port: 9092

log:
  level: "info"   # debug, info, warn, error

broker:
  replication_factor: 3
  default_topic_partitions: 1

storage:
  type: "in_memory"
```
🏛️ Core Components
1. Core (rafka-core)
Purpose: Defines fundamental types and gRPC service contracts.
Key Components:
- Message Structures: `Message`, `MessageAck`, `BenchmarkMetrics`
- gRPC Definitions: Protocol buffer definitions for all services
- Serialization: Serde-based serialization for message handling
Key Files:
- `message.rs`: Core message types and acknowledgment structures
- `proto/rafka.proto`: gRPC service definitions
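As a rough illustration of what the core message types might look like, here is a hypothetical shape for `Message` and `MessageAck`. The field names are assumptions for the sketch, not the actual definitions in rafka-core (which would also carry Serde derives for serialization):

```rust
/// Hypothetical shape of the core message type; real field
/// names in rafka-core's message.rs may differ.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    topic: String,
    key: String,
    payload: Vec<u8>,
    offset: u64,
    timestamp_ms: u64,
}

/// Hypothetical acknowledgment sent back by a consumer.
#[derive(Debug, Clone, PartialEq)]
struct MessageAck {
    message_id: String,
    offset: u64,
}

fn main() {
    let msg = Message {
        topic: "greetings".into(),
        key: "test-key".into(),
        payload: b"Hello, Rafka!".to_vec(),
        offset: 0,
        timestamp_ms: 0,
    };
    let ack = MessageAck { message_id: "greetings-0".into(), offset: msg.offset };
    assert_eq!(ack.offset, msg.offset);
    println!("ack recorded for message {}", ack.message_id);
}
```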
2. Broker (rafka-broker)
Purpose: Central message routing and coordination service.
Key Features:
- Partition Management: Hash-based message partitioning
- Topic Management: Dynamic topic creation and subscription
- Broadcast Channels: Efficient message distribution to consumers
- Offset Tracking: Consumer offset management
- Retention Policies: Configurable message retention
- Metrics Collection: Real-time performance metrics
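Age-based retention, as configured via `--retention-seconds`, boils down to pruning entries older than a cutoff. A minimal sketch, assuming a `(timestamp, payload)` log representation that is not Rafka's actual storage format:

```rust
/// Drop messages older than the retention window.
/// `now` and timestamps are in seconds since some epoch.
fn enforce_retention(log: &mut Vec<(u64, String)>, now: u64, retention_secs: u64) {
    let cutoff = now.saturating_sub(retention_secs);
    log.retain(|(ts, _)| *ts >= cutoff);
}

fn main() {
    let mut log = vec![(100, "old".to_string()), (900, "recent".to_string())];
    // With a 200-second window at t=1000, only entries with ts >= 800 survive.
    enforce_retention(&mut log, 1000, 200);
    assert_eq!(log, vec![(900, "recent".to_string())]);
    println!("{} message(s) retained", log.len());
}
```

A size-based policy would be the same shape: truncate from the front of the log until total bytes fall under the limit.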
Key Operations:
- `publish()`: Accept messages from producers
- `consume()`: Stream messages to consumers
- `subscribe()`: Register consumer subscriptions
- `acknowledge()`: Process message acknowledgments from consumers
