Storage
Korvet uses Redis Streams as its primary storage layer, with optional tiered storage to Delta Lake for long-term archival.
Storage Architecture
Korvet supports two storage configurations:
Redis-Only Storage (Default)
The default configuration uses Redis Streams exclusively:
- Primary storage: All messages stored in Redis Streams
- Persistence: Redis AOF and RDB for durability
- Consumer groups: Built-in support for coordinated consumption
- Performance: Sub-millisecond read/write latency
- Retention: Configurable time and size-based retention (applied at write time)
This is the recommended configuration for most use cases.
Tiered Storage (Hot → Cold)
For long-term data retention and cost optimization, Korvet can be configured with two storage tiers:
- Hot tier: Recent messages in Redis Streams for lowest latency
- Cold tier: Archived messages in Delta Lake (Parquet format) on S3 or local filesystem
- Automatic archival: Built-in archival service continuously moves messages from hot to cold tier
- High throughput: Achieves 100k+ messages/second to S3 with parallel streams
How It Works
Redis-Only Storage
- Produce: Messages are written to Redis Streams using XADD
- Retention: Retention policies are applied at write time using the MAXLEN and MINID arguments
- Consume: Consumers read messages using XREAD (standalone) or XREADGROUP (consumer groups)
- Persistence: Redis handles durability through AOF logging and RDB snapshots
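The produce and retention steps can be pictured as a single XADD call that carries a trim clause. The sketch below only builds the command arguments, so it runs without a Redis server. The function name xadd_args, the use of approximate trimming (~), the auto-generated entry ID (*), and the choice to prefer time-based over size-based trimming when both are given are assumptions made for illustration, not Korvet's actual implementation. Redis accepts only one trim strategy (MAXLEN or MINID) per XADD call.

```python
import time


def xadd_args(key, fields, max_len=None, retention_ms=None, now_ms=None):
    """Build the argument list for one XADD call with an optional trim clause.

    Hypothetical helper: illustrates write-time retention, not Korvet's code.
    """
    args = ["XADD", key]
    if retention_ms is not None:
        # Time-based retention: evict entries whose ID is older than the cutoff.
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        args += ["MINID", "~", f"{now_ms - retention_ms}-0"]
    elif max_len is not None:
        # Size-based retention: cap the stream at roughly max_len entries.
        args += ["MAXLEN", "~", str(max_len)]
    args.append("*")  # assumption: Redis assigns the entry ID
    for name, value in fields.items():
        args += [name, value]
    return args


print(xadd_args("korvet:stream:orders:0", {"value": "hello"}, max_len=1000))
```

With a Redis client library, a list like this could be passed to a generic command-execution call; the trim clause is what makes retention happen at write time rather than in a background job.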
Tiered Storage
- Produce: Messages are written to Redis Streams (hot tier)
- Archive: Built-in archival service continuously reads messages from Redis and writes to Delta Lake (cold tier)
- Query: Use Spark, Databricks, Presto, or Athena to query archived data
- Cleanup: Archived messages can be trimmed from Redis to save memory
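As a rough mental model of the archive and cleanup steps, the sketch below partitions in-memory stream entries by age. It is not Korvet's archival service: the function name split_for_archival is hypothetical, and it assumes entry IDs are Redis's auto-generated "milliseconds-sequence" form, so the first component can be read as a timestamp.

```python
def split_for_archival(entries, local_retention_ms, now_ms):
    """Partition stream entries into (to_archive, keep_hot) by entry-ID timestamp.

    Hypothetical model only; assumes IDs look like "<milliseconds>-<sequence>".
    """
    cutoff_ms = now_ms - local_retention_ms
    to_archive, keep_hot = [], []
    for entry_id, fields in entries:
        timestamp_ms = int(entry_id.split("-")[0])
        if timestamp_ms < cutoff_ms:
            to_archive.append((entry_id, fields))  # candidate for the cold tier
        else:
            keep_hot.append((entry_id, fields))  # still within hot retention
    return to_archive, keep_hot


entries = [("1000-0", {"v": "a"}), ("5000-0", {"v": "b"}), ("9000-1", {"v": "c"})]
to_archive, keep_hot = split_for_archival(entries, local_retention_ms=4000, now_ms=10000)
print([entry_id for entry_id, _ in to_archive])
print([entry_id for entry_id, _ in keep_hot])
```

In this model, entries would be trimmed from Redis only after the cold-tier write has succeeded, which matches the ordering of the archive and cleanup steps above.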
Per-Topic Tiered Storage Configuration
Tiered storage is controlled at the topic level using Kafka-compatible configuration:
- remote.storage.enable=true - Enable tiered storage for a topic (Kafka KIP-405)
- local.retention.ms - Time to keep in hot tier before moving to cold
- retention.ms - Total retention across all tiers
Example: 1 hour hot, the remaining 6 days 23 hours cold (7 days total):
kafka-configs --bootstrap-server localhost:9092 \
--entity-type topics --entity-name my-topic --alter \
--add-config remote.storage.enable=true,local.retention.ms=3600000,retention.ms=604800000
See Topic Configuration for full details.
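The millisecond values in the command above are easy to mistype, so it can help to derive them from durations. The following is a small sketch with hypothetical helper names (to_ms, tiered_topic_config). The check that hot-tier retention does not exceed total retention mirrors Kafka's rule for these settings and is assumed, not confirmed, to apply to Korvet as well.

```python
from datetime import timedelta


def to_ms(delta):
    """Convert a timedelta to whole milliseconds."""
    return int(delta.total_seconds() * 1000)


def tiered_topic_config(hot, total):
    """Build the --add-config value for a topic with tiered storage enabled."""
    if hot > total:
        # Assumption: as in Kafka, hot-tier retention cannot exceed the total.
        raise ValueError("local.retention.ms must not exceed retention.ms")
    return ",".join([
        "remote.storage.enable=true",
        f"local.retention.ms={to_ms(hot)}",
        f"retention.ms={to_ms(total)}",
    ])


config = tiered_topic_config(hot=timedelta(hours=1), total=timedelta(days=7))
print(config)
# remote.storage.enable=true,local.retention.ms=3600000,retention.ms=604800000
```

The time a message spends in the cold tier is the difference between the two values: 604800000 - 3600000 = 601200000 ms, or 6 days 23 hours.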
Stream Key Format
Each Kafka topic partition maps to a single Redis Stream:
{keyspace}:stream:{topic}:{partition}
Examples (using default keyspace korvet):
korvet:stream:orders:0     # Topic "orders", partition 0
korvet:stream:orders:1     # Topic "orders", partition 1
korvet:stream:payments:0   # Topic "payments", partition 0
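The key format can be expressed as a pair of small helpers, which is handy when inspecting streams directly in Redis. The function names are hypothetical, and the parser assumes the keyspace itself does not contain the text ":stream:".

```python
def stream_key(topic, partition, keyspace="korvet"):
    """Return the Redis Stream key for a topic partition."""
    return f"{keyspace}:stream:{topic}:{partition}"


def parse_stream_key(key):
    """Split a stream key back into (keyspace, topic, partition)."""
    prefix, partition = key.rsplit(":", 1)  # the partition is the last segment
    keyspace, topic = prefix.split(":stream:", 1)
    return keyspace, topic, int(partition)


print(stream_key("orders", 0))                       # korvet:stream:orders:0
print(parse_stream_key("korvet:stream:payments:0"))  # ('korvet', 'payments', 0)
```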
Message Encoding
Kafka records are decomposed into Redis Stream fields:
- Key: Stored in __key field (if present)
- Headers: Stored as __header.{name} fields
- Value: Encoding depends on value type:
  - JSON: Top-level fields are flattened into separate stream fields
  - Raw bytes: Stored as single value field
See Message Format for details.
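To make the field layout concrete, here is a minimal sketch of the decomposition. It is an illustration only: the function name encode_record is hypothetical, and several details are assumptions not stated in this section, namely that a value counts as JSON only when it parses to an object, that non-string JSON values are re-serialized as JSON text, and that header values are stored unchanged.

```python
import json


def encode_record(value, key=None, headers=None):
    """Decompose a Kafka-style record into a flat dict of stream fields.

    Hypothetical sketch of the layout described above, not Korvet's encoder.
    """
    fields = {}
    if key is not None:
        fields["__key"] = key
    for name, header_value in (headers or {}).items():
        fields[f"__header.{name}"] = header_value
    try:
        parsed = json.loads(value)
    except ValueError:  # covers invalid JSON and undecodable bytes
        parsed = None
    if isinstance(parsed, dict):
        # JSON object: flatten top-level fields only (nested values stay as JSON text).
        for name, field_value in parsed.items():
            fields[name] = field_value if isinstance(field_value, str) else json.dumps(field_value)
    else:
        # Anything else is treated as raw bytes in a single field.
        fields["value"] = value
    return fields


print(encode_record(b'{"id": 42, "status": "paid"}', key="order-1", headers={"source": "web"}))
# {'__key': 'order-1', '__header.source': 'web', 'id': '42', 'status': 'paid'}
```

Note that in this sketch a JSON field named __key would overwrite the record key; how Korvet handles such collisions is not covered here.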
Benefits
- Performance: Sub-millisecond latency for hot tier operations
- Simplicity: Redis-only mode requires no additional infrastructure
- Reliability: Redis persistence ensures data durability
- Scalability: Handle millions of messages per second
- Cost optimization: Optional cold tier reduces storage costs for long-term retention
- Flexibility: Choose between simplicity (Redis-only) and cost optimization (tiered)