Storage

Korvet uses Redis Streams as its primary storage layer, with optional tiered storage to Delta Lake for long-term archival.

Storage Architecture

Korvet supports two storage configurations:

Redis-Only Storage (Default)

The default configuration uses Redis Streams exclusively:

  • Primary storage: All messages stored in Redis Streams

  • Persistence: Redis AOF and RDB for durability

  • Consumer groups: Built-in support for coordinated consumption

  • Performance: Sub-millisecond read/write latency

  • Retention: Configurable time and size-based retention (applied at write time)

This is the recommended configuration for most use cases.

Tiered Storage (Experimental)

For long-term data retention and cost optimization, Korvet can be configured with tiered storage:

  • Hot tier: Recent messages in Redis Streams for low-latency access

  • Cold tier: Archived messages in Delta Lake (Parquet format) on S3 or local filesystem

  • Automatic archival: A separate Flink job archives old messages from Redis to Delta Lake

  • Transparent reads: Standalone consumers automatically read from both tiers

  • Consumer groups: Always read from hot tier only (Redis Streams)

Tiered storage is experimental and not production-ready.

It requires:

  • Separate Flink job for archival (see korvet-storage-writer module)

  • Delta Lake storage configuration (S3 or local filesystem)

  • Additional operational complexity

Most users should use Redis-only storage.

How It Works

Redis-Only Storage

  1. Produce: Messages are written to Redis Streams using XADD

  2. Retention: Retention policies are applied at write time using the MAXLEN and MINID trimming arguments of XADD

  3. Consume: Consumers read messages using XREAD (standalone) or XREADGROUP (consumer groups)

  4. Persistence: Redis handles durability through the AOF (append-only file) and RDB snapshots
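
The produce and retention steps above can be sketched as a small helper that builds the argument list for one XADD call. This is only an illustration, not Korvet's actual implementation: the function name, its parameters, and the choice to prefer MAXLEN when both limits are supplied are assumptions. What it relies on from Redis itself is that XADD accepts one trimming strategy per call (MAXLEN or MINID) and that stream entry IDs begin with a millisecond timestamp.

```python
import time
from typing import Dict, List, Optional


def build_xadd_args(
    stream_key: str,
    fields: Dict[str, str],
    max_len: Optional[int] = None,
    retention_ms: Optional[int] = None,
    now_ms: Optional[int] = None,
) -> List[str]:
    """Build the arguments for one XADD call with optional write-time trimming.

    Hypothetical helper: XADD takes a single trimming strategy per call,
    so this sketch prefers MAXLEN when both limits are given.
    """
    args = ["XADD", stream_key]
    if max_len is not None:
        # Cap the stream at roughly max_len entries ("~" permits approximate trimming).
        args += ["MAXLEN", "~", str(max_len)]
    elif retention_ms is not None:
        # Entry IDs start with a millisecond timestamp, so MINID can evict
        # entries that are older than the retention window.
        current = now_ms if now_ms is not None else int(time.time() * 1000)
        args += ["MINID", "~", f"{current - retention_ms}-0"]
    args.append("*")  # let Redis assign the entry ID
    for name, value in fields.items():
        args += [name, value]
    return args
```

With a client library such as redis-py, a list like this could be sent with client.execute_command(*args). Whether Korvet issues approximate ("~") or exact trimming is not stated here, so treat that detail as an assumption as well.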

Tiered Storage

  1. Produce: Messages are written to Redis Streams (hot tier)

  2. Archive: A Flink job periodically reads old messages from Redis and writes them to Delta Lake (cold tier)

  3. Consume:

    • Standalone consumers read from both hot and cold tiers automatically

    • Consumer groups read from hot tier only

  4. Cleanup: Archived messages can be trimmed from Redis to save memory
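
The consume step above can be illustrated with an in-memory sketch of how reads might be routed across the two tiers. Everything in it is hypothetical: the (offset, payload) tuple shape, the function name, and the rule that the hot tier wins when a message exists in both tiers (which can happen when archived messages have not yet been trimmed from Redis). It does not show how Korvet actually queries Redis or Delta Lake.

```python
from typing import List, Tuple

# Hypothetical message shape for this sketch: (offset, payload).
Message = Tuple[int, bytes]


def read_messages(
    hot: List[Message],
    cold: List[Message],
    start_offset: int,
    consumer_group: bool = False,
) -> List[Message]:
    """Return messages at or after start_offset, in offset order."""
    if consumer_group:
        # Consumer groups read from the hot tier only.
        return sorted(m for m in hot if m[0] >= start_offset)
    merged = {}
    for offset, payload in cold:
        if offset >= start_offset:
            merged[offset] = payload
    for offset, payload in hot:
        if offset >= start_offset:
            # A message present in both tiers is returned once.
            merged[offset] = payload
    return sorted(merged.items())
```

The de-duplication by offset matters because cleanup is optional: until archived messages are trimmed, the same offset may be readable from both tiers.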

Stream Key Format

Each Kafka topic partition maps to a single Redis Stream:

{keyspace}:stream:{topic}:{partition}

Examples (using default keyspace korvet):

korvet:stream:orders:0        # Topic "orders", partition 0
korvet:stream:orders:1        # Topic "orders", partition 1
korvet:stream:payments:0      # Topic "payments", partition 0
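
As a minimal sketch, the key format above can be built and parsed with two small helpers. The function names are hypothetical and not part of Korvet. Parsing splits from the right, which assumes the topic name contains no colon (Kafka topic names do not allow one) while tolerating a keyspace that does.

```python
from typing import Tuple


def stream_key(topic: str, partition: int, keyspace: str = "korvet") -> str:
    """Build the Redis Stream key for one topic partition."""
    return f"{keyspace}:stream:{topic}:{partition}"


def parse_stream_key(key: str) -> Tuple[str, str, int]:
    """Split a stream key into (keyspace, topic, partition)."""
    # Split from the right so a keyspace containing ":" stays intact.
    parts = key.rsplit(":", 3)
    if len(parts) != 4 or parts[1] != "stream":
        raise ValueError(f"not a stream key: {key!r}")
    keyspace, _, topic, partition = parts
    return keyspace, topic, int(partition)
```

For example, stream_key("orders", 1) produces korvet:stream:orders:1, matching the second example above.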

Message Encoding

Kafka records are decomposed into Redis Stream fields:

  • Key: Stored in __key field (if present)

  • Headers: Stored as __header.{name} fields

  • Value: Encoding depends on value type:

    • JSON: Top-level fields are flattened into separate stream fields

    • Raw bytes: Stored as a single value field

See Message Format for details.
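
The decomposition described above can be sketched as a function that turns one record into a field map. This is an assumption-laden illustration rather than Korvet's encoder: the function name, the value_type parameter, and the handling of non-string JSON values (re-serialized as JSON text) are all hypothetical. Only the __key, __header.{name}, and value field names come from the description above.

```python
import json
from typing import Dict, Optional


def encode_record(
    value: bytes,
    key: Optional[bytes] = None,
    headers: Optional[Dict[str, bytes]] = None,
    value_type: str = "bytes",  # hypothetical switch: "json" or "bytes"
) -> Dict[str, bytes]:
    """Decompose one Kafka-style record into Redis Stream fields."""
    fields: Dict[str, bytes] = {}
    if key is not None:
        fields["__key"] = key
    for name, header_value in (headers or {}).items():
        fields[f"__header.{name}"] = header_value
    if value_type == "json":
        parsed = json.loads(value)
        if not isinstance(parsed, dict):
            raise ValueError("JSON value must be an object to be flattened")
        for name, field_value in parsed.items():
            # Assumption: strings are stored as-is, and any other JSON value
            # (number, boolean, nested object) is stored as JSON text.
            text = field_value if isinstance(field_value, str) else json.dumps(field_value)
            fields[name] = text.encode("utf-8")
    else:
        # Raw bytes go into a single "value" field.
        fields["value"] = value
    return fields
```

Only top-level JSON fields become separate stream fields in this sketch; nested objects stay serialized inside their parent field.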

Benefits

  • Performance: Sub-millisecond latency for hot tier operations

  • Simplicity: Redis-only mode requires no additional infrastructure

  • Reliability: Redis persistence ensures data durability

  • Scalability: Handle millions of messages per second

  • Cost optimization: Optional cold tier reduces storage costs for long-term retention

  • Flexibility: Choose between simplicity (Redis-only) and cost optimization (tiered)