This version is still in development and is not considered stable yet. For the latest stable version, please use Korvet 0.12.5!

Storage

Korvet uses Redis Streams as its primary storage layer, with optional tiered storage to Delta Lake for long-term archival.

Storage Architecture

Korvet supports two storage configurations:

Redis-Only Storage (Default)

The default configuration uses Redis Streams exclusively:

  • Primary storage: All messages stored in Redis Streams

  • Persistence: Redis AOF and RDB for durability

  • Consumer groups: Built-in support for coordinated consumption

  • Performance: Sub-millisecond read/write latency

  • Retention: Configurable time and size-based retention (applied at write time)

This is the recommended configuration for most use cases.

Tiered Storage (Hot → Cold)

For long-term data retention and cost optimization, Korvet can be configured with two storage tiers:

  • Hot tier: Recent messages in Redis Streams for lowest latency

  • Cold tier: Archived messages in Delta Lake (Parquet format) on S3 or a local filesystem

  • Automatic archival: Built-in archival service continuously moves messages from hot to cold tier

  • High throughput: Achieves 100k+ messages/second to S3 with parallel streams

How It Works

Redis-Only Storage

  1. Produce: Messages are written to Redis Streams using XADD

  2. Retention: Retention policies are applied at write time using the MAXLEN (size-based) and MINID (time-based) trimming arguments of XADD

  3. Consume: Consumers read messages using XREAD (standalone) or XREADGROUP (consumer groups)

  4. Persistence: Redis handles durability through the AOF log and RDB snapshots
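
The produce and retention steps above can be sketched as a function that builds the arguments for a single XADD call. This is a minimal illustration, not Korvet's actual implementation: the function name build_xadd_args, its parameters, and the use of approximate trimming (~) are assumptions. Redis accepts only one trimming strategy (MAXLEN or MINID) per XADD call, so the sketch rejects a request for both.

```python
import time
from typing import Dict, List, Optional


def build_xadd_args(
    key: str,
    fields: Dict[str, str],
    *,
    max_len: Optional[int] = None,
    retention_ms: Optional[int] = None,
    now_ms: Optional[int] = None,
) -> List[str]:
    """Build the argument list for one XADD with optional write-time trimming.

    Hypothetical helper: names and defaults are illustrative only.
    """
    if max_len is not None and retention_ms is not None:
        # XADD takes either MAXLEN or MINID, never both in one call.
        raise ValueError("choose size-based or time-based trimming, not both")

    args = ["XADD", key]
    if max_len is not None:
        # Size-based retention: keep roughly the newest max_len entries.
        args += ["MAXLEN", "~", str(max_len)]
    elif retention_ms is not None:
        # Time-based retention: evict entries whose ID is older than the cutoff.
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        args += ["MINID", "~", str(now_ms - retention_ms)]

    args.append("*")  # let Redis assign the entry ID
    for name, value in fields.items():
        args += [name, value]
    return args
```

For example, build_xadd_args("korvet:stream:orders:0", {"value": "hello"}, max_len=1000) yields the arguments of an XADD that appends one entry and trims the stream to about 1000 entries in the same call.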

Tiered Storage

  1. Produce: Messages are written to Redis Streams (hot tier)

  2. Archive: Built-in archival service continuously reads messages from Redis and writes to Delta Lake (cold tier)

  3. Query: Use Spark, Databricks, Presto, or Athena to query archived data

  4. Cleanup: Archived messages can be trimmed from Redis to save memory
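
The cleanup step can be illustrated with a small calculation of a safe trim threshold. This is a sketch under stated assumptions, not the archival service's real logic: it assumes the service tracks the ID of the last entry written to the cold tier (last_archived_id, a hypothetical name) and that the hot tier is trimmed with XTRIM ... MINID, which removes entries whose ID is lower than the threshold. The helper picks a threshold that never removes an entry that is either unarchived or still inside the hot-tier window.

```python
from typing import Tuple


def parse_id(stream_id: str) -> Tuple[int, int]:
    """Split a Redis Stream ID '<ms>-<seq>' into integers (seq defaults to 0)."""
    ms, _, seq = stream_id.partition("-")
    return int(ms), int(seq or 0)


def safe_trim_min_id(now_ms: int, local_retention_ms: int, last_archived_id: str) -> str:
    """Return a MINID threshold for trimming the hot tier.

    Hypothetical helper: entries below the returned ID are both archived
    and older than the hot-tier retention window.
    """
    # Entries with a timestamp before this cutoff have left the hot window.
    cutoff = (now_ms - local_retention_ms, 0)
    # The first ID that is NOT yet known to be archived.
    ms, seq = parse_id(last_archived_id)
    first_unarchived = (ms, seq + 1)
    # Take the lower bound so neither condition is violated.
    t_ms, t_seq = min(cutoff, first_unarchived)
    return f"{t_ms}-{t_seq}"
```

If archival lags behind, the threshold follows the last archived entry; once archival has caught up, the threshold follows the retention cutoff instead.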

Per-Topic Tiered Storage Configuration

Tiered storage is controlled at the topic level using Kafka-compatible configuration:

  • remote.storage.enable=true - Enable tiered storage for a topic (Kafka KIP-405)

  • local.retention.ms - Time to keep messages in the hot tier before they move to the cold tier

  • retention.ms - Total retention across all tiers

Example: 1 hour in the hot tier and 7 days of total retention (leaving just under 7 days, about 6 days 23 hours, in the cold tier):

kafka-configs --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic --alter \
  --add-config remote.storage.enable=true,local.retention.ms=3600000,retention.ms=604800000

See Topic Configuration for full details.
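
To check the arithmetic behind a pair of retention settings, the two values can be converted into per-tier windows. This is only a sketch: the helper name tier_windows is hypothetical, and it assumes the cold-tier window is simply retention.ms minus local.retention.ms.

```python
from datetime import timedelta
from typing import Tuple


def tier_windows(local_retention_ms: int, retention_ms: int) -> Tuple[timedelta, timedelta]:
    """Return (hot window, cold window) for the given topic settings.

    Assumes total retention covers both tiers, so the cold window is
    whatever remains after the hot window.
    """
    if local_retention_ms > retention_ms:
        raise ValueError("local.retention.ms must not exceed retention.ms")
    hot = timedelta(milliseconds=local_retention_ms)
    cold = timedelta(milliseconds=retention_ms - local_retention_ms)
    return hot, cold


# Values from the kafka-configs example: 1 hour hot, 7 days total.
hot, cold = tier_windows(3_600_000, 604_800_000)
print(f"hot={hot}, cold={cold}")
```

With the example values, the hot window is one hour and the cold window is the remaining 6 days and 23 hours.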

Stream Key Format

Each Kafka topic partition maps to a single Redis Stream:

{keyspace}:stream:{topic}:{partition}

Examples (using default keyspace korvet):

korvet:stream:orders:0        # Topic "orders", partition 0
korvet:stream:orders:1        # Topic "orders", partition 1
korvet:stream:payments:0      # Topic "payments", partition 0
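
The key format above can be expressed as a pair of helpers that build and parse stream keys. The function names are hypothetical and the parser is a sketch: it relies on Kafka topic names not containing colons and on the literal ":stream:" separator shown in the format.

```python
from typing import Tuple


def stream_key(topic: str, partition: int, keyspace: str = "korvet") -> str:
    """Build the Redis Stream key for one topic partition."""
    return f"{keyspace}:stream:{topic}:{partition}"


def parse_stream_key(key: str) -> Tuple[str, str, int]:
    """Split a stream key back into (keyspace, topic, partition)."""
    # The partition is always the last colon-separated component.
    prefix, _, partition = key.rpartition(":")
    keyspace, sep, topic = prefix.partition(":stream:")
    if not sep or not topic:
        raise ValueError(f"not a stream key: {key!r}")
    return keyspace, topic, int(partition)


print(stream_key("orders", 0))
print(parse_stream_key("korvet:stream:payments:0"))
```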

Message Encoding

Kafka records are decomposed into Redis Stream fields:

  • Key: Stored in __key field (if present)

  • Headers: Stored as __header.{name} fields

  • Value: Encoding depends on value type:

    • JSON: Top-level fields are flattened into separate stream fields

    • Raw bytes: Stored as a single value field

See Message Format for details.
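
As a rough illustration of the decomposition described above, the following sketch turns one record into a dictionary of stream fields. Several details are assumptions rather than documented behavior: the function name encode_record, the literal field name "value" for raw bytes, and the choice to serialize non-string JSON values (numbers, lists, nested objects) as JSON text.

```python
import json
from typing import Dict, Optional, Union

FieldValue = Union[str, bytes]


def encode_record(
    key: Optional[str],
    headers: Dict[str, str],
    value: Union[dict, bytes],
) -> Dict[str, FieldValue]:
    """Decompose one Kafka-style record into Redis Stream fields (sketch)."""
    fields: Dict[str, FieldValue] = {}
    if key is not None:
        fields["__key"] = key  # record key, only if present
    for name, header_value in headers.items():
        fields[f"__header.{name}"] = header_value  # one field per header
    if isinstance(value, dict):
        # JSON object: flatten top-level fields only.
        for name, item in value.items():
            # Assumption: non-string values are kept as JSON text.
            fields[name] = item if isinstance(item, str) else json.dumps(item)
    else:
        # Raw bytes: stored unchanged in a single field (name assumed).
        fields["value"] = value
    return fields


print(encode_record("order-1", {"trace-id": "abc"}, {"status": "paid", "amount": 42}))
```

Flattening only the top level keeps each JSON property addressable as its own stream field, while nested structures stay intact inside a single field.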

Benefits

  • Performance: Sub-millisecond latency for hot tier operations

  • Simplicity: Redis-only mode requires no additional infrastructure

  • Reliability: Redis persistence ensures data durability

  • Scalability: Handles millions of messages per second

  • Cost optimization: Optional cold tier reduces storage costs for long-term retention

  • Flexibility: Choose between simplicity (Redis-only) and cost optimization (tiered)