Storage

Korvet uses Redis Streams as its primary storage layer, with optional tiered storage to Delta Lake for long-term archival.

Storage Architecture

Korvet supports two storage configurations:

Redis-Only Storage (Default)

The default configuration uses Redis Streams exclusively:

  • Primary storage: All messages stored in Redis Streams

  • Persistence: Redis AOF and RDB for durability

  • Consumer groups: Built-in support for coordinated consumption

  • Performance: Sub-millisecond read/write latency

  • Retention: Configurable time-based and size-based retention (applied at write time)

This is the recommended configuration for most use cases.

Tiered Storage (Hot → Cold)

For long-term data retention and cost optimization, Korvet can be configured with two storage tiers:

  • Hot tier: Recent messages in Redis Streams for lowest latency

  • Cold tier: Archived messages in Delta Lake (Parquet format) on S3 or local filesystem

  • Automatic archival: Built-in archival service continuously moves messages from hot to cold tier

  • High throughput: Achieves 100k+ messages/second to S3 with parallel streams

How It Works

Redis-Only Storage

  1. Produce: Messages are written to Redis Streams using XADD

  2. Retention: Retention policies are applied at write time through XADD's trimming arguments: MAXLEN for size-based limits and MINID for time-based limits

  3. Consume: Consumers read messages using XREAD (standalone) or XREADGROUP (consumer groups)

  4. Persistence: Redis handles durability through AOF logging and RDB snapshots
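
The produce and retention steps above can be sketched as a helper that assembles the argument list of an XADD command. This is a minimal sketch under stated assumptions, not Korvet's implementation: the function name build_xadd_args and its parameters are hypothetical, the entry ID is assumed to be auto-generated with *, and time-based retention is assumed to map to a MINID threshold computed from the current time. A single XADD call accepts one trimming strategy, so the sketch takes either a size limit or a time limit.

```python
import time
from typing import Optional


def build_xadd_args(
    key: str,
    fields: dict[str, str],
    max_len: Optional[int] = None,
    retention_ms: Optional[int] = None,
    now_ms: Optional[int] = None,
) -> list[str]:
    """Assemble XADD arguments with optional write-time trimming.

    Hypothetical helper for illustration; not Korvet's actual code.
    """
    if max_len is not None and retention_ms is not None:
        # One XADD call takes a single trimming strategy.
        raise ValueError("choose either max_len or retention_ms")

    args = ["XADD", key]
    if max_len is not None:
        # Size-based retention: cap the stream at roughly max_len entries.
        args += ["MAXLEN", "~", str(max_len)]
    elif retention_ms is not None:
        # Time-based retention: stream IDs begin with a millisecond
        # timestamp, so entries older than the cutoff have smaller IDs.
        if now_ms is None:
            now_ms = int(time.time() * 1000)
        args += ["MINID", "~", f"{now_ms - retention_ms}-0"]

    # Assumption: Redis assigns the entry ID.
    args.append("*")
    for name, value in fields.items():
        args += [name, value]
    return args
```

The ~ modifier asks Redis for approximate trimming, which is cheaper than exact trimming; whether Korvet uses approximate or exact trimming is not stated here.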

Tiered Storage

  1. Produce: Messages are written to Redis Streams (hot tier)

  2. Archive: Built-in archival service continuously reads messages from Redis and writes to Delta Lake (cold tier)

  3. Query: Use Spark, Databricks, Presto, or Athena to query archived data

  4. Cleanup: Archived messages can be trimmed from Redis to save memory
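
The cleanup step can be illustrated with the XTRIM command. The sketch below is an assumption about how trimming after archival could be expressed, not a description of Korvet's archival service; the helper names parse_stream_id and trim_args_after_archive are hypothetical. It relies on the fact that XTRIM with MINID evicts entries whose IDs are lower than the threshold, so the threshold is set to the ID immediately after the last archived entry.

```python
def parse_stream_id(entry_id: str) -> tuple[int, int]:
    """Split a Redis Stream ID of the form '<ms>-<seq>' into integers."""
    ms, seq = entry_id.split("-")
    return int(ms), int(seq)


def trim_args_after_archive(key: str, last_archived_id: str) -> list[str]:
    """Build XTRIM arguments that drop everything up to last_archived_id.

    Hypothetical helper for illustration; not Korvet's actual code.
    """
    ms, seq = parse_stream_id(last_archived_id)
    # MINID keeps entries with IDs >= threshold, so use the next
    # sequence number to also evict the last archived entry itself.
    return ["XTRIM", key, "MINID", f"{ms}-{seq + 1}"]
```

For example, trim_args_after_archive("korvet:stream:orders:0", "1700000000000-3") yields an XTRIM command with the threshold 1700000000000-4.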

Per-Topic Tiered Storage Configuration

Tiered storage is controlled at the topic level using Kafka-compatible configuration:

  • remote.storage.enable=true - Enable tiered storage for a topic (Kafka KIP-405)

  • local.retention.ms - How long messages stay in the hot tier before moving to the cold tier

  • retention.ms - Total retention across all tiers

Example: 1 hour hot, the remaining 6 days 23 hours cold (7 days total):

kafka-configs --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic --alter \
  --add-config remote.storage.enable=true,local.retention.ms=3600000,retention.ms=604800000

See Topic Configuration for full details.
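
The millisecond values in the example can be checked with a few lines of Python. The tier_for_age function is a hypothetical, simplified model: it treats the tier boundaries as sharp cutoffs based on message age, whereas the archival service runs continuously, and the choice of inclusive boundaries is an assumption.

```python
from datetime import timedelta

LOCAL_RETENTION_MS = 3_600_000    # local.retention.ms from the example
TOTAL_RETENTION_MS = 604_800_000  # retention.ms from the example


def tier_for_age(
    age_ms: int,
    local_ms: int = LOCAL_RETENTION_MS,
    total_ms: int = TOTAL_RETENTION_MS,
) -> str:
    """Classify a message by age (simplified, hypothetical model)."""
    if age_ms >= total_ms:
        return "expired"
    if age_ms >= local_ms:
        return "cold"
    return "hot"


hot = timedelta(milliseconds=LOCAL_RETENTION_MS)
total = timedelta(milliseconds=TOTAL_RETENTION_MS)
print(hot)          # 1:00:00
print(total)        # 7 days, 0:00:00
print(total - hot)  # 6 days, 23:00:00
```

Because retention.ms covers all tiers, the cold-tier share is the total minus local.retention.ms, not an independent setting.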

Stream Key Format

Each Kafka topic partition maps to a single Redis Stream:

{keyspace}:stream:{topic}:{partition}

Examples (using the default keyspace korvet):

korvet:stream:orders:0        # Topic "orders", partition 0
korvet:stream:orders:1        # Topic "orders", partition 1
korvet:stream:payments:0      # Topic "payments", partition 0
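
The key format above can be expressed as a pair of small helpers. This is an illustrative sketch; the function names stream_key and parse_stream_key are hypothetical, and the parser assumes topic names contain no colon, which holds for legal Kafka topic names.

```python
def stream_key(topic: str, partition: int, keyspace: str = "korvet") -> str:
    """Build the Redis Stream key for a topic partition."""
    if partition < 0:
        raise ValueError("partition must be non-negative")
    return f"{keyspace}:stream:{topic}:{partition}"


def parse_stream_key(key: str) -> tuple[str, str, int]:
    """Split a stream key back into (keyspace, topic, partition)."""
    keyspace, rest = key.split(":stream:", 1)
    # The partition is the last colon-separated component.
    topic, partition = rest.rsplit(":", 1)
    return keyspace, topic, int(partition)
```

For example, stream_key("orders", 1) returns korvet:stream:orders:1, and parse_stream_key reverses it.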

Message Encoding

Kafka records are decomposed into Redis Stream fields:

  • Key: Stored in __key field (if present)

  • Headers: Stored as __header.{name} fields

  • Value: Encoding depends on value type:

    • JSON: Top-level fields are flattened into separate stream fields

    • Raw bytes: Stored as a single value field

See Message Format for details.
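
The decomposition described above can be sketched as follows. Several details are assumptions made for illustration and are not confirmed by this page: the function name encode_record is hypothetical, the value type is detected by attempting to parse the bytes as a JSON object, non-string JSON values are re-serialized as JSON text, and the raw-bytes field is assumed to be named value.

```python
import json
from typing import Optional, Union

FieldValue = Union[str, bytes]


def encode_record(
    value: bytes,
    key: Optional[bytes] = None,
    headers: Optional[dict[str, bytes]] = None,
) -> dict[str, FieldValue]:
    """Decompose a Kafka-style record into Redis Stream fields.

    Hypothetical helper for illustration; not Korvet's actual code.
    """
    fields: dict[str, FieldValue] = {}
    if key is not None:
        fields["__key"] = key
    for name, header_value in (headers or {}).items():
        fields[f"__header.{name}"] = header_value

    try:
        parsed = json.loads(value)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        parsed = None

    if isinstance(parsed, dict):
        # JSON object: flatten top-level fields into stream fields.
        for name, field_value in parsed.items():
            # Assumption: nested or non-string values are kept as JSON text.
            fields[name] = (
                field_value
                if isinstance(field_value, str)
                else json.dumps(field_value)
            )
    else:
        # Anything else is stored unchanged in a single field.
        fields["value"] = value
    return fields
```

Only the top level is flattened in this sketch, which matches the description above; how nested objects and field-name collisions are handled is covered by the Message Format page rather than by this example.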

Benefits

  • Performance: Sub-millisecond latency for hot tier operations

  • Simplicity: Redis-only mode requires no additional infrastructure

  • Reliability: Redis persistence ensures data durability

  • Scalability: Handles millions of messages per second

  • Cost optimization: Optional cold tier reduces storage costs for long-term retention

  • Flexibility: Choose between simplicity (Redis-only) and cost optimization (tiered)