Design Decisions

This page documents key design decisions made in the Kafka-to-Redis mapping.

Offset Mapping Strategy

Challenge: Kafka uses sequential integer offsets (0, 1, 2, …), while Redis Streams use timestamp-based entry IDs (e.g., 1234567890123-0).

Selected Approach: Stateless Encoding

Kafka offsets are computed directly from Redis Stream entry IDs using bit-packing:

Entry ID format:  {timestamp}-{sequence}
Kafka offset:     (timestamp << sequenceBits) | sequence

Configuration:

  • sequenceBits: Number of bits allocated for sequence number (default: 10)

  • Maximum sequence per millisecond: 2^sequenceBits - 1 (1023 for 10 bits)

  • If the sequence exceeds the maximum, Redis automatically increments the timestamp
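The encoding above is pure arithmetic, so it can be sketched in a few lines of Python (the names to_kafka_offset/to_entry_id are illustrative, not the broker's actual API):

```python
SEQUENCE_BITS = 10  # default; see the sequenceBits configuration above

def to_kafka_offset(entry_id: str, sequence_bits: int = SEQUENCE_BITS) -> int:
    """Encode a Redis Stream entry ID ("{timestamp}-{sequence}") as a Kafka offset."""
    timestamp, sequence = (int(part) for part in entry_id.split("-"))
    return (timestamp << sequence_bits) | sequence

def to_entry_id(offset: int, sequence_bits: int = SEQUENCE_BITS) -> str:
    """Recover the Redis Stream entry ID from a Kafka offset."""
    return f"{offset >> sequence_bits}-{offset & ((1 << sequence_bits) - 1)}"
```

Because both directions are plain bit operations, no lookup table has to be kept in sync with the stream.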

Advantages:

  • No extra storage: No hash mappings needed

  • Stateless: Offset ↔ Entry ID conversion is pure computation

  • Monotonic: Offsets increase monotonically, as Kafka's do

  • Efficient: O(1) conversion in both directions

Trade-offs:

  • Offsets are large numbers (not 0, 1, 2, …)

  • Requires configuring sequenceBits based on throughput

  • High-throughput partitions (>1000 msg/ms) need more sequence bits

Consumer Group Coordination

Challenge: Kafka has a complex group coordination protocol (join, sync, heartbeat, rebalance).

Approach:

  • Use Redis Streams native consumer groups for message delivery and offset tracking

  • Implement Kafka group coordinator protocol in broker (in-memory state)

  • Use XGROUP CREATE to create consumer groups

  • Use XREADGROUP for group-based message consumption

  • Use XACK for offset commits (acknowledging messages)

  • Group membership, assignments, and rebalancing managed by broker (not stored in Redis)
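The Redis side of this split can be illustrated with the command sequence below (the {topic}:{partition} stream key format and placeholder names are illustrative, not the broker's actual key scheme):

```
XGROUP CREATE {topic}:{partition} {group} 0 MKSTREAM               # create the consumer group
XREADGROUP GROUP {group} {consumer} COUNT 100 STREAMS {topic}:{partition} >   # deliver new messages
XACK {topic}:{partition} {group} {entry-id}                        # commit: acknowledge a processed entry
```

Everything above the Redis layer — join/sync/heartbeat handling and partition assignment — stays in the broker's in-memory coordinator.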

Retention and Cleanup

Challenge: Kafka supports time-based and size-based retention policies.

Approach:

  • Retention enforced at write time using XADD arguments (not background job)

  • Size-based retention: XADD … MAXLEN {count} (exact trimming by default for predictable retention)

  • Time-based retention: XADD … MINID {timestamp}-0 (trim messages older than timestamp)

  • Approximate trimming available per-topic via approximate.trimming=true config for better performance

  • Retention configuration stored in topic metadata (Redis Hash)

  • Message count calculated from retention.bytes and average message size

Implementation: See ProduceHandler.xAddArgs() which applies retention policies to every XADD command.

Transactional Writes

Challenge: Kafka supports atomic writes across partitions.

Approach:

  • Use Redis transactions (MULTI/EXEC) for single-partition atomicity

  • Use Lua scripts for multi-partition atomic writes

  • Store transaction markers in stream entries

  • Implement idempotent producer semantics
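For the multi-partition case, a Lua script of roughly the following shape (a sketch, not the broker's actual script) provides atomicity, since Redis executes a script without interleaving any other commands:

```lua
-- KEYS: one stream key per partition touched by the transaction
-- ARGV: one serialized record per key, in the same order
-- Note: Redis scripts are atomic with respect to other clients, but earlier
-- XADDs are not rolled back if a later call in the script errors.
local ids = {}
for i, key in ipairs(KEYS) do
  ids[i] = redis.call('XADD', key, '*', 'record', ARGV[i])
end
return ids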

Performance Optimization

Strategies:

  1. Pipelining: Batch multiple XADD commands in a single network round-trip

  2. Connection Pooling: Reuse Redis connections across requests

  3. Batch Reads: Use XREAD with COUNT to fetch multiple messages efficiently

  4. Group Reads: Use XREADGROUP to read from multiple streams in a single call
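Strategies 3 and 4 map onto Redis commands along these lines (key names and counts are illustrative):

```
XREAD COUNT 500 STREAMS {topic}:{partition} {last-delivered-id}        # batch read from one stream
XREADGROUP GROUP {group} {consumer} COUNT 500 STREAMS {topic}:0 {topic}:1 {topic}:2 > > >   # one call, many streams
```

A single round-trip fetching hundreds of messages across several partitions amortizes network latency in the same way Kafka's fetch batching does.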

Open Questions

Multi-Tenancy

Question: How to isolate topics across tenants?

Options:

  • Prefix all keys with tenant ID: {tenant}:{topic}:{partition}

  • Use separate Redis databases per tenant

  • Use Redis ACLs for access control
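The first option is straightforward to sketch (the function name is hypothetical); wrapping the tenant ID in braces would additionally make it a Redis Cluster hash tag, at the cost of pinning all of a tenant's streams to one hash slot:

```python
def stream_key(tenant: str, topic: str, partition: int) -> str:
    """Key scheme for option 1: prefix every stream key with the tenant ID."""
    return f"{tenant}:{topic}:{partition}"
```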

Replication

Question: How to handle Kafka’s replication factor > 1?

Options:

  • Use Redis Cluster or Redis Enterprise for replication

  • Ignore replication factor, rely on Redis persistence (AOF/RDB)

  • Implement application-level replication across Redis instances

Log Compaction

Question: How to support Kafka’s log compaction feature?

Options:

  • Background process that reads stream, deduplicates by key, writes to new stream

  • Use Redis Streams with custom compaction logic

  • Store compacted view in separate structure

Large Messages

Question: How to handle messages larger than Redis limits?

Options:

  • Store large payloads in separate keys, reference in stream entry

  • Reject messages above threshold with error response

  • Implement chunking for large messages