Benchmarks

This page documents performance benchmarks for Korvet with Redis Enterprise as the storage backend.

Optimal Configuration Benchmark

This benchmark demonstrates the best throughput configuration for Korvet with Redis Enterprise.

Test Environment

  • Redis Enterprise: 16 shards, running locally

  • Korvet: Single instance (macOS, Apple Silicon)

  • Kafka Tools: kafka-producer-perf-test from Apache Kafka

  • Topic Configuration: 16 partitions (1× the number of shards)

  • Record Size: 1 KB (1024 bytes)

  • Total Messages: 8,000,000 (1,000,000 per producer)

Configuration

Parameter                   Value
Producers                   8
Batch Size                  1000 messages (1.07 MB)
Redis Connection Pool Size  8
Acks                        1
Compression                 none
Linger                      0 ms
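One possible reading of the "1.07 MB" batch size (an assumption, not stated by this page): Kafka batches carry per-record framing overhead on top of the 1 KB payload, and roughly 48 bytes per record would account for the difference. A quick illustration in Python:

```python
# Hypothetical reconciliation of "1000 messages (1.07 MB)".
# The ~48-byte per-record overhead is an assumption, not a measured value.
payload_bytes = 1024
overhead_bytes = 48        # assumed per-record framing overhead
batch_records = 1000

batch_bytes = batch_records * (payload_bytes + overhead_bytes)
print(batch_bytes)         # 1072000, i.e. ~1.07 MB
```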

Performance Results

Metric                           Value
Aggregate Throughput             380,952 records/sec
Throughput (MB/sec)              372.02 MB/sec
Total Messages                   8,000,000
Duration                         21 seconds
Average Latency (range)          249-411 ms
95th Percentile Latency (range)  657-2136 ms
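The headline figures are internally consistent. A quick sanity check (Python used purely for illustration; note that "MB/sec" here is mebibytes):

```python
# Derive the reported throughput figures from the raw counts.
records = 8_000_000
duration_s = 21
record_bytes = 1024

rec_per_s = records / duration_s              # ~380,952 records/sec
mb_per_s = rec_per_s * record_bytes / 2**20   # ~372.02 MB/sec (MiB)

print(int(rec_per_s), round(mb_per_s, 2))
```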

Korvet Resource Usage

Metric           Value
Process CPU      2.35%
System CPU       20.51%
JVM Memory Used  244.13 MB

Redis Enterprise Metrics

Metric                     Value
Total CPU (all 16 shards)  79%
Per-Shard CPU              2-8%
Data per Shard             112-172 MB
Data Distribution          Even across all shards

Key Findings

  • High throughput with low CPU usage: Achieved 372 MB/sec with only 2.35% Korvet CPU usage

  • Excellent scalability headroom: Both Korvet and Redis Enterprise operate well below capacity

  • Even load distribution: Data and CPU load distributed evenly across all 16 Redis shards

  • Optimal batch size: 1000 messages per batch provided the best balance of throughput and latency

Running This Benchmark

To reproduce this benchmark, use the provided benchmark script from the korvet-dist repository:

git clone https://github.com/redis-field-engineering/korvet-dist.git
cd korvet-dist/samples/benchmark/scripts
./run-comprehensive-benchmark.sh

The script will:

  1. Start Korvet with the specified Redis pool size

  2. Create a topic with 16 partitions

  3. Run 8 concurrent producers, each sending 1,000,000 messages

  4. Collect metrics from Korvet (via actuator) and Redis Enterprise (via API)

  5. Generate a detailed report with throughput, latency, and resource usage

Results are saved to /tmp/korvet-benchmark-<timestamp>/.

Single Shard Benchmark

This benchmark demonstrates Korvet performance with a single Redis shard, providing a baseline for comparison with multi-shard configurations.

Test Environment

  • Redis Enterprise: 1 shard (~1 GB maxmemory), running locally

  • Korvet: Single instance (macOS, Apple Silicon)

  • Kafka Tools: kafka-producer-perf-test from Apache Kafka

  • Topic Configuration: 1 partition (matching the single shard)

  • Record Size: 1 KB (1024 bytes)

Configuration

Parameter                   Value
Producers                   1 (baseline) / 8 (concurrent)
Batch Size                  1000 messages (1.07 MB)
Redis Connection Pool Size  16
Acks                        1
Compression                 none
Linger                      0 ms

Performance Results

Single Producer (Baseline)

Metric               Value
Throughput           151,860 records/sec
Throughput (MB/sec)  148.30 MB/sec
Total Messages       200,000
Average Latency      160 ms
P99 Latency          329 ms

8 Concurrent Producers

Metric                   Value
Aggregate Throughput     168,641 records/sec
Throughput (MB/sec)      164.68 MB/sec
Total Messages           674,564
Duration                 4.58 seconds
Average Latency (range)  495-724 ms
Memory Used              776.47 MB

Comparison: 16 Shards vs 1 Shard

Metric                16 Shards  1 Shard  Ratio
Database Memory       ~16 GB     ~1 GB    16×
Topic Partitions      16         1        16×
Throughput (rec/s)    380,952    168,641  2.26×
Throughput (MB/s)     372.02     164.68   2.26×
Per-shard throughput  23,809     168,641  0.14×

Key Findings

  • Single shard achieves ~44% of 16-shard aggregate throughput: 168,641 vs 380,952 records/sec

  • Higher per-shard efficiency with fewer shards: A single shard processes 168,641 rec/s vs 23,809 rec/s per shard in the 16-shard setup

  • Memory efficiency: ~1.15 KB per message in Redis Streams (776 MB for 674,564 messages)

  • Single producer baseline: 151,860 rec/s provides a clean baseline without concurrency overhead
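The per-shard and per-message figures above follow directly from the raw numbers. A short derivation (Python for illustration only; assumes the 776.47 MB memory figure is decimal megabytes):

```python
# Per-shard throughput in the 16-shard benchmark.
total_rec_per_s = 380_952
per_shard = total_rec_per_s // 16          # ~23,809 rec/s per shard

# Memory per message in the single-shard, 8-producer run.
mem_bytes = 776.47e6                       # 776.47 MB (assumed decimal)
messages = 674_564
per_msg_kb = mem_bytes / messages / 1000   # ~1.15 KB per message

print(per_shard, round(per_msg_kb, 2))
```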

Cold Storage Archival Benchmark

This benchmark measures the throughput of archiving messages from Redis Streams to Delta Lake on S3.

Test Environment

  • EC2 Instance: c5.2xlarge (8 vCPU, 16GB RAM) in us-west-1

  • S3 Bucket: Same region (us-west-1) for optimal network performance

  • Redis: Docker container on same instance

  • Message Size: ~100 bytes (binary payload)

  • Compression: ZSTD (Parquet default)

Single Stream Results

Archiving from a single Redis Stream to S3:

Messages   Archive Time  Throughput    Parquet Files    Delta Commits
1,000,000  31.3s         31,970 msg/s  100 @ 186ms avg  11 @ 509ms avg

Multi-Stream Results (4 Partitions)

Archiving from 4 Redis Streams in parallel to S3:

Messages   Archive Time  Throughput     Parquet Files    Delta Commits
1,000,000  12.5s         80,239 msg/s   100 @ 212ms avg  16 @ 385ms avg
4,000,000  34.7s         115,347 msg/s  400 @ 192ms avg  44 @ 486ms avg

Scaling Summary

Configuration            Throughput  vs Single Stream
1 stream                 32k msg/s   baseline
4 streams (1M messages)  80k msg/s   2.5×
4 streams (4M messages)  115k msg/s  3.6×
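The scaling factors above come straight from the measured throughputs (Python used only for illustration):

```python
# Scaling ratios relative to the single-stream baseline.
single_stream = 31_970     # msg/s, 1 stream
four_streams_1m = 80_239   # msg/s, 4 streams, 1M messages
four_streams_4m = 115_347  # msg/s, 4 streams, 4M messages

print(round(four_streams_1m / single_stream, 1))  # 2.5
print(round(four_streams_4m / single_stream, 1))  # 3.6
```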

Key Findings

  • Single stream peaks at ~32k msg/s: Bottleneck is S3 PUT latency for Parquet files

  • Near-linear scaling with streams: 4 streams achieves 115k msg/s (3.6× single stream)

  • Parquet writes average ~190ms: Same-region S3 provides consistent low latency

  • Delta commits average ~480ms: Includes S3 metadata operations for transaction log

  • Excellent compression: ZSTD achieves ~50:1 compression ratio (~2 bytes/message stored)

  • Same-region S3 is critical: Cross-region throughput drops ~50%

Storage Efficiency

Metric              Value
Messages archived   4,000,000
S3 objects created  444 (400 Parquet + 44 Delta logs)
Total S3 storage    ~8 MB
Bytes per message   ~2 bytes (after ZSTD compression)
Compression ratio   ~50:1
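The compression figures are consistent with the ~100-byte raw payloads stated in the test environment. A quick check (Python for illustration only):

```python
# Storage-efficiency arithmetic for the 4M-message multi-stream run.
messages = 4_000_000
raw_bytes_per_msg = 100    # ~100-byte binary payload, per test environment
stored_mb = 8              # ~8 MB total S3 storage

raw_mb = messages * raw_bytes_per_msg / 1e6   # ~400 MB raw
ratio = raw_mb / stored_mb                    # ~50:1 compression
stored_per_msg = stored_mb * 1e6 / messages   # ~2 bytes per message

print(raw_mb, ratio, stored_per_msg)
```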

Archival Configuration

The archival service was configured with:

storage:
  enabled: true
  path: s3a://your-bucket/korvet
  s3:
    region: us-west-1

# Tuning parameters (per stream)
read-worker-count: 4      # One per stream
commit-worker-count: 4    # One per stream
redis-batch-size: 10000   # Messages per Redis XREADGROUP
max-batches-per-commit: 10
files-per-delta-commit: 10

Redis Flex (Auto-Tiering) Benchmark

This benchmark evaluates Korvet performance with Redis Flex (Auto-Tiering), which uses NVMe flash storage to extend Redis capacity beyond RAM.

Test Environment

  • Redis Enterprise: 1× i4i.xlarge (4 vCPU, 32GB RAM, 937GB NVMe)

  • Database Config: 100GB capacity, 10GB RAM (10% ratio), 8 shards

  • Korvet Client: c7i.4xlarge (16 vCPU, 32GB RAM)

  • Kafka Tools: kafka-producer-perf-test from Apache Kafka

  • Record Size: 1 KB (1024 bytes)

  • Region: us-west-2 (all instances in same VPC)

Test Configuration

Parameter               Value
Instance Type (Redis)   i4i.xlarge (NVMe-backed)
Instance Type (Client)  c7i.4xlarge
Shards                  8 (1.25GB RAM per shard)
Redis Pool Size         256
Producer Batch Size     128KB (batch.size=131072)
Linger                  5ms (linger.ms=5)
Acks                    1

Performance Results

Metric                Korvet → Redis Flex       Direct Redis (XADD)
Peak Throughput       150,784 rec/s (147 MB/s)  130,690 rec/s (128 MB/s)
Sustained Throughput  110,000 rec/s (107 MB/s)  103,000 rec/s (100 MB/s)
Average Latency       201 ms                    < 1 ms
P99 Latency           510 ms                    28 ms

Data Structure Comparison

We compared Redis Streams (XADD) vs simple key-value (SET) operations on Redis Flex:

Operation                    Throughput        Notes
SET (1KB values)             153,000 ops/sec   Simple key-value, flash-friendly
XADD (Streams, 1KB payload)  103,000 ops/sec   Stream data structure overhead
Korvet → XADD                110-150k rec/sec  Near-native XADD performance

Key Findings

  • Korvet matches native Redis Streams performance: Korvet achieved 110-150k rec/sec, matching or exceeding direct XADD benchmarks

  • Flash eviction is the bottleneck for sustained writes: RAM fills faster than NVMe can drain at very high throughput

  • Larger RAM buffers help: 8 shards (1.25GB RAM/shard) outperformed 48 shards (208MB RAM/shard) by avoiding OOM errors

  • Client instance sizing matters: Upgrading the client from t3.medium (2 vCPU) to c7i.4xlarge (16 vCPU) eliminated a client-side bottleneck

OOM Behavior

At sustained throughput above ~150k rec/sec with 1KB payloads, Redis Flex may return OOM errors when the RAM buffer fills faster than flash eviction can drain. This is inherent to Redis Streams on flash storage, not specific to Korvet.

Scenario                 Throughput  Result
Burst (1M records)       150k rec/s  ✅ Success
Sustained (2M+ records)  150k rec/s  ⚠️ OOM after ~1.3M records
Sustained (unlimited)    110k rec/s  ✅ Success

Mitigation: For sustained high-throughput workloads on Redis Flex:

  • Use fewer shards with larger RAM buffers (e.g., 8 shards vs 48)

  • Increase RAM-to-disk ratio (e.g., 15-20% instead of 10%)

  • Throttle producer throughput to ~100k rec/sec per instance

  • Use multiple Redis Flex clusters for horizontal scaling

Sizing Recommendations for Redis Flex

Workload                Shards  RAM per Shard
Light (< 50k rec/s)     4       2.5GB
Medium (50-100k rec/s)  8       1.25GB+
Heavy (100k+ rec/s)     8-16    1GB+ (with throttling)

Configuration Recommendations

For optimal throughput:

  • Batch size: Use 1000 messages per batch for best balance of throughput and latency

  • Producers: 8 concurrent producers provide excellent throughput with manageable latency

  • Redis pool size: Match pool size to number of producers (8) for optimal connection utilization

  • Partitions: Use 1-2× the number of Redis shards (16 partitions for 16 shards)

  • Redis shards: Match the number of shards to available CPU cores

  • Rebalance delay: Configure korvet.server.rebalance-delay appropriately (default 10s) to allow all consumers to join before rebalancing

  • Replication: Disable replication for write-heavy workloads (if durability requirements allow)
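As a sketch, the rebalance delay from the list above could be set in the application configuration like this. Only the property name `korvet.server.rebalance-delay` comes from this page; the surrounding YAML structure is an assumption:

```yaml
# Sketch: structure assumed; only korvet.server.rebalance-delay is documented here
korvet:
  server:
    rebalance-delay: 10s   # default; raise if consumers are slow to join
```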

Running Your Own Benchmarks

Using the Benchmark Script

The korvet-dist repository contains a script to run benchmarks with various configurations.

git clone https://github.com/redis-field-engineering/korvet-dist.git
cd korvet-dist/samples/benchmark/scripts
./run-comprehensive-benchmark.sh

Configuration Options

Edit the script to customize benchmark parameters:

# Test parameters
TOPIC="benchmark-test"
PARTITIONS=16
RECORD_SIZE=1024
NUM_RECORDS=1000000

# Parameter arrays
PRODUCERS=(8)           # Number of concurrent producers
BATCH_SIZES=(1000)      # Messages per batch
POOL_SIZES=(8)          # Redis connection pool size

What the Script Does

  1. Starts Korvet with the specified Redis pool size

  2. Flushes Redis to ensure clean state

  3. Creates topic with specified number of partitions

  4. Runs producers using kafka-producer-perf-test

  5. Collects metrics:

    • Korvet CPU and memory (via Spring Boot Actuator at port 8080)

    • Redis Enterprise CPU and memory (via REST API at port 9443)

    • Producer throughput and latency

  6. Generates report with detailed results

Output

Results are saved to /tmp/korvet-benchmark-<timestamp>/:

  • SUMMARY.txt: Summary table of all test results

  • producers-<N>_batch-<B>msg_pool-<P>.txt: Detailed results for each test

Example summary output:

Producers  Batch(msg)   Pool       Total Msgs      Duration(s)  Throughput(rec/s)  Throughput(MB/s)
8          1000         8          8000000         21           380952             372.02

Running Cold Storage Benchmarks

To run cold storage benchmarks against S3:

# Set S3 bucket and region
export S3_BUCKET=your-bucket-name
export AWS_REGION=us-west-1

# Run single-stream benchmark (100k messages default)
./gradlew :korvet-storage:test \
  --tests "StreamArchivalServiceS3Benchmark.benchmarkS3"

# Run with more messages
./gradlew :korvet-storage:test \
  --tests "StreamArchivalServiceS3Benchmark.benchmarkS3" \
  -Dtest.message.count=1000000

# Run multi-stream benchmark (4 streams)
./gradlew :korvet-storage:test \
  --tests "StreamArchivalServiceS3Benchmark.benchmarkS3MultiStream" \
  -Dtest.message.count=1000000

For best results, run on an EC2 instance in the same region as your S3 bucket.

Cold Storage Read Benchmarks (End-to-End)

The ColdStorageReadBenchmark JUnit test measures end-to-end Kafka consumer read performance when data is in cold storage (Delta Lake). This benchmark uses real Kafka clients and the local filesystem as a storage backend, which exercises the same Delta Lake code paths as S3 without requiring cloud infrastructure.

Test Scenarios

The benchmark tests four scenarios:

  1. Standalone Consumer (assign/seek) - Direct partition assignment without group coordination

  2. Consumer Group (subscribe) - Full consumer group protocol with coordination overhead

  3. ListOffsets API - Earliest/latest offset lookups from cold storage

  4. OffsetsForTimes API - Timestamp-based offset lookups from cold storage

Environment Variables

Variable                 Default  Description
BENCHMARK_MESSAGE_COUNT  10000    Number of messages to produce and archive
BENCHMARK_RECORD_SIZE    1024     Record size in bytes
BENCHMARK_ITERATIONS     3        Number of iterations per scenario
KORVET_VERSION           unknown  Version string included in results (for tracking across releases)

Running the Benchmark

# Basic usage
./gradlew :korvet-app:test --tests "ColdStorageReadBenchmark"

# With custom configuration
BENCHMARK_MESSAGE_COUNT=50000 BENCHMARK_ITERATIONS=5 KORVET_VERSION=0.5.0-ea1 \
  ./gradlew :korvet-app:test --tests "ColdStorageReadBenchmark"

CI/CD Integration (GitHub Actions)

jobs:
  cold-storage-benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          java-version: '25'
          distribution: 'temurin'

      - name: Run Cold Storage Benchmark
        env:
          KORVET_VERSION: ${{ github.ref_name }}
        run: ./gradlew :korvet-app:test --tests "ColdStorageReadBenchmark"

      - name: Upload Benchmark Results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: korvet-app/build/benchmark-results/*.json

Output Format

Results are written to korvet-app/build/benchmark-results/cold-storage-read-<timestamp>.json:

{
  "korvetVersion": "0.5.0-ea1",
  "timestamp": "2026-03-30T17:30:00Z",
  "gitCommit": "abc1234",
  "environment": {
    "javaVersion": "25",
    "osName": "Linux",
    "availableProcessors": 4,
    "maxMemoryMb": 4096
  },
  "config": {
    "messageCount": 10000,
    "recordSizeBytes": 1024,
    "partitions": 1,
    "iterations": 3,
    "batchSizes": [100, 500, 1000]
  },
  "scenarios": [
    {
      "type": "STANDALONE_CONSUMER",
      "batchSize": 100,
      "avgLatencyMs": 250,
      "p95LatencyMs": 320,
      "throughputMsgPerSec": 400.0
    }
  ]
}

Comparing Across Releases

The JSON output includes version and environment information, making it easy to compare results:

# Download artifacts from different releases
gh run download <run-id-1> -n benchmark-results -D results/v0.5.0-ea1
gh run download <run-id-2> -n benchmark-results -D results/v0.5.0-ea2

# Compare using jq
jq -s '.[0].scenarios[0] as $old | .[1].scenarios[0] as $new |
  {scenario: $new.type, improvement: (($old.avgLatencyMs - $new.avgLatencyMs) / $old.avgLatencyMs * 100)}' \
  results/v0.5.0-ea1/*.json results/v0.5.0-ea2/*.json

Interpreting Results

  • Standalone vs Consumer Group overhead: Consumer group reads include additional coordination (JoinGroup, SyncGroup, OffsetFetch) which adds latency

  • ListOffsets Earliest latency: High values indicate slow cold tier metadata lookups (Delta Lake checkpoint/commit scanning)

  • Throughput scaling: If throughput doesn’t scale linearly with batch size, there may be per-request overhead dominating

  • P95 vs Avg: Large gaps indicate Delta Lake compaction or GC pauses

This benchmark uses local filesystem storage, so absolute latency numbers will be lower than in production S3 deployments. Focus on relative changes between releases rather than absolute values.

Timeout Recommendations

Based on production experience with S3-backed cold storage, configure consumer timeouts appropriately:

// For cold storage reads, increase consumer timeouts
Properties props = new Properties();
props.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");       // 60s
props.put(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, "120000");  // 2 min
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "30000");        // 30s