# Benchmarks
This page documents performance benchmarks for Korvet with Redis Enterprise as the storage backend.
## Optimal Configuration Benchmark

This benchmark demonstrates the highest-throughput configuration for Korvet with Redis Enterprise.
### Test Environment

- **Redis Enterprise:** 16 shards, running locally
- **Korvet:** Single instance (macOS, Apple Silicon)
- **Kafka Tools:** `kafka-producer-perf-test` from Apache Kafka
- **Topic Configuration:** 16 partitions (1× the number of shards)
- **Record Size:** 1 KB (1024 bytes)
- **Total Messages:** 8,000,000 (1,000,000 per producer)
### Configuration

| Parameter | Value |
|---|---|
| Producers | 8 |
| Batch Size | 1000 messages (1.07 MB) |
| Redis Connection Pool Size | 8 |
| Acks | 1 |
| Compression | none |
| Linger | 0 ms |
### Performance Results

| Metric | Value |
|---|---|
| Aggregate Throughput | 380,952 records/sec |
| Throughput (MB/sec) | 372.02 MB/sec |
| Total Messages | 8,000,000 |
| Duration | 21 seconds |
| Average Latency (range) | 249-411 ms |
| 95th Percentile Latency (range) | 657-2136 ms |
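As a sanity check, the headline throughput figures follow directly from the message count, duration, and record size in the tables above (a quick derivation, assuming 1 KiB records and MB meaning MiB):

```java
// Sanity-check the reported aggregate throughput from the raw figures above.
public class ThroughputCheck {
    public static void main(String[] args) {
        long totalMessages = 8_000_000L; // 8 producers x 1,000,000 messages
        long durationSec = 21;           // reported duration
        int recordBytes = 1024;          // 1 KB records

        long recPerSec = totalMessages / durationSec;                        // aggregate records/sec
        double mbPerSec = recPerSec * (double) recordBytes / (1024 * 1024);  // MiB/sec

        System.out.println(recPerSec);                              // 380952
        System.out.printf(java.util.Locale.ROOT, "%.2f%n", mbPerSec); // 372.02
    }
}
```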
### Redis Enterprise Metrics

| Metric | Value |
|---|---|
| Total CPU (all 16 shards) | 79% |
| Per-Shard CPU | 2-8% |
| Data per Shard | 112-172 MB |
| Data Distribution | Even across all shards |
### Key Findings

- **High throughput with low CPU usage:** Achieved 372 MB/sec with only 2.35% Korvet CPU usage
- **Excellent scalability headroom:** Both Korvet and Redis Enterprise operated well below capacity
- **Even load distribution:** Data and CPU load were distributed evenly across all 16 Redis shards
- **Optimal batch size:** 1000 messages per batch provided the best balance of throughput and latency
### Running This Benchmark

To reproduce this benchmark, use the provided benchmark script from the korvet-dist repository:

```shell
git clone https://github.com/redis-field-engineering/korvet-dist.git
cd korvet-dist/samples/benchmark/scripts
./run-comprehensive-benchmark.sh
```
The script will:

- Start Korvet with the specified Redis pool size
- Create a topic with 16 partitions
- Run 8 concurrent producers, each sending 1,000,000 messages
- Collect metrics from Korvet (via actuator) and Redis Enterprise (via API)
- Generate a detailed report with throughput, latency, and resource usage

Results are saved to `/tmp/korvet-benchmark-<timestamp>/`.
## Single Shard Benchmark

This benchmark demonstrates Korvet performance with a single Redis shard, providing a baseline for comparison with multi-shard configurations.

### Test Environment

- **Redis Enterprise:** 1 shard (~1 GB maxmemory), running locally
- **Korvet:** Single instance (macOS, Apple Silicon)
- **Kafka Tools:** `kafka-producer-perf-test` from Apache Kafka
- **Topic Configuration:** 1 partition (matching the single shard)
- **Record Size:** 1 KB (1024 bytes)
### Configuration

| Parameter | Value |
|---|---|
| Producers | 1 (baseline) / 8 (concurrent) |
| Batch Size | 1000 messages (1.07 MB) |
| Redis Connection Pool Size | 16 |
| Acks | 1 |
| Compression | none |
| Linger | 0 ms |
### Performance Results

#### Comparison: 16 Shards vs 1 Shard

| Metric | 16 Shards | 1 Shard | Ratio |
|---|---|---|---|
| Database Memory | ~16 GB | ~1 GB | 16× |
| Topic Partitions | 16 | 1 | 16× |
| Throughput (rec/s) | 380,952 | 168,641 | 2.26× |
| Throughput (MB/s) | 372.02 | 164.68 | 2.26× |
| Per-shard throughput | 23,809 | 168,641 | 0.14× |
### Key Findings

- **Single shard achieves ~44% of 16-shard aggregate throughput:** 168,641 vs 380,952 records/sec
- **Higher per-shard efficiency with fewer shards:** A single shard processes 168,641 rec/s vs 23,809 rec/s per shard in the 16-shard setup
- **Memory efficiency:** ~1.15 KB per message in Redis Streams (776 MB for 674,564 messages)
- **Single producer baseline:** 151,860 rec/s provides a clean baseline without concurrency overhead
## Cold Storage Archival Benchmark

This benchmark measures the throughput of archiving messages from Redis Streams to Delta Lake on S3.

### Test Environment

- **EC2 Instance:** c5.2xlarge (8 vCPU, 16 GB RAM) in us-west-1
- **S3 Bucket:** Same region (us-west-1) for optimal network performance
- **Redis:** Docker container on the same instance
- **Message Size:** ~100 bytes (binary payload)
- **Compression:** ZSTD (Parquet default)
### Single Stream Results

Archiving from a single Redis Stream to S3:

| Messages | Archive Time | Throughput | Parquet Files | Delta Commits |
|---|---|---|---|---|
| 1,000,000 | 31.3s | 31,970 msg/s | 100 @ 186ms avg | 11 @ 509ms avg |
### Multi-Stream Results (4 Partitions)

Archiving from 4 Redis Streams in parallel to S3:

| Messages | Archive Time | Throughput | Parquet Files | Delta Commits |
|---|---|---|---|---|
| 1,000,000 | 12.5s | 80,239 msg/s | 100 @ 212ms avg | 16 @ 385ms avg |
| 4,000,000 | 34.7s | 115,347 msg/s | 400 @ 192ms avg | 44 @ 486ms avg |
### Scaling Summary

| Configuration | Throughput | vs Single Stream |
|---|---|---|
| 1 stream | 32k msg/s | baseline |
| 4 streams (1M messages) | 80k msg/s | 2.5× |
| 4 streams (4M messages) | 115k msg/s | 3.6× |
### Key Findings

- **Single stream peaks at ~32k msg/s:** The bottleneck is S3 PUT latency for Parquet files
- **Near-linear scaling with streams:** 4 streams achieve 115k msg/s (3.6× single stream)
- **Parquet writes average ~190ms:** Same-region S3 provides consistent low latency
- **Delta commits average ~480ms:** Includes S3 metadata operations for the transaction log
- **Excellent compression:** ZSTD achieves a ~50:1 compression ratio (~2 bytes/message stored)
- **Same-region S3 is critical:** Cross-region throughput drops ~50%
### Storage Efficiency

| Metric | Value |
|---|---|
| Messages archived | 4,000,000 |
| S3 objects created | 444 (400 Parquet + 44 Delta logs) |
| Total S3 storage | ~8 MB |
| Bytes per message | ~2 bytes (after ZSTD compression) |
| Compression ratio | ~50:1 |
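The efficiency figures above can be reproduced from the table entries (a back-of-envelope sketch; it assumes ~100-byte raw payloads and treats the ~8 MB total as exact, so the results land near the quoted ~2 bytes/message and ~50:1):

```java
// Back-of-envelope check of the archival storage-efficiency numbers.
public class StorageEfficiencyCheck {
    public static void main(String[] args) {
        long messages = 4_000_000L;       // messages archived
        long s3Bytes = 8L * 1024 * 1024;  // ~8 MB total S3 storage
        int rawBytesPerMessage = 100;     // ~100-byte binary payloads

        double storedBytesPerMessage = (double) s3Bytes / messages;            // ~2 bytes
        double compressionRatio = rawBytesPerMessage / storedBytesPerMessage;  // ~50:1

        System.out.printf(java.util.Locale.ROOT, "~%.1f bytes/message, ~%.0f:1%n",
                storedBytesPerMessage, compressionRatio);
    }
}
```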
### Archival Configuration

The archival service was configured with:

```yaml
storage:
  enabled: true
  path: s3a://your-bucket/korvet
  s3:
    region: us-west-1
  # Tuning parameters (per stream)
  read-worker-count: 4        # One per stream
  commit-worker-count: 4      # One per stream
  redis-batch-size: 10000     # Messages per Redis XREADGROUP
  max-batches-per-commit: 10
  files-per-delta-commit: 10
```
## Redis Flex (Auto-Tiering) Benchmark

This benchmark evaluates Korvet performance with Redis Flex (Auto-Tiering), which uses NVMe flash storage to extend Redis capacity beyond RAM.

### Test Environment

- **Redis Enterprise:** 1× i4i.xlarge (4 vCPU, 32 GB RAM, 937 GB NVMe)
- **Database Config:** 100 GB capacity, 10 GB RAM (10% ratio), 8 shards
- **Korvet Client:** c7i.4xlarge (16 vCPU, 32 GB RAM)
- **Kafka Tools:** `kafka-producer-perf-test` from Apache Kafka
- **Record Size:** 1 KB (1024 bytes)
- **Region:** us-west-2 (all instances in the same VPC)
### Test Configuration

| Parameter | Value |
|---|---|
| Instance Type (Redis) | i4i.xlarge (NVMe-backed) |
| Instance Type (Client) | c7i.4xlarge |
| Shards | 8 (1.25 GB RAM per shard) |
| Redis Pool Size | 256 |
| Producer Batch Size | 128 KB |
| Linger | 5 ms |
| Acks | 1 |
### Performance Results

| Metric | Korvet → Redis Flex | Direct Redis (XADD) |
|---|---|---|
| Peak Throughput | 150,784 rec/s (147 MB/s) | 130,690 rec/s (128 MB/s) |
| Sustained Throughput | 110,000 rec/s (107 MB/s) | 103,000 rec/s (100 MB/s) |
| Average Latency | 201 ms | < 1 ms |
| P99 Latency | 510 ms | 28 ms |
### Data Structure Comparison

We compared Redis Streams (XADD) vs simple key-value (SET) operations on Redis Flex:

| Operation | Throughput | Notes |
|---|---|---|
| SET (1 KB values) | 153,000 ops/sec | Simple key-value, flash-friendly |
| XADD (Streams, 1 KB payload) | 103,000 ops/sec | Stream data structure overhead |
| Korvet → XADD | 110-150k rec/sec | Near-native XADD performance |
### Key Findings

- **Korvet matches native Redis Streams performance:** Korvet achieved 110-150k rec/sec, matching or exceeding direct XADD benchmarks
- **Flash eviction is the bottleneck for sustained writes:** RAM fills faster than NVMe can drain at very high throughput
- **Larger RAM buffers help:** 8 shards (1.25 GB RAM/shard) outperformed 48 shards (208 MB RAM/shard) by avoiding OOM errors
- **Client instance sizing matters:** Upgrading from t3.medium (2 vCPU) to c7i.4xlarge (16 vCPU) eliminated a client-side bottleneck
### OOM Behavior

At sustained throughput above ~150k rec/sec with 1 KB payloads, Redis Flex may return OOM errors when the RAM buffer fills faster than flash eviction can drain it. This is inherent to Redis Streams on flash storage, not specific to Korvet.

| Scenario | Throughput | Result |
|---|---|---|
| Burst (1M records) | 150k rec/s | ✅ Success |
| Sustained (2M+ records) | 150k rec/s | ⚠️ OOM after ~1.3M records |
| Sustained (unlimited) | 110k rec/s | ✅ Success |
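The observed OOM point is consistent with simple buffer arithmetic (a sketch; it assumes ~1 KB per record and ignores Stream overhead and any eviction that runs concurrently):

```java
// Rough estimate of how long the RAM buffer absorbs a 150k rec/s burst.
public class OomEstimate {
    public static void main(String[] args) {
        long recPerSec = 150_000L;          // sustained producer rate
        int recordBytes = 1024;             // 1 KB payloads
        long recordsBeforeOom = 1_300_000L; // observed OOM point

        double mbPerSec = recPerSec * (double) recordBytes / (1024 * 1024); // ~146 MB/s into RAM
        double secondsToOom = recordsBeforeOom / (double) recPerSec;        // ~8.7 s of burst absorbed

        System.out.printf(java.util.Locale.ROOT, "~%.0f MB/s, OOM after ~%.1f s%n",
                mbPerSec, secondsToOom);
    }
}
```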
**Mitigation:** For sustained high-throughput workloads on Redis Flex:

- Use fewer shards with larger RAM buffers (e.g., 8 shards vs 48)
- Increase the RAM-to-disk ratio (e.g., 15-20% instead of 10%)
- Throttle producer throughput to ~100k rec/sec per instance
- Use multiple Redis Flex clusters for horizontal scaling
## Configuration Recommendations

For optimal throughput:

- **Batch size:** Use 1000 messages per batch for the best balance of throughput and latency
- **Producers:** 8 concurrent producers provide excellent throughput with manageable latency
- **Redis pool size:** Match the pool size to the number of producers (8) for optimal connection utilization
- **Partitions:** Use 1-2× the number of Redis shards (16 partitions for 16 shards)
- **Redis shards:** Match the number of shards to the available CPU cores
- **Rebalance delay:** Configure `korvet.server.rebalance-delay` appropriately (default 10s) to allow all consumers to join before rebalancing
- **Replication:** Disable replication for write-heavy workloads (if durability requirements allow)
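Expressed as plain Kafka producer properties, these recommendations might look like the following sketch (illustrative only; the bootstrap address is a placeholder, and note that the client's `batch.size` is specified in bytes, so 1000 × 1 KB records ≈ 1 MB):

```java
import java.util.Properties;

// Illustrative producer settings matching the recommendations above.
public class RecommendedProducerProps {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");     // placeholder Korvet endpoint
        props.put("acks", "1");                               // acks=1, as benchmarked
        props.put("compression.type", "none");                // no compression
        props.put("linger.ms", "0");                          // send batches immediately
        props.put("batch.size", String.valueOf(1000 * 1024)); // ~1000 x 1 KB records per batch

        System.out.println(props.getProperty("batch.size"));  // 1024000
    }
}
```

With `kafka-producer-perf-test`, the same settings can be passed via `--producer-props`.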
## Running Your Own Benchmarks

### Using the Benchmark Script

The korvet-dist repository contains a script to run benchmarks with various configurations.

```shell
git clone https://github.com/redis-field-engineering/korvet-dist.git
cd korvet-dist/samples/benchmark/scripts
./run-comprehensive-benchmark.sh
```
### Configuration Options

Edit the script to customize benchmark parameters:

```shell
# Test parameters
TOPIC="benchmark-test"
PARTITIONS=16
RECORD_SIZE=1024
NUM_RECORDS=1000000

# Parameter arrays
PRODUCERS=(8)       # Number of concurrent producers
BATCH_SIZES=(1000)  # Messages per batch
POOL_SIZES=(8)      # Redis connection pool size
```
### What the Script Does

- Starts Korvet with the specified Redis pool size
- Flushes Redis to ensure a clean state
- Creates a topic with the specified number of partitions
- Runs producers using `kafka-producer-perf-test`
- Collects metrics:
  - Korvet CPU and memory (via Spring Boot Actuator at port 8080)
  - Redis Enterprise CPU and memory (via REST API at port 9443)
  - Producer throughput and latency
- Generates a report with detailed results
### Output

Results are saved to `/tmp/korvet-benchmark-<timestamp>/`:

- `SUMMARY.txt`: Summary table of all test results
- `producers-<N>_batch-<B>msg_pool-<P>.txt`: Detailed results for each test

Example summary output:

```
Producers  Batch(msg)  Pool  Total Msgs  Duration(s)  Throughput(rec/s)  Throughput(MB/s)
8          1000        8     8000000     21           380952             372.02
```
### Running Cold Storage Benchmarks

To run cold storage benchmarks against S3:

```shell
# Set S3 bucket and region
export S3_BUCKET=your-bucket-name
export AWS_REGION=us-west-1

# Run single-stream benchmark (100k messages default)
./gradlew :korvet-storage:test \
  --tests "StreamArchivalServiceS3Benchmark.benchmarkS3"

# Run with more messages
./gradlew :korvet-storage:test \
  --tests "StreamArchivalServiceS3Benchmark.benchmarkS3" \
  -Dtest.message.count=1000000

# Run multi-stream benchmark (4 streams)
./gradlew :korvet-storage:test \
  --tests "StreamArchivalServiceS3Benchmark.benchmarkS3MultiStream" \
  -Dtest.message.count=1000000
```

For best results, run on an EC2 instance in the same region as your S3 bucket.
## Cold Storage Read Benchmarks (End-to-End)

The `ColdStorageReadBenchmark` JUnit test measures end-to-end Kafka consumer read performance when data is in cold storage (Delta Lake).

This benchmark uses real Kafka clients and the local filesystem as a storage backend, which exercises the same Delta Lake code paths as S3 without requiring cloud infrastructure.
### Test Scenarios

The benchmark tests four scenarios:

- **Standalone Consumer (`assign`/`seek`):** Direct partition assignment without group coordination
- **Consumer Group (`subscribe`):** Full consumer group protocol with coordination overhead
- **ListOffsets API:** Earliest/latest offset lookups from cold storage
- **OffsetsForTimes API:** Timestamp-based offset lookups from cold storage
### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `BENCHMARK_MESSAGE_COUNT` | 10000 | Number of messages to produce and archive |
|  | 1024 | Record size in bytes |
| `BENCHMARK_ITERATIONS` | 3 | Number of iterations per scenario |
| `KORVET_VERSION` |  | Version string included in results (for tracking across releases) |
### Running the Benchmark

```shell
# Basic usage
./gradlew :korvet-app:test --tests "ColdStorageReadBenchmark"

# With custom configuration
BENCHMARK_MESSAGE_COUNT=50000 BENCHMARK_ITERATIONS=5 KORVET_VERSION=0.5.0-ea1 \
  ./gradlew :korvet-app:test --tests "ColdStorageReadBenchmark"
```
### CI/CD Integration (GitHub Actions)

```yaml
jobs:
  cold-storage-benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          java-version: '25'
          distribution: 'temurin'
      - name: Run Cold Storage Benchmark
        env:
          KORVET_VERSION: ${{ github.ref_name }}
        run: ./gradlew :korvet-app:test --tests "ColdStorageReadBenchmark"
      - name: Upload Benchmark Results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results
          path: korvet-app/build/benchmark-results/*.json
```
### Output Format

Results are written to `korvet-app/build/benchmark-results/cold-storage-read-<timestamp>.json`:

```json
{
  "korvetVersion": "0.5.0-ea1",
  "timestamp": "2026-03-30T17:30:00Z",
  "gitCommit": "abc1234",
  "environment": {
    "javaVersion": "25",
    "osName": "Linux",
    "availableProcessors": 4,
    "maxMemoryMb": 4096
  },
  "config": {
    "messageCount": 10000,
    "recordSizeBytes": 1024,
    "partitions": 1,
    "iterations": 3,
    "batchSizes": [100, 500, 1000]
  },
  "scenarios": [
    {
      "type": "STANDALONE_CONSUMER",
      "batchSize": 100,
      "avgLatencyMs": 250,
      "p95LatencyMs": 320,
      "throughputMsgPerSec": 400.0
    }
  ]
}
```
### Comparing Across Releases

The JSON output includes version and environment information, making it easy to compare results:

```shell
# Download artifacts from different releases
gh run download <run-id-1> -n benchmark-results -D results/v0.5.0-ea1
gh run download <run-id-2> -n benchmark-results -D results/v0.5.0-ea2

# Compare using jq
jq -s '.[0].scenarios[0] as $old | .[1].scenarios[0] as $new |
  {scenario: $new.type, improvement: (($old.avgLatencyMs - $new.avgLatencyMs) / $old.avgLatencyMs * 100)}' \
  results/v0.5.0-ea1/*.json results/v0.5.0-ea2/*.json
```
### Interpreting Results

- **Standalone vs Consumer Group overhead:** Consumer group reads include additional coordination (JoinGroup, SyncGroup, OffsetFetch), which adds latency
- **ListOffsets Earliest latency:** High values indicate slow cold-tier metadata lookups (Delta Lake checkpoint/commit scanning)
- **Throughput scaling:** If throughput doesn't scale linearly with batch size, per-request overhead may be dominating
- **P95 vs Avg:** Large gaps indicate Delta Lake compaction or GC pauses

> **Note:** This benchmark uses local filesystem storage, so absolute latency numbers will be lower than production S3 deployments. Focus on relative changes between releases rather than absolute values.
### Timeout Recommendations

Based on production experience with S3-backed cold storage, configure consumer timeouts appropriately:

```java
// For cold storage reads, increase timeouts
props.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, "60000");      // 60s
props.put(ConsumerConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, "120000"); // 2min
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "30000");       // 30s
```