This version is still in development and is not considered stable yet. For the latest stable version, please use Korvet 0.12.5!

Monitoring

Korvet exposes health checks and metrics through Spring Boot Actuator and Micrometer.

Health Checks

Korvet exposes health check endpoints:

```
# Overall health
curl http://localhost:8080/actuator/health

# Liveness probe (for Kubernetes)
curl http://localhost:8080/actuator/health/liveness

# Readiness probe (for Kubernetes)
curl http://localhost:8080/actuator/health/readiness
```
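
The liveness and readiness endpoints map directly onto Kubernetes probes. The following is a minimal sketch of a container spec fragment, assuming the Actuator endpoints are served on port 8080 as in the curl examples. The container name, the role of port 9092, and all timing values are illustrative assumptions, not recommended settings:

```yaml
# Fragment of a Pod/Deployment container spec (illustrative values only).
containers:
  - name: korvet                      # hypothetical container name
    image: redisfield/korvet:latest
    ports:
      - containerPort: 9092           # assumed Kafka-protocol listener
      - containerPort: 8080           # Actuator endpoints
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30         # assumed startup allowance
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      periodSeconds: 5
      failureThreshold: 3             # mark unready after 3 failed checks
```

Keeping the two probes on separate endpoints lets Kubernetes stop routing traffic to an unready instance without restarting it.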

Metrics

Metrics are exposed in Prometheus format:

```
curl http://localhost:8080/actuator/prometheus
```

Available Metrics

Korvet Custom Metrics

korvet.produce.messages: Number of messages produced (tags: topic, partition)
korvet.produce.latency: Produce request latency histogram (tags: topic, partition)
korvet.fetch.messages: Number of messages fetched (tags: topic, partition)
korvet.fetch.latency: Fetch request latency histogram (tags: topic, partition)
korvet.broker.requests: Kafka API request counts (tags: api_key, result)
korvet.storage.redis.operations: Hot Redis operation counts (tags: operation, result)
korvet.storage.delta.read.time: Cold Delta read latency (tags: stream, operation, result)
korvet.storage.tiered.read_path: Tiered read-path decisions (tags: operation, tier, result)
korvet.runtime.service.running: Broker and archiver runtime state (tag: module)
korvet.errors: Number of errors (tags: operation, error_type)

See Metrics Reference for the full module-by-module catalog.
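
In the Prometheus endpoint, Micrometer meter names normally appear with dots replaced by underscores. Counters usually gain a `_total` suffix, and timers are exported in seconds with `_bucket`, `_count`, and `_sum` series when histograms are published. The sketch below shows Prometheus recording rules written under that assumption. The exported names (`korvet_produce_messages_total`, `korvet_produce_latency_seconds_bucket`) and the rule and group names are assumptions, so check them against the actual /actuator/prometheus output before use:

```yaml
groups:
  - name: korvet-recording            # hypothetical group name
    rules:
      # Per-topic produce rate in messages/sec.
      # Assumes korvet.produce.messages is exported as a counter.
      - record: korvet:produce_messages:rate1m
        expr: sum by (topic) (rate(korvet_produce_messages_total[1m]))

      # Per-topic p99 produce latency in seconds.
      # Assumes korvet.produce.latency is exported as a histogram in seconds.
      - record: korvet:produce_latency_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum by (le, topic) (rate(korvet_produce_latency_seconds_bucket[5m])))
```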

JVM and System Metrics

Standard JVM and system metrics from Micrometer:

jvm.memory.used: JVM memory used
jvm.memory.max: JVM maximum memory
jvm.gc.pause: Garbage collection pause time
jvm.threads.live: Live threads
process.cpu.usage: Process CPU usage
system.cpu.usage: System CPU usage
system.load.average.1m: System load average over the last minute

Prometheus Configuration

Add Korvet to your Prometheus scrape config:

```
scrape_configs:
  - job_name: 'korvet'
    static_configs:
      - targets: ['korvet:8080']
    metrics_path: '/actuator/prometheus'
```

Monitoring Stack Setup

A complete monitoring stack with Prometheus and Grafana is available in the korvet-dist observability directory.

Quick Start with Docker Compose

The easiest way to set up monitoring is to include the observability stack in your docker-compose.yml:

```
include:
  - path/to/korvet-dist/observability/docker-compose.yml

services:
  redis:
    image: redis:8.6
    ports:
      - "6379:6379"

  korvet:
    image: redisfield/korvet:latest
    ports:
      - "9092:9092"
      - "8080:8080"
    environment:
      - KORVET_REDIS_HOST=redis
      - KORVET_REDIS_METRICS_ENABLED=true
    depends_on:
      - redis
```

This automatically adds:

Prometheus on port 9090 - Metrics collection and storage
Grafana on port 3000 - Pre-configured dashboard for Korvet metrics

Accessing the Dashboard

Start your services:
```
docker compose up -d
```
Open Grafana at http://localhost:3000
The Korvet dashboard loads automatically (no login required)

Dashboard Features

The pre-built Grafana dashboard visualizes:

Message Rates: Real-time produce/fetch rates, computed with irate() to show instantaneous values
Latency Percentiles: P50, P95, P99 for produce and fetch operations
Throughput: Ingress and egress bytes/sec
Redis Metrics: Command rates, latency percentiles (when korvet.redis.metrics.enabled=true)
JVM Metrics: Heap memory, GC pauses, thread count
System Metrics: CPU usage, load average, disk space

The dashboard defaults to a 5-minute time range with 5-second auto-refresh for real-time monitoring.

Standalone Setup

For production deployments, see the observability README for:

Prometheus configuration examples
Grafana datasource and dashboard provisioning
Customizing the dashboard
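
For orientation, Grafana datasource provisioning generally takes the form sketched below. This is a generic example and not a copy of the file shipped in korvet-dist. The file path and the `prometheus` hostname are assumptions; port 9090 matches the Prometheus port listed above.

```yaml
# e.g. grafana/provisioning/datasources/prometheus.yml (hypothetical path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                     # Grafana backend proxies the queries
    url: http://prometheus:9090       # assumed Prometheus service name
    isDefault: true
```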

Alerting

Set up alerts for:

Korvet-Specific Alerts

High produce latency: p99 > 100ms
High fetch latency: p99 > 50ms
Broker request error spike: requests with an error result (or korvet.broker.request.failures) exceed 1% of all requests
Redis pool contention: sustained korvet.storage.redis.pool.pending growth or pool timeout rate
Archiver handoff/ack issues: non-zero korvet.storage.tiered.archiver.ack_mismatch
No produce activity: no messages produced in 5 minutes (only where traffic is expected)
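
Some of these conditions can be sketched as Prometheus alerting rules. The example covers the latency, error-rate, and inactivity alerts only. It assumes the usual Micrometer-to-Prometheus naming (latency timers exported as `*_seconds_bucket` histograms, counters with a `_total` suffix) and assumes that failed broker requests carry `result="error"`. The alert names, group name, `for` durations, and severity labels are illustrative; verify the metric names against /actuator/prometheus.

```yaml
groups:
  - name: korvet-alerts               # hypothetical group name
    rules:
      - alert: KorvetHighProduceLatency
        # p99 produce latency above 100 ms (0.1 s)
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(korvet_produce_latency_seconds_bucket[5m]))) > 0.1
        for: 5m
        labels:
          severity: warning

      - alert: KorvetHighFetchLatency
        # p99 fetch latency above 50 ms (0.05 s)
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(korvet_fetch_latency_seconds_bucket[5m]))) > 0.05
        for: 5m
        labels:
          severity: warning

      - alert: KorvetBrokerErrorRate
        # More than 1% of broker requests failing; the "error" tag value is assumed
        expr: |
          sum(rate(korvet_broker_requests_total{result="error"}[5m]))
            / sum(rate(korvet_broker_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical

      - alert: KorvetNoProduceActivity
        # No messages produced in 5 minutes; does not fire if the series is absent
        expr: sum(increase(korvet_produce_messages_total[5m])) == 0
        labels:
          severity: warning
```

The `for` clause delays firing until the condition has held for the stated duration, which filters out brief spikes.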