This version is still in development and is not yet considered stable. For the latest stable version, use Korvet 0.12.5.

Monitoring

Korvet exposes health checks and metrics through Spring Boot Actuator and Micrometer.

Health Checks

Korvet exposes the following health check endpoints on its HTTP port (8080 in these examples):

# Overall health
curl http://localhost:8080/actuator/health

# Liveness probe (for Kubernetes)
curl http://localhost:8080/actuator/health/liveness

# Readiness probe (for Kubernetes)
curl http://localhost:8080/actuator/health/readiness
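
The liveness and readiness endpoints above can be wired into a Kubernetes pod spec. The fragment below is a minimal sketch, not a tested manifest: the container name and all timing values are illustrative assumptions, and the image and port are taken from the examples elsewhere on this page.

```yaml
# Fragment of a Deployment pod template (spec.template.spec).
containers:
  - name: korvet                      # hypothetical container name
    image: redisfield/korvet:latest
    ports:
      - containerPort: 8080           # Actuator HTTP port used in the curl examples
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30         # illustrative; tune to actual startup time
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      periodSeconds: 5                # illustrative
      failureThreshold: 3
```

Keeping the two probes separate lets Kubernetes stop routing traffic to a broker that is temporarily not ready without restarting it.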

Metrics

Metrics are exposed in Prometheus format:

curl http://localhost:8080/actuator/prometheus

Available Metrics

Korvet Custom Metrics

  • korvet.produce.messages: Number of messages produced (tags: topic, partition)

  • korvet.produce.latency: Produce request latency histogram (tags: topic, partition)

  • korvet.fetch.messages: Number of messages fetched (tags: topic, partition)

  • korvet.fetch.latency: Fetch request latency histogram (tags: topic, partition)

  • korvet.broker.requests: Kafka API request counts (tags: api_key, result)

  • korvet.storage.redis.operations: Hot Redis operation counts (tags: operation, result)

  • korvet.storage.delta.read.time: Cold Delta read latency (tags: stream, operation, result)

  • korvet.storage.tiered.read_path: Tiered read-path decisions (tags: operation, tier, result)

  • korvet.runtime.service.running: Broker and archiver runtime state (tag: module)

  • korvet.errors: Number of errors (tags: operation, error_type)

See Metrics Reference for the full module-by-module catalog.
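
Series names in the Prometheus output differ from the Micrometer names listed above: Micrometer's Prometheus registry replaces dots with underscores, appends `_total` to counters, and appends a base unit plus `_count`, `_sum`, and (when histograms are enabled) `_bucket` to timers. The recording rules below are a sketch under the assumption that korvet.produce.messages is a counter and korvet.produce.latency is a timer with histogram buckets. The rule names are hypothetical; check the exact series names against the /actuator/prometheus output before relying on them.

```yaml
groups:
  - name: korvet-recording            # hypothetical group name
    rules:
      # Per-topic produce rate over the last minute.
      # Assumes korvet.produce.messages is exported as korvet_produce_messages_total.
      - record: topic:korvet_produce_messages:rate1m
        expr: |
          sum by (topic) (rate(korvet_produce_messages_total[1m]))

      # Per-topic p99 produce latency in seconds.
      # Assumes korvet.produce.latency is exported with _seconds_bucket series.
      - record: topic:korvet_produce_latency_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (topic, le) (rate(korvet_produce_latency_seconds_bucket[5m]))
          )
```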

JVM and System Metrics

Standard JVM and system metrics from Micrometer:

  • jvm.memory.used: JVM memory used

  • jvm.memory.max: JVM maximum memory

  • jvm.gc.pause: Garbage collection pause time

  • jvm.threads.live: Live threads

  • process.cpu.usage: Process CPU usage

  • system.cpu.usage: System CPU usage

  • system.load.average.1m: System load average over the last minute

Prometheus Configuration

Add Korvet to your Prometheus scrape config:

scrape_configs:
  - job_name: 'korvet'
    static_configs:
      - targets: ['korvet:8080']
    metrics_path: '/actuator/prometheus'

Monitoring Stack Setup

A complete monitoring stack with Prometheus and Grafana is available in the korvet-dist observability directory.

Quick Start with Docker Compose

The easiest way to set up monitoring is to include the observability stack in your docker-compose.yml:

include:
  - path/to/korvet-dist/observability/docker-compose.yml

services:
  redis:
    image: redis:8.6
    ports:
      - "6379:6379"

  korvet:
    image: redisfield/korvet:latest
    ports:
      - "9092:9092"
      - "8080:8080"
    environment:
      - KORVET_REDIS_HOST=redis
      - KORVET_REDIS_METRICS_ENABLED=true
    depends_on:
      - redis

This automatically adds:

  • Prometheus on port 9090 - Metrics collection and storage

  • Grafana on port 3000 - Pre-configured dashboard for Korvet metrics

Accessing the Dashboard

  1. Start your services:

    docker compose up -d

  2. Open Grafana at http://localhost:3000

  3. The Korvet dashboard loads automatically (no login required)

Dashboard Features

The pre-built Grafana dashboard visualizes:

  • Message Rates: Produce and fetch rates computed with irate(), which uses the two most recent samples to show near-instantaneous changes

  • Latency Percentiles: P50, P95, P99 for produce and fetch operations

  • Throughput: Ingress and egress bytes/sec

  • Redis Metrics: Command rates and latency percentiles (populated only when korvet.redis.metrics.enabled=true)

  • JVM Metrics: Heap memory, GC pauses, thread count

  • System Metrics: CPU usage, load average, disk space

The dashboard defaults to a 5-minute time range with 5-second auto-refresh for real-time monitoring.

Standalone Setup

For production deployments, see the observability README for:

  • Prometheus configuration examples

  • Grafana datasource and dashboard provisioning

  • Customizing the dashboard

Alerting

Consider alerting on the following conditions.

Korvet-Specific Alerts

  • High produce latency: p99 > 100ms

  • High fetch latency: p99 > 50ms

  • Broker request error spike: requests with an error result (or korvet.broker.request.failures) exceed 1% of all requests

  • Redis pool contention: sustained growth in korvet.storage.redis.pool.pending or a rising pool timeout rate

  • Archiver handoff/ack issues: non-zero korvet.storage.tiered.archiver.ack_mismatch

  • No produce activity: no messages produced in 5 minutes (alert only when traffic is expected)
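
The latency and activity conditions above could be expressed as Prometheus alerting rules along the following lines. This is a sketch, not a shipped rule file: the alert names, `for` durations, and severity labels are hypothetical, and the series names assume the Micrometer-to-Prometheus naming of a counter (`_total`) and of timers with histogram buckets (`_seconds_bucket`). Verify them against the /actuator/prometheus output.

```yaml
groups:
  - name: korvet-alerts               # hypothetical group name
    rules:
      - alert: KorvetHighProduceLatency
        # p99 produce latency above 100 ms (timer values are in seconds).
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(korvet_produce_latency_seconds_bucket[5m]))
          ) > 0.1
        for: 5m                       # illustrative hold time
        labels:
          severity: warning
        annotations:
          summary: "Korvet produce p99 latency is above 100 ms"

      - alert: KorvetHighFetchLatency
        # p99 fetch latency above 50 ms.
        expr: |
          histogram_quantile(
            0.99,
            sum by (le) (rate(korvet_fetch_latency_seconds_bucket[5m]))
          ) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Korvet fetch p99 latency is above 50 ms"

      - alert: KorvetNoProduceActivity
        # Fires only if the counter series exists and did not increase in 5 minutes.
        expr: |
          sum(increase(korvet_produce_messages_total[5m])) == 0
        labels:
          severity: warning
        annotations:
          summary: "No messages produced to Korvet in the last 5 minutes"
```

Note that the last rule returns no result, and so never fires, when the series is absent altogether, for example right after a restart with no traffic. An `absent()` check can cover that case if it matters.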

System Alerts

  • Memory pressure: JVM heap usage > 80% of the maximum heap size

  • High GC activity: Frequent or long GC pauses

  • High CPU usage: Process CPU > 80%
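
The system conditions above map onto the standard Micrometer JVM series, which appear in Prometheus as `jvm_memory_used_bytes`, `jvm_memory_max_bytes`, `jvm_gc_pause_seconds_*`, and `process_cpu_usage`. The rules below are a sketch: alert names and `for` durations are hypothetical, and the GC threshold (more than 10% of wall-clock time spent in pauses) is an assumed placeholder, since this page gives no specific value.

```yaml
groups:
  - name: korvet-system-alerts        # hypothetical group name
    rules:
      - alert: KorvetHeapPressure
        # Used heap above 80% of the maximum heap, summed across heap pools.
        expr: |
          sum by (instance) (jvm_memory_used_bytes{area="heap"})
            /
          sum by (instance) (jvm_memory_max_bytes{area="heap"})
            > 0.8
        for: 10m                      # illustrative hold time
        labels:
          severity: warning
        annotations:
          summary: "Korvet JVM heap usage is above 80%"

      - alert: KorvetHighGcTime
        # Fraction of time spent in GC pauses; the 0.1 threshold is an assumption.
        expr: |
          sum by (instance) (rate(jvm_gc_pause_seconds_sum[5m])) > 0.1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Korvet is spending a large share of time in GC pauses"

      - alert: KorvetHighCpu
        # process_cpu_usage is a 0..1 gauge.
        expr: |
          process_cpu_usage > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Korvet process CPU usage is above 80%"
```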

Logging

See Logging for log-based monitoring.