
Monitoring

Korvet exposes health checks and metrics through Spring Boot Actuator and Micrometer.

Health Checks

Korvet exposes health check endpoints:

# Overall health
curl http://localhost:8080/actuator/health

# Liveness probe (for Kubernetes)
curl http://localhost:8080/actuator/health/liveness

# Readiness probe (for Kubernetes)
curl http://localhost:8080/actuator/health/readiness
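
The liveness and readiness endpoints above can be wired into a Kubernetes container spec. The following is a minimal sketch, not a definitive configuration: it assumes the container serves Actuator on port 8080 (as in the curl examples), and the delay, period, and threshold values are illustrative placeholders to tune for your deployment.

```yaml
# Fragment of a Deployment's container spec (goes under spec.template.spec.containers[]).
# Port 8080 matches the curl examples; all timing values are assumptions.
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 30   # assumed startup allowance
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
```

A failing liveness probe causes Kubernetes to restart the container, while a failing readiness probe only removes the pod from service endpoints, so the readiness probe can usually be checked more often.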

Metrics

Metrics are exposed in Prometheus format:

curl http://localhost:8080/actuator/prometheus

Available Metrics

Korvet Custom Metrics

  • korvet.produce.messages: Number of messages produced (tags: topic, partition)

  • korvet.produce.latency: Produce request latency histogram (tags: topic, partition)

  • korvet.fetch.requests: Number of fetch requests (tags: topic, partition)

  • korvet.fetch.latency: Fetch request latency histogram (tags: topic, partition)

  • korvet.fetch.messages: Number of messages fetched (tags: topic, partition, tier)

  • korvet.errors: Number of errors (tags: operation, error_type)
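
The custom metrics above can be aggregated with Prometheus recording rules. The sketch below assumes the metrics are counters exported under Micrometer's usual Prometheus naming (dots become underscores and counters gain a _total suffix); check the actual names in the /actuator/prometheus output before relying on them. The group and rule names are hypothetical.

```yaml
# Hypothetical Prometheus rule file; metric names are assumed from Micrometer's
# naming convention and should be verified against /actuator/prometheus.
groups:
  - name: korvet-recording
    rules:
      # Messages produced per second, per topic
      - record: korvet:produce_messages:rate5m
        expr: sum by (topic) (rate(korvet_produce_messages_total[5m]))
      # Messages fetched per second, per storage tier
      - record: korvet:fetch_messages_by_tier:rate5m
        expr: sum by (tier) (rate(korvet_fetch_messages_total[5m]))
      # Errors per second, per operation and error type
      - record: korvet:errors:rate5m
        expr: sum by (operation, error_type) (rate(korvet_errors_total[5m]))
```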

JVM and System Metrics

Standard JVM and system metrics from Micrometer:

  • jvm.memory.used: JVM memory used

  • jvm.memory.max: JVM maximum memory

  • jvm.gc.pause: Garbage collection pause time

  • jvm.threads.live: Live threads

  • process.cpu.usage: Process CPU usage

  • system.cpu.usage: System CPU usage

  • system.load.average.1m: System load average over the last minute

Prometheus Configuration

Add Korvet to your Prometheus scrape config:

scrape_configs:
  - job_name: 'korvet'
    static_configs:
      - targets: ['korvet:8080']
    metrics_path: '/actuator/prometheus'

Grafana Dashboards

A complete monitoring stack with Prometheus and Grafana is available in the grafana/ directory. See grafana/README.adoc for quick start instructions.

The pre-built dashboard visualizes:

  • Kafka Operations: Produce/fetch rates, latency percentiles (p50, p95, p99)

  • Storage Tiers: Messages fetched by tier (hot/warm/cold)

  • Errors: Error rates by operation and type

  • Resources: CPU usage, memory usage

  • JVM: Heap memory, GC pauses, thread count

  • System: Load average, disk space

Alerting

Consider alerting on the following conditions.

Korvet-Specific Alerts

  • High produce latency: p99 > 100ms

  • High fetch latency: p99 > 50ms

  • Error rate spike: Error rate > 1% of requests

  • No produce activity: No messages produced in 5 minutes (only where continuous traffic is expected)

System Alerts

  • Memory pressure: JVM heap > 80%

  • High GC activity: Frequent or long GC pauses

  • High CPU usage: Process CPU > 80%
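
Several of the thresholds above can be expressed as Prometheus alerting rules. This is a sketch under stated assumptions rather than a shipped rule file: it assumes the latency metrics are Micrometer timers that publish histogram buckets (exported as *_seconds_bucket), that counters carry a _total suffix, and that the standard JVM metrics appear as jvm_memory_used_bytes, jvm_memory_max_bytes, and process_cpu_usage. The alert names, "for" durations, and severity labels are hypothetical.

```yaml
# Hypothetical alerting rules; verify every metric name against /actuator/prometheus.
groups:
  - name: korvet-alerts
    rules:
      - alert: KorvetHighProduceLatency
        # p99 produce latency above 100ms (histogram values are in seconds)
        expr: histogram_quantile(0.99, sum by (le) (rate(korvet_produce_latency_seconds_bucket[5m]))) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Korvet produce p99 latency is above 100ms
      - alert: KorvetHighFetchLatency
        # p99 fetch latency above 50ms
        expr: histogram_quantile(0.99, sum by (le) (rate(korvet_fetch_latency_seconds_bucket[5m]))) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Korvet fetch p99 latency is above 50ms
      - alert: KorvetNoProduceActivity
        # Enable only where continuous produce traffic is expected
        expr: sum(rate(korvet_produce_messages_total[5m])) == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: No messages produced in the last 5 minutes
      - alert: KorvetHeapPressure
        # Heap used as a fraction of heap max, per instance
        expr: sum by (instance) (jvm_memory_used_bytes{area="heap"}) / sum by (instance) (jvm_memory_max_bytes{area="heap"}) > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Korvet JVM heap usage is above 80%
      - alert: KorvetHighCpu
        # process_cpu_usage is reported as a 0..1 fraction
        expr: process_cpu_usage > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Korvet process CPU usage is above 80%
```

The error-rate alert is omitted here because it needs a total request count as the denominator, and which metric should serve that role depends on your workload.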

Logging

See Logging for log-based monitoring.