18. Standard Metrics for Services using Prometheus

Date: 2023-12-29

Status

Accepted

Context

We need clear, consistent, and actionable data to guide operational decisions and to understand service performance, which means standardizing the metrics our services expose. Prometheus provides the monitoring capabilities to support this. The decision also supports our commitment to high service availability and reliability, and it builds on our previous ADR on Prometheus Metrics Naming (ADR #23).

Decision

We will adopt a standard set of metrics, exposed through Prometheus, across our services.

Enhancements to Prometheus Metric Examples

SLA/SLO/SLI Metrics

Cache Metrics

Basic Service Metrics

Resource Utilization Metrics

Additional Metrics

Reference to ADR #23: Prometheus Metrics Naming

Consistent with ADR #23, all metrics will follow its naming conventions and use labels for additional dimensions. This keeps metric names easy to understand and metrics consistently categorized across services.
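
A naming check of this kind could be automated. The sketch below is a minimal illustration and does not encode the ADR #23 rules, which this record does not restate. It assumes that ADR #23 is at least as strict as the upstream Prometheus conventions: the metric and label name character sets, the "_total" suffix on counters, and the reserved "__" label prefix. The function name check_metric and the sample metric names are hypothetical.

```python
import re

# Character sets from the upstream Prometheus data model (assumed to be a
# subset of what ADR #23 requires; the ADR itself may be stricter).
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def check_metric(name, metric_type, labels=()):
    """Return a list of problems found; an empty list means the metric passed."""
    problems = []
    if not METRIC_NAME_RE.fullmatch(name):
        problems.append(f"invalid metric name: {name!r}")
    # Upstream convention: accumulating counters carry a "_total" suffix.
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append(f"counter {name!r} should end in '_total'")
    for label in labels:
        if not LABEL_NAME_RE.fullmatch(label):
            problems.append(f"invalid label name: {label!r}")
        elif label.startswith("__"):
            # Labels starting with "__" are reserved for Prometheus internals.
            problems.append(f"label {label!r} uses the reserved '__' prefix")
    return problems


if __name__ == "__main__":
    # Hypothetical metric names, used only to exercise the checks.
    print(check_metric("http_requests_total", "counter", ["method", "status"]))
    print(check_metric("http-requests", "counter", ["status-code"]))
```

A check like this could run in CI against each service's declared metrics, so naming problems surface before a metric reaches a dashboard or alert rule.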

Consequences

This standardization will enable systematic monitoring and improvement of service performance. Two challenges remain: keeping metrics accurate across distributed systems, and avoiding over-reliance on quantitative metrics. We will mitigate both by continuously refining the monitoring strategy and reviewing the metrics periodically.