Prometheus has become the de facto standard for metrics collection in cloud-native environments. Its pull-based model, powerful query language, and deep Kubernetes integration make it an obvious choice for platform teams. But as organizations scale — more services, more replicas, more labels — Prometheus starts showing cracks. Queries slow down, memory usage balloons, and what was once a reliable monitoring backbone becomes an operational liability. This article examines exactly why that happens and what you can do about it, from quick tactical fixes to full architectural overhauls.
The Cardinality Problem: Why It Kills Prometheus
Cardinality is the single most important concept to understand when troubleshooting Prometheus scalability. In the context of time series databases, cardinality refers to the total number of unique label combinations that exist across all your metrics. Every unique combination creates a distinct time series, and Prometheus must store, index, and query each of them independently.
Consider a simple HTTP request counter: http_requests_total. If you label it with method (GET, POST, PUT, DELETE), status_code (200, 201, 400, 404, 500, 503), and endpoint (50 distinct API paths), you already have 4 × 6 × 50 = 1,200 time series from a single metric. Now add a customer_id label with 10,000 distinct values. You have just created 12 million time series from one counter.
This is the cardinality explosion pattern, and it is the most common cause of Prometheus degradation in production. The problem is compounded by labels that have unbounded or high-entropy values:
- User IDs or session tokens embedded in labels
- Request IDs or trace IDs (effectively infinite cardinality)
- Pod names without proper aggregation, especially in autoscaling environments
- Free-form error messages or SQL query strings
- IP addresses, particularly in environments with high churn
The relationship between cardinality and memory consumption is roughly linear in the number of active series, but each series carries non-trivial fixed overhead in the index structures. Prometheus stores its head block (the most recent data) entirely in memory. Each time series in the head block requires approximately 3–4 KB of RAM for the series itself plus index entries, so an instance with 1 million active time series will typically consume 4–6 GB of RAM for the head block alone, before accounting for query processing overhead.
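A quick way to see where series are coming from is to query Prometheus about itself. These queries are illustrative sketches; note that the `{__name__=~".+"}` selector touches every series in the database, so run it sparingly on an already-stressed instance, and `customer_id` is the hypothetical label from the example above:

```promql
# Top 10 metric names by active series count
topk(10, count by (__name__) ({__name__=~".+"}))

# Number of distinct values of a suspect label on one metric
count(count by (customer_id) (http_requests_total))

# Total active series in the head block
prometheus_tsdb_head_series
```

The `/api/v1/status/tsdb` endpoint returns similar cardinality statistics (highest-cardinality metric names and label pairs) without the cost of a full series scan.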
Memory Explosion Patterns and Real Symptoms
Memory issues in Prometheus rarely announce themselves cleanly. Instead, they manifest through a cascade of symptoms that are easy to misdiagnose. Understanding the failure modes helps you identify the root cause faster and apply the right remedy.
The Head Block Growth Pattern
Prometheus keeps a two-hour window of data in memory as the head block before compacting it to disk. If your series count grows continuously, which happens when pod churn creates new series faster than old ones expire, the head block never shrinks. You can monitor this directly with prometheus_tsdb_head_series and prometheus_tsdb_head_chunks. On a healthy instance these numbers plateau; with a cardinality problem they grow monotonically until the process OOMs.
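A minimal sketch of queries for watching head block growth; the lookback and projection windows here are arbitrary choices, not recommendations:

```promql
# Should plateau on a healthy instance, not climb monotonically
prometheus_tsdb_head_series

# Linear projection of series count four hours out, based on the last six hours
predict_linear(prometheus_tsdb_head_series[6h], 4 * 3600)

# Churn: how quickly brand-new series are being created
rate(prometheus_tsdb_head_series_created_total[5m])
```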
Query Timeout Cascades
As series count grows, even well-written PromQL queries that worked fine at 100k series become unbearably slow at 1M. Grafana dashboards start timing out, alert evaluation lags behind schedule, and Alertmanager begins receiving delayed or duplicated firing alerts. The prometheus_rule_evaluation_duration_seconds metric is a reliable early warning — when p99 evaluation time for your recording rules exceeds your evaluation interval, you have a problem.
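One way to alert on this condition is to compare each rule group's last evaluation duration against its own configured interval, using Prometheus's self-monitoring metrics; the `for` duration and severity below are illustrative:

```yaml
groups:
  - name: prometheus-self-monitoring
    rules:
      - alert: RuleGroupEvaluationTooSlow
        # Fires when a group takes longer to evaluate than its own interval,
        # meaning evaluations are falling behind schedule
        expr: prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Rule group {{ $labels.rule_group }} cannot keep up with its evaluation interval"
```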
Scrape Failures Under Memory Pressure
When Prometheus is under heavy memory pressure, its Go garbage collector spends more time collecting, which introduces latency into the scrape loop. Scrapes begin timing out, causing gaps in your data. This creates a deceptive situation: the gaps appear precisely when your system is under stress, exactly when you need monitoring most. Watch for intermittent drops of the up metric and rising scrape_duration_seconds values to catch these patterns.
Compaction Pressure
High cardinality also stresses the TSDB compaction process. Prometheus compacts head block data into persistent blocks every two hours. With millions of series, compaction can take tens of seconds to minutes, during which write performance degrades. prometheus_tsdb_compaction_duration_seconds rising above 30 seconds is a warning sign. Compaction failures leave orphaned blocks on disk, gradually consuming storage and potentially corrupting the TSDB if left unaddressed.
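A sketch of alerts covering both symptoms; the 30-second threshold mirrors the warning sign above, and since prometheus_tsdb_compaction_duration_seconds is a histogram, the duration check goes through histogram_quantile:

```yaml
groups:
  - name: tsdb-compaction-health
    rules:
      - alert: CompactionSlow
        # 90th percentile compaction duration over the last 6h exceeds 30s
        expr: |
          histogram_quantile(0.9,
            rate(prometheus_tsdb_compaction_duration_seconds_bucket[6h])
          ) > 30
        for: 1h
        labels:
          severity: warning
      - alert: CompactionFailures
        # Any failed compaction deserves immediate attention
        expr: increase(prometheus_tsdb_compactions_failed_total[3h]) > 0
        labels:
          severity: critical
```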
Short-Term Fixes: Tactical Remediation
When you are dealing with a Prometheus instance under active stress, you need immediate relief before you can implement architectural changes. These techniques can be applied quickly and provide meaningful headroom while longer-term solutions are planned.
Recording Rules: Pre-Computing Aggregations
Recording rules are the most underutilized tool in the Prometheus toolbox. They allow you to pre-compute expensive PromQL expressions and store the results as new time series. The key benefit for scalability is that you can aggregate away high-cardinality dimensions, dramatically reducing the number of series that dashboards and alerts need to query at runtime.
Consider an example where you have per-pod HTTP request rates with labels for pod, namespace, service, method, and status_code. Your dashboards mostly need service-level aggregations, not per-pod breakdowns. A recording rule can produce that aggregation once per evaluation interval:
groups:
  - name: http_aggregations
    interval: 30s
    rules:
      - record: job:http_requests_total:rate5m
        expr: |
          sum by (job, namespace, method, status_code) (
            rate(http_requests_total[5m])
          )
      - record: job:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (job, namespace, le) (
              rate(http_request_duration_seconds_bucket[5m])
            )
          )
      - record: namespace:http_requests_total:rate5m
        expr: |
          sum by (namespace, status_code) (
            rate(http_requests_total[5m])
          )
Notice that the pod label is dropped in all three rules. If you had 500 pods, you have just reduced the cardinality of these series by a factor of 500. Dashboards querying job:http_requests_total:rate5m instead of computing rate(http_requests_total[5m]) on the fly will return results orders of magnitude faster.
The naming convention level:metric:operations is the Prometheus community standard. Following it consistently makes recording rules self-documenting and helps teams understand the aggregation level at a glance.
Metric Dropping via Relabeling
Relabeling gives you surgical control over what metrics Prometheus actually ingests. There are two stages where relabeling applies: relabel_configs (applied before scraping, based on target metadata) and metric_relabel_configs (applied after scraping, based on scraped metric names and labels). For cardinality control, metric_relabel_configs is your primary tool.
Dropping entire metric families that you do not use is the most impactful change you can make. Many exporters emit dozens of metrics that are irrelevant for most use cases:
scrape_configs:
  - job_name: kubernetes-pods
    metric_relabel_configs:
      # Drop metrics we never query
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*|process_.*'
        action: drop
      # Strip a high-cardinality label while keeping the metric
      # (replacing a label with '' removes it from the series)
      - source_labels: [__name__, pod]
        regex: 'http_requests_total;.*'
        target_label: pod
        replacement: ''
      # Drop unneeded histogram buckets. Note: action 'keep' would be wrong
      # here, since it discards every non-matching series in the scrape;
      # 'drop' removes only the listed buckets (bucket values illustrative)
      - source_labels: [__name__, le]
        regex: 'http_request_duration_seconds_bucket;(0.005|0.01|0.025)'
        action: drop
      # Replace high-cardinality endpoint paths with normalized versions
      - source_labels: [endpoint]
        regex: '/api/v1/users/[0-9]+'
        target_label: endpoint
        replacement: '/api/v1/users/:id'
Be careful with metric_relabel_configs: they are applied to every scraped sample, so computationally expensive regex patterns on high-frequency scrapes add measurable CPU overhead. Test patterns before rollout, and note that Prometheus uses RE2 (which does not backtrack) and anchors relabel regexes at both ends automatically, so 'http.*' matches only names beginning with http, not names merely containing it.
Cardinality Limits as a Safety Net
Prometheus 2.x introduced per-scrape sample limits as a defensive mechanism. These do not solve cardinality problems but prevent a single misbehaving exporter from taking down your entire Prometheus instance:
global:
  # Global default limit across all scrapes (0 = no limit)
  sample_limit: 0

scrape_configs:
  - job_name: application-pods
    # Reject scrapes that return more than 50k samples
    sample_limit: 50000
    # Maximum number of labels per scraped sample
    label_limit: 64
    # Limit label name and value lengths
    label_name_length_limit: 256
    label_value_length_limit: 1024
    kubernetes_sd_configs:
      - role: pod
When a scrape exceeds sample_limit, Prometheus rejects the entire scrape and marks the target as having failed. This is a hard circuit breaker, not a graceful degradation — the target’s up metric goes to 0. Set limits conservatively above your expected maximum to avoid false positives, and alert on prometheus_target_scrapes_exceeded_sample_limit_total > 0.
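An alert along these lines (group name and windows are illustrative) catches targets that trip the circuit breaker:

```yaml
groups:
  - name: scrape-limit-guard
    rules:
      - alert: ScrapeSampleLimitExceeded
        # Counter increments each time a scrape is rejected for
        # returning more samples than its configured sample_limit
        expr: increase(prometheus_target_scrapes_exceeded_sample_limit_total[15m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "A scrape exceeded its sample_limit and was rejected"
```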
Architectural Solutions: Federation and Remote Write
Once you have exhausted tactical optimizations or when your scale genuinely exceeds what a single Prometheus instance can handle, architectural changes become necessary. Prometheus offers two built-in mechanisms for scaling horizontally: federation and remote_write.
Federation: Hierarchical Scraping
Prometheus federation allows one Prometheus instance to scrape aggregated metrics from other Prometheus instances via the /federate endpoint. In a typical setup, leaf-level Prometheus instances collect raw metrics from targets, while a global Prometheus instance federates pre-aggregated recording rule results from the leaves.
# Global Prometheus configuration federating from regional instances
scrape_configs:
  - job_name: federate-regions
    scrape_interval: 15s
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        # Only federate pre-aggregated recording rule metrics
        - '{__name__=~"job:.*"}'
        - '{__name__=~"namespace:.*"}'
        - '{__name__=~"cluster:.*"}'
        # Federate key infrastructure signals
        - 'up{job="kubernetes-apiservers"}'
    static_configs:
      - targets:
          - prometheus-eu-west.monitoring.svc:9090
          - prometheus-us-east.monitoring.svc:9090
          - prometheus-ap-south.monitoring.svc:9090
Federation works well for multi-region global dashboards and cross-cluster alerting on aggregated signals. Its limitations are significant, though: the /federate endpoint is a point-in-time snapshot, so you cannot run range queries against federated data effectively. It also creates a single point of failure at the global layer and does not provide true long-term storage. For those requirements, remote_write is the better path.
Remote Write: Streaming to Durable Storage
Remote write allows Prometheus to stream all ingested samples to an external storage backend in real time. The external backend handles long-term retention, multi-tenancy, and global query federation. Prometheus itself becomes a stateless collection agent that maintains only a short local retention window for resilience against network outages.
remote_write:
  - url: https://thanos-receive.monitoring.svc:19291/api/v1/receive
    # Authentication for the remote endpoint
    basic_auth:
      username: prometheus
      password_file: /etc/prometheus/secrets/remote-write-password
    # Tune the write queue for throughput vs. latency
    queue_config:
      # Number of shards (parallel write connections)
      max_shards: 200
      min_shards: 1
      # Samples to batch before flushing
      max_samples_per_send: 500
      # Time to wait before flushing an incomplete batch
      batch_send_deadline: 5s
      # In-memory buffer capacity per shard
      capacity: 2500
      # Retry backoff for failed writes
      min_backoff: 30ms
      max_backoff: 5s
    # Metadata configuration
    metadata_config:
      send: true
      send_interval: 1m
    # Filter what gets remote-written (reduce egress)
    write_relabel_configs:
      - source_labels: [__name__]
        regex: 'go_gc_.*|go_memstats_.*'
        action: drop
The queue_config tuning is critical and frequently misunderstood. Each shard maintains its own connection to the remote endpoint and its own in-memory queue. Increasing max_shards increases parallelism and throughput but also increases memory consumption and load on the remote endpoint. The right values depend heavily on your sample ingestion rate and network latency to the remote endpoint. Monitor prometheus_remote_storage_queue_highest_sent_timestamp_seconds versus prometheus_remote_storage_highest_timestamp_in_seconds — the lag between them tells you how far behind your remote write queue is.
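That lag can be expressed directly in PromQL; the label-matching pattern below mirrors the one used in the upstream Prometheus monitoring mixin, since the sent-timestamp metric carries remote_name and url labels that the ingestion-timestamp metric does not:

```promql
# Seconds the remote write queue is behind ingestion, per remote endpoint
(
  prometheus_remote_storage_highest_timestamp_in_seconds
- ignoring(remote_name, url) group_right
  prometheus_remote_storage_queue_highest_sent_timestamp_seconds
)
```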
Long-Term Solutions: Thanos vs Grafana Mimir vs VictoriaMetrics
For production systems that need long-term storage, global query capability, high availability, and genuine horizontal scalability, purpose-built solutions are the right answer. Three projects dominate this space: Thanos, Grafana Mimir, and VictoriaMetrics. They share similar goals but differ significantly in architecture, operational complexity, and trade-offs.
| Criterion | Thanos | Grafana Mimir | VictoriaMetrics |
|---|---|---|---|
| Architecture | Sidecar + object store; modular components | Fully distributed; Cortex-derived microservices | Single binary or cluster mode |
| Storage backend | Any S3-compatible object store | Any S3-compatible object store | Own TSDB format on local or object store |
| PromQL compatibility | Full PromQL; own query engine | Full PromQL; Mimir-specific extensions | MetricsQL (PromQL superset) |
| Operational complexity | Medium — multiple components, each simple | High — many microservices with complex config | Low — minimal components, simple config |
| Ingest scalability | Scales via Thanos Receive fan-out | Horizontally scalable distributors + ingesters | Excellent; handles millions of samples/sec per node |
| Query performance | Good; Store Gateway caches object store data | Good; query sharding and caching built in | Excellent; highly optimized query engine |
| Multi-tenancy | Limited; tenant isolation via external labels | Native; per-tenant limits and isolation | Enterprise only; basic in cluster mode |
| Deduplication | Built-in; replica dedup at query time | Built-in; ingest-time and query-time dedup | Built-in; dedup with downsampling |
| Downsampling | Yes; Thanos Compactor handles it | Yes; configurable per tenant | Enterprise feature; configured via -downsampling.period |
| License | Apache 2.0 (fully open source) | AGPL-3.0 (open source) + enterprise tier | Apache 2.0 (community); proprietary enterprise |
| Best fit | Teams already running Prometheus wanting minimal disruption | Large orgs needing multi-tenant SaaS-grade monitoring | Teams prioritizing simplicity and raw performance |
Thanos: The Incremental Path
Thanos integrates with existing Prometheus deployments through a sidecar process that runs alongside each Prometheus pod. The sidecar uploads completed TSDB blocks to object storage (S3, GCS, Azure Blob) and exposes a gRPC Store API that Thanos Query uses to federate queries across all Prometheus instances plus historical data in the object store. This makes Thanos the lowest-friction path for teams with existing Prometheus infrastructure.
Thanos Receive is an alternative ingest path that accepts remote_write directly, which is useful when you want to decouple Prometheus instances from the query layer or implement active-active HA without relying on Prometheus replication. Thanos Compactor handles block compaction and downsampling on the object store, creating 5-minute and 1-hour resolution downsamples automatically for efficient long-range queries.
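A sketch of what the sidecar wiring looks like in a Kubernetes pod spec; the image tag, mount paths, and objstore secret layout are assumptions for illustration, not a canonical deployment:

```yaml
# Added to the containers list of an existing Prometheus pod/StatefulSet
- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.34.0   # hypothetical version pin
  args:
    - sidecar
    - --tsdb.path=/prometheus            # must match Prometheus's storage path
    - --prometheus.url=http://localhost:9090
    - --objstore.config-file=/etc/thanos/objstore.yml
    - --grpc-address=0.0.0.0:10901       # Store API consumed by Thanos Query
  volumeMounts:
    - name: prometheus-data
      mountPath: /prometheus
    - name: thanos-objstore-config
      mountPath: /etc/thanos
```

The same data volume is mounted read-only into the sidecar so it can upload completed two-hour TSDB blocks as Prometheus finishes them.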
Grafana Mimir: Enterprise-Grade Multi-Tenancy
Mimir is a fork of Cortex, rewritten by Grafana Labs to address operational complexity issues in Cortex’s architecture. It follows the same microservices pattern — Distributor, Ingester, Querier, Query Frontend, Store Gateway, Compactor, Ruler — but with significantly improved defaults and a monolithic deployment mode that simplifies small-scale deployments. Mimir’s headline feature is native multi-tenancy.