Fri Jan 16
The Silent Killer: Why 'No Data' Is Worse than Bad Data
When monitoring goes quiet, it rarely means everything is fine. It usually means you are blind.
TL;DR: No data can be more dangerous than bad data in monitoring systems. Without explicit alerts for missing metrics, critical failures can go completely unnoticed. Prometheus's absent() function helps you detect when vital data stops reporting, not just when thresholds are crossed.
This article is for engineers who rely on metrics to make decisions — and assume those metrics will always be there.
In observability, most alerts are built around thresholds:
- CPU too high
- Memory too low
- Error rates spiking
That’s where most teams spend their time.
But there’s a failure mode that’s far more insidious — and often invisible until it’s too late:
The absence of data itself.
When a critical metric stops being reported, many alerting systems simply display “No Data” and move on.
No alarm. No page. Just silence.
And in that silence:
- Your application may be down
- Your scrape targets may be unreachable
- Your monitoring pipeline itself may be broken
Yet nothing alerts you.
Why “No Data” Is So Dangerous
Most alerts assume data exists.
They only trigger when a metric:
- is present
- crosses a threshold
If the metric disappears entirely, the alert never fires — because there’s nothing to evaluate.
This creates a false sense of safety:
No alert means everything is fine.
In reality, it may mean nothing is being observed at all.
That’s why “no data” is often worse than bad data. Bad data is noisy.
No data is silent.
Enter absent(): Alerting on Silence
This is where Prometheus’s absent() function becomes essential.
Unlike standard queries, absent() returns 1 when a metric does not exist at all.
It flips the alerting model:
- Instead of asking “Is this value too high?”
- You ask, “Is this metric even there?”
That distinction is critical for business-impacting systems.
If your payment service stops reporting transaction metrics, don’t assume everything is healthy. You want to know immediately that something has gone wrong — either with the service or its instrumentation.
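To make the semantics concrete, here is a minimal sketch of how absent() behaves (the job names are illustrative, not from any particular setup):

```promql
# No series matches the selector -> absent() returns a single element
# with value 1, carrying the labels from the equality matchers:
absent(up{job="payment-service"})   # => {job="payment-service"} 1

# At least one matching series exists -> absent() returns an empty
# result, so an alert built on it stays silent:
absent(up{job="node-exporter"})     # => (empty)
```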
Practical Patterns That Actually Work
Using absent() requires intentional design. Not every metric deserves this treatment — only the ones where silence is never acceptable.
Some common and effective patterns:
absent(up{job="payment-service"})
This alerts when Prometheus can’t scrape the service at all.
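Wired into a Prometheus alerting rule, this pattern might look like the following sketch (the alert name, duration, and labels are illustrative choices, not requirements):

```yaml
groups:
  - name: availability
    rules:
      - alert: PaymentServiceScrapeAbsent
        # Fires when no up{job="payment-service"} series exists at all,
        # i.e. Prometheus is not scraping the service (or the series
        # vanished from the target list entirely).
        expr: absent(up{job="payment-service"})
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "No scrape data for payment-service"
          description: "Prometheus has seen no up series for payment-service for 5 minutes."
```

The `for: 5m` clause keeps the alert from flapping during short restarts or rollouts.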
absent_over_time(transactions_total{service="checkout"}[5m])
(Note: absent() only accepts an instant vector; absent_over_time() is its range-vector counterpart, and the right function when you care about a time window.) This catches cases where the service is up, but the metric hasn't produced any samples recently — often signaling:
- stuck processes
- failed instrumentation
- partial outages
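As a rule, the staleness check could be sketched like this (alert name, window, and severity are assumptions for illustration):

```yaml
groups:
  - name: heartbeats
    rules:
      - alert: CheckoutTransactionsStale
        # absent_over_time() returns 1 when the metric has produced
        # no samples at any point within the 5-minute lookback window,
        # even if the series existed before and the target is still up.
        expr: absent_over_time(transactions_total{service="checkout"}[5m])
        labels:
          severity: warning
        annotations:
          summary: "checkout has reported no transaction samples for 5 minutes"
```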
These act as heartbeat alerts: if they stop beating, something is wrong.
Monitoring the Pipeline, Not Just the Symptoms
The key insight here is simple:
Reliable monitoring isn’t just about detecting anomalies in data.
It’s about ensuring the data pipeline is operational.
Without absent(), your monitoring implicitly assumes:
- metrics will always arrive
- scrapes will never fail
- instrumentation will never break
Those assumptions rarely hold in real systems.
The Real Takeaway
Observability isn’t just about what you measure — it’s about knowing when measurement itself disappears.
Silence is not a neutral state.
In monitoring, silence is often the loudest warning that you’re not listening.
If you don’t alert on missing data, your monitoring is only as reliable as your assumptions — and assumptions fail faster than systems do.
Originally published on Medium.