Fri Jan 16
The Silent Killer: Why 'No Data' Is Worse than Bad Data
When monitoring goes quiet, it rarely means everything is fine. It usually means you are blind.
TL;DR: No data can be more dangerous than bad data in monitoring systems. Without explicit alerts for missing metrics, critical failures can go completely unnoticed. Prometheus's absent() function helps you detect when vital data stops reporting, not just when thresholds are crossed.
This article is for engineers who rely on metrics to make decisions — and assume those metrics will always be there.
In observability, most alerts are built around thresholds:
- CPU too high
- Memory too low
- Error rates spiking
That’s where most teams spend their time.
But there’s a failure mode that’s far more insidious — and often invisible until it’s too late:
The absence of data itself.
When a critical metric stops being reported, many alerting systems simply display “No Data” and move on.
No alarm. No page. Just silence.
And in that silence:
- Your application may be down
- Your scrape targets may be unreachable
- Your monitoring pipeline itself may be broken
Yet nothing alerts you.
Why “No Data” Is So Dangerous
Most alerts assume data exists.
They only trigger when a metric:
- is present
- crosses a threshold
If the metric disappears entirely, the alert never fires — because there’s nothing to evaluate.
This creates a false sense of safety:
No alert means everything is fine.
In reality, it may mean nothing is being observed at all.
That’s why “no data” is often worse than bad data. Bad data is noisy.
No data is silent.
Enter absent(): Alerting on Silence
This is where Prometheus’s absent() function becomes essential.
Unlike standard queries, absent() returns 1 when a metric does not exist at all.
It flips the alerting model:
- Instead of asking “Is this value too high?”
- You ask, “Is this metric even there?”
That distinction is critical for business-impacting systems.
If your payment service stops reporting transaction metrics, don’t assume everything is healthy. You want to know immediately that something has gone wrong — either with the service or its instrumentation.
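To make the semantics concrete, here is a minimal sketch of how absent() behaves (the job names are illustrative, not from any particular setup):

```promql
# No series matches the selector -> absent() returns a single element
# with value 1, carrying the labels from the equality matchers:
absent(up{job="payment-service"})   # => {job="payment-service"} 1

# At least one matching series exists -> absent() returns an empty
# result, so an alert built on it stays silent:
absent(up{job="node-exporter"})     # => (empty)
```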
Practical Patterns That Actually Work
Using absent() requires intentional design. Not every metric deserves this treatment — only the ones where silence is never acceptable.
Some common and effective patterns:
absent(up{job="payment-service"})
This alerts when Prometheus can’t scrape the service at all.
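Wired into a Prometheus alerting rule, this pattern might look like the following sketch (the alert name, duration, and labels are illustrative choices, not requirements):

```yaml
groups:
  - name: availability
    rules:
      - alert: PaymentServiceScrapeAbsent
        # Fires when no up{job="payment-service"} series exists at all,
        # i.e. Prometheus is not scraping the service (or the series
        # vanished from the target list entirely).
        expr: absent(up{job="payment-service"})
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "No scrape data for payment-service"
          description: "Prometheus has seen no up series for payment-service for 5 minutes."
```

The `for: 5m` clause keeps the alert from flapping during short restarts or rollouts.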
absent_over_time(transactions_total{service="checkout"}[5m])
(Note: absent() only accepts an instant vector; absent_over_time() is its range-vector counterpart, and the right function when you care about a time window.) This catches cases where the service is up, but the metric hasn't produced any samples recently — often signaling:
- stuck processes
- failed instrumentation
- partial outages
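As a rule, the staleness check could be sketched like this (alert name, window, and severity are assumptions for illustration):

```yaml
groups:
  - name: heartbeats
    rules:
      - alert: CheckoutTransactionsStale
        # absent_over_time() returns 1 when the metric has produced
        # no samples at any point within the 5-minute lookback window,
        # even if the series existed before and the target is still up.
        expr: absent_over_time(transactions_total{service="checkout"}[5m])
        labels:
          severity: warning
        annotations:
          summary: "checkout has reported no transaction samples for 5 minutes"
```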
These act as heartbeat alerts: if they stop beating, something is wrong.
Monitoring the Pipeline, Not Just the Symptoms
The key insight here is simple:
Reliable monitoring isn’t just about detecting anomalies in data.
It’s about ensuring the data pipeline is operational.
Without absent(), your monitoring implicitly assumes:
- metrics will always arrive
- scrapes will never fail
- instrumentation will never break
Those assumptions rarely hold in real systems.
The Real Takeaway
Observability isn’t just about what you measure — it’s about knowing when measurement itself disappears.
Silence is not a neutral state.
In monitoring, silence is often the loudest warning that you’re not listening.
If you don’t alert on missing data, your monitoring is only as reliable as your assumptions — and assumptions fail faster than systems do.
Originally published on Medium.