Alerts

How Lookout evaluates metrics, fires alerts, and sends recoveries.

How alerting works

Every collection cycle produces a set of metric samples. The evaluator checks each sample against the rules in your alerts config and tracks each rule's state over time, so it knows whether an alert is already firing.

There are two kinds of rule:

  • Threshold checks: memory, disk, cpu, swap, load. They compare a number against a threshold, fire only once the value has stayed over the threshold for the duration set by the for field, and clear when the value drops below resolve_below.
  • Service & presence checks: systemd, http, tcp, process. They fire when something isn't healthy (a service is inactive, a URL returns the wrong status, a port won't connect, a process is missing) and clear when it recovers.
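
The threshold behavior described above can be sketched as a small state machine. This is an illustrative sketch, not Lookout's actual evaluator: the names ThresholdRule, RuleState, and evaluate are hypothetical, and the choice of strict comparisons (> and <) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    threshold: float      # fire when the value rises above this
    resolve_below: float  # clear when the value drops below this
    for_seconds: float    # how long the breach must persist before firing


@dataclass
class RuleState:
    firing: bool = False
    breach_started: Optional[float] = None  # time of the first breaching sample


def evaluate(rule: ThresholdRule, state: RuleState,
             value: float, now: float) -> Optional[str]:
    """Process one sample; return "fire", "resolve", or None."""
    if state.firing:
        # Hysteresis: stay firing until the value drops below resolve_below.
        if value < rule.resolve_below:
            state.firing = False
            state.breach_started = None
            return "resolve"
        return None
    if value > rule.threshold:
        if state.breach_started is None:
            state.breach_started = now
        # Fire only once the breach has lasted for the configured duration.
        if now - state.breach_started >= rule.for_seconds:
            state.firing = True
            return "fire"
    else:
        # The value dipped back under the threshold, so the timer restarts.
        state.breach_started = None
    return None
```

Because resolve_below sits under threshold, a value that hovers between the two keeps the alert firing instead of flapping between alert and recovery.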

Alert lifecycle

[Figure: Alert lifecycle, firing path and recovery path]

  • You are notified once when a problem starts, after it has persisted for the duration set by the for field.
  • If it is still firing after renotify_after, you get a reminder.
  • A recovery is sent when it clears, so you always know when things are back to normal.
  • If a metric stops reporting for stale_after, Lookout flags it as stale (default: three times the collection interval).
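
As a rough illustration of the lifecycle above, the following sketch decides which notification, if any, to send on each evaluation, and when a metric counts as stale. The function names and every parameter other than renotify_after and stale_after are hypothetical, and all times are assumed to be in seconds.

```python
from typing import Optional


def next_notification(is_breaching: bool, was_firing: bool,
                      last_notified: Optional[float], now: float,
                      renotify_after: float) -> Optional[str]:
    """Return "alert", "reminder", "recovery", or None.

    is_breaching is assumed to already account for the for duration.
    """
    if is_breaching and not was_firing:
        return "alert"  # the problem just started
    if is_breaching and was_firing:
        # Still firing: remind only after renotify_after has elapsed.
        if last_notified is not None and now - last_notified >= renotify_after:
            return "reminder"
        return None
    if was_firing:
        return "recovery"  # the problem cleared
    return None


def is_stale(last_sample_at: float, now: float, collection_interval: float,
             stale_after: Optional[float] = None) -> bool:
    """Flag a metric that has stopped reporting for stale_after seconds."""
    if stale_after is None:
        # Documented default: three times the collection interval.
        stale_after = 3 * collection_interval
    return now - last_sample_at >= stale_after
```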

State is persisted to /var/lib/lookout/state.json, so a service restart doesn't replay old alerts or lose track of what's firing.
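
One common way to persist state so that a crash or restart never leaves a half-written file is to write to a temporary file and rename it into place. The sketch below shows that pattern only; the layout of state.json and the helper names save_state and load_state are assumptions, not Lookout's actual implementation.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state as JSON atomically: temp file first, then rename."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_state(path: str) -> dict:
    """Return the saved state, or an empty dict on first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```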

Severity

Each check has a severity of warning or critical. The severity sets the label, emoji, and color of the alert, and maps to the PagerDuty severity.

State              Label         Color
Firing, warning    🟡 WARNING    amber
Firing, critical   🔴 CRITICAL   red
Resolved           ✅ RESOLVED   green

Threshold checks default to warning; service, presence, and Docker checks default to critical.

Built-in threshold rules

Rule         Metric                             Default threshold   Message
Memory       memory.used_percent                85%                 High memory usage
Disk         disk.<mount>.used_percent          85%                 High disk usage
Disk growth  disk.<mount>.fills_within_window   within 4h           Disk predicted to fill soon
CPU          cpu.used_percent                   85%                 High CPU usage
Swap         swap.used_percent                  80%                 High swap usage
Load         load.1_per_core                    1.5                 High 1-minute load per core
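
To make the load rule concrete: a per-core load is the 1-minute load average divided by the number of CPU cores, so 1.5 means roughly one and a half runnable tasks per core. The sketch below assumes that this is how load.1_per_core is derived; the function names are hypothetical.

```python
import os


def load_per_core(load_1m: float, cores: int) -> float:
    """Normalize a 1-minute load average by the core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_1m / cores


def current_load_per_core() -> float:
    """Read the live value; os.getloadavg() is available on Unix only."""
    load_1m, _, _ = os.getloadavg()
    return load_per_core(load_1m, os.cpu_count() or 1)
```

On a 4-core host, a 1-minute load of 6.0 gives a per-core value of 1.5, which is the default threshold in the table above.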

Message format

Every notification — regardless of channel — renders from the same fields:

🔴 CRITICAL — High memory usage
Host: my-server
Metric: memory.used_percent
Value: 91.43% (threshold 85.00%)

When the value drops back below resolve_below, Lookout sends the recovery:

✅ RESOLVED — High memory usage
Host: my-server
Metric: memory.used_percent
Value: 74.21% (threshold 85.00%)
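
The two messages above share one layout, which can be sketched as a single rendering function. This is an illustration built from the examples on this page: the names LABELS and render and the unit parameter are hypothetical, and \u2014 is the dash character that appears in the first line of each message.

```python
LABELS = {
    "warning": "🟡 WARNING",
    "critical": "🔴 CRITICAL",
    "resolved": "✅ RESOLVED",
}


def render(state: str, message: str, host: str, metric: str,
           value: float, threshold: float, unit: str = "%") -> str:
    """Build the four-line notification text shown in the examples."""
    return "\n".join([
        f"{LABELS[state]} \u2014 {message}",
        f"Host: {host}",
        f"Metric: {metric}",
        # Both numbers are printed with two decimal places.
        f"Value: {value:.2f}{unit} (threshold {threshold:.2f}{unit})",
    ])


print(render("critical", "High memory usage", "my-server",
             "memory.used_percent", 91.43, 85))
```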

Tuning

Adjust thresholds and timing in /etc/lookout/config.yaml:

alerts:
  renotify_after: 30m   # re-alert every 30 minutes while still firing
  memory:
    threshold: 80       # alert at 80% memory
    resolve_below: 75   # clear once memory drops below 75%
    for: 0s             # alert immediately instead of waiting

Then restart the service to apply the changes:

sudo systemctl restart lookout
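
Timing fields such as renotify_after and for take values like 30m and 0s. The sketch below shows one way such strings could be converted to seconds. It assumes a single integer followed by s, m, or h, which are the only forms that appear on this page; the full syntax Lookout accepts may differ, and parse_duration is a hypothetical helper.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600}
_PATTERN = re.compile(r"(\d+)([smh])")


def parse_duration(text: str) -> int:
    """Convert a duration such as "30m" into a number of seconds."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]
```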

See Configuration for every field.