Authored by Kevin Lyda

Using Prometheus to alert on

Dump the output of this into /var/lib/node_exporter/textfile_collector/mdadm-monitor.prom, where the node_exporter textfile collector can pick it up, and then add the following to your alerting rules:

# Mdadm monitor not running.  Last TestMessage event was more than 1800 seconds ago.
ALERT MdadmMonitorNotRunning
  IF mdadm_monitor{event="TestMessage"} < (time() - 1800)
  FOR 80m
  ANNOTATIONS {
    summary = "Mdadm Monitor is not running on {{$labels.instance}}.",
    description = "Mdadm Monitor on {{$labels.instance}} has not reported a TestMessage event in the last 1800 seconds.",
  }

# Mdadm monitor sees a problem.  An event has fired in the last 1800 seconds.
ALERT MdadmMonitorErrorDetected
  IF mdadm_monitor{event!="TestMessage"} > (time() - 1800)
  FOR 30m
  ANNOTATIONS {
    summary = "Mdadm Monitor has seen an event on {{$labels.instance}}.",
    description = "Mdadm Monitor has seen an event ({{$labels.event}}) on {{$labels.instance}}.",
  }
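
Both rules compare the value of mdadm_monitor against time(), so they only make sense if each sample holds the Unix timestamp of the last time that event was seen. The script that produces the .prom file is not shown here, so the following is only a rough Python sketch of what such a file could look like and of how the two IF expressions behave. The helper names (render_prom, write_atomically, firing_conditions) and the "Fail" event in the usage note are made up for illustration. The sketch ignores the FOR clauses.

```python
import os
import tempfile


def render_prom(events):
    """Render {event_name: unix_timestamp} in the textfile collector format."""
    lines = ["# TYPE mdadm_monitor gauge"]
    for event, ts in sorted(events.items()):
        lines.append('mdadm_monitor{event="%s"} %d' % (event, ts))
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file, then rename, so a partial file is never scraped."""
    directory = os.path.dirname(path) or "."
    # The temp file does not end in .prom, so the collector skips it.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # mkstemp creates the file as 0600; node_exporter may run as another user.
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)


def firing_conditions(events, now, window=1800):
    """Return (alert, event) pairs whose IF expression holds at time `now`."""
    hits = []
    for event, ts in sorted(events.items()):
        if event == "TestMessage":
            # mdadm_monitor{event="TestMessage"} < (time() - 1800)
            if ts < now - window:
                hits.append(("MdadmMonitorNotRunning", event))
        # mdadm_monitor{event!="TestMessage"} > (time() - 1800)
        elif ts > now - window:
            hits.append(("MdadmMonitorErrorDetected", event))
    return hits
```

For example, render_prom({"TestMessage": 9000, "Fail": 5000}) gives one mdadm_monitor line per event, and firing_conditions on the same dictionary with now=10000 returns an empty list: the TestMessage is recent enough and the Fail event is older than the window. As in the PromQL, a missing TestMessage series produces no MdadmMonitorNotRunning hit at all, so a host that has never written the file will not trip that rule.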