Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Note

If you are aware of an alert and know that the resolution will take several days or weeks to resolve, you can silence alerts via the alert manager GUI on port 9093

...

General Advice around defining new alerts

  • Pages should be urgent, important, actionable, and real. 

  • They should represent either ongoing or imminent problems with your service. 

  • Err on the side of removing noisy alerts – over-monitoring is a harder problem to solve than under-monitoring. 

  • You should almost always be able to classify the problem into one of: availability & basic functionality; latency; correctness (completeness, freshness and durability of data); and feature-specific problems. 

  • Symptoms are a better way to capture more problems more comprehensively and robustly with less effort. 

...