Alerting
This guide assumes that you have a basic understanding of the Prometheus Operator and that you have already followed the Getting Started guide.
The Prometheus Operator introduces an Alertmanager resource, which allows users to declaratively describe an Alertmanager cluster. To successfully deploy an Alertmanager cluster, it is important to understand the contract between Prometheus and Alertmanager. Alertmanager is used to:
- Deduplicate alerts received from Prometheus.
- Silence alerts.
- Route and send grouped notifications to various integrations (PagerDuty, OpsGenie, mail, chat, …).
The Prometheus Operator also introduces an AlertmanagerConfig resource, which allows users to declaratively describe Alertmanager configurations.
Note: The AlertmanagerConfig resource is currently at version v1alpha1; testing and feedback are welcome.
Prometheus’ configuration also includes “rule files”, which contain the alerting rules. When an alerting rule triggers, Prometheus fires that alert against all Alertmanager instances on every rule evaluation interval. The Alertmanager instances communicate with each other about which notifications have already been sent out. For more information on this system design, see the High Availability page.
Prerequisites
You have a running Prometheus Operator.
Deploying Alertmanager
First, let’s create an Alertmanager cluster with three replicas:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
Wait for all Alertmanager pods to be ready:
kubectl get pods -l alertmanager=example -w
Managing Alertmanager configuration
By default, the Alertmanager instances start with a minimal configuration that isn’t very useful, since it doesn’t send any notifications when receiving alerts.
You have several options to provide the Alertmanager configuration:
- You can use a native Alertmanager configuration file stored in a Kubernetes Secret.
- You can use spec.alertmanagerConfiguration to reference an AlertmanagerConfig object in the same namespace that defines the main Alertmanager configuration.
- You can define spec.alertmanagerConfigSelector and spec.alertmanagerConfigNamespaceSelector to tell the operator which AlertmanagerConfig objects should be selected and merged with the main Alertmanager configuration.
Using a Kubernetes Secret
The following native Alertmanager configuration sends notifications to a fictitious webhook service:
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://example.com/'
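As a rough illustration of what group_by: ['job'] asks for, the following Python sketch buckets alerts by their job label, so that each bucket would become one grouped notification to the webhook receiver. It is a toy model, not Alertmanager's implementation: the alert dictionaries and the group_alerts helper are hypothetical, and the timing settings (group_wait, group_interval, repeat_interval) are not modeled.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alert names by the values of the labels listed in group_by.

    Toy model only: real Alertmanager also applies group_wait,
    group_interval and repeat_interval before sending notifications.
    """
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # In this model, a missing label is treated as an empty value.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels["alertname"])
    return dict(groups)


# Hypothetical alerts, as they might arrive from Prometheus.
alerts = [
    {"labels": {"alertname": "HighLatency", "job": "api"}},
    {"labels": {"alertname": "HighErrorRate", "job": "api"}},
    {"labels": {"alertname": "DiskFull", "job": "node"}},
]

for key, names in group_alerts(alerts, ["job"]).items():
    print(dict(key), names)
```

With these three sample alerts, the two job="api" alerts land in one group and the job="node" alert in another.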
Save the above configuration in a file called alertmanager.yaml in the local directory and create a Secret from it:
kubectl create secret generic alertmanager-example --from-file=alertmanager.yaml
The Prometheus Operator requires the Secret to be named alertmanager-{ALERTMANAGER_NAME}. In the previous example, the name of the Alertmanager is example, so the Secret name must be alertmanager-example.
The name of the key holding the configuration data in the Secret has to be alertmanager.yaml.
Note: If you want to use a different Secret name, you can specify it with the spec.configSecret field in the Alertmanager resource.
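The naming rules above can be expressed as a small Python check. This is a sketch that assumes the Alertmanager and Secret objects are already loaded as plain dictionaries (for example from JSON manifests). The helper names are hypothetical, and the operator performs its own lookup.

```python
def expected_secret_name(alertmanager):
    """Return the Secret name the operator reads the configuration from."""
    spec = alertmanager.get("spec", {})
    # spec.configSecret, when set, replaces the alertmanager-{NAME} convention.
    return spec.get("configSecret") or "alertmanager-" + alertmanager["metadata"]["name"]


def check_config_secret(alertmanager, secret):
    """List the ways a Secret deviates from the conventions described above."""
    problems = []
    want = expected_secret_name(alertmanager)
    if secret["metadata"]["name"] != want:
        problems.append(f"Secret should be named {want!r}")
    # The configuration must be stored under the alertmanager.yaml key.
    if "alertmanager.yaml" not in secret.get("data", {}):
        problems.append("Secret has no 'alertmanager.yaml' key")
    return problems


am = {"metadata": {"name": "example"}, "spec": {"replicas": 3}}
good = {"metadata": {"name": "alertmanager-example"}, "data": {"alertmanager.yaml": "<base64>"}}
bad = {"metadata": {"name": "example"}, "data": {"config.yaml": "<base64>"}}

print(check_config_secret(am, good))  # []
print(check_config_secret(am, bad))   # wrong name and missing key
```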
The Alertmanager configuration may reference custom templates or password files on disk. These can be added to the Secret along with the alertmanager.yaml configuration file. For example, suppose we have the following Secret:
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-example
data:
  alertmanager.yaml: {BASE64_CONFIG}
  template_1.tmpl: {BASE64_TEMPLATE_1}
  template_2.tmpl: {BASE64_TEMPLATE_2}
Templates will be accessible to the Alertmanager container under the /etc/alertmanager/config directory. The Alertmanager configuration can reference them like this:
templates:
- '/etc/alertmanager/config/*.tmpl'
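Building such a Secret by hand means base64-encoding every file. The following is a minimal sketch using only the Python standard library. It produces the manifest as JSON, which kubectl also accepts. The template body and the custom.title template name are hypothetical placeholders.

```python
import base64
import json

# Directory under which templates are accessible, as described above.
CONFIG_DIR = "/etc/alertmanager/config"

ALERTMANAGER_YAML = """\
route:
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://example.com/'
templates:
- '/etc/alertmanager/config/*.tmpl'
"""

# Hypothetical template, for illustration only.
TEMPLATE_1 = '{{ define "custom.title" }}[{{ .Status }}] {{ .CommonLabels.alertname }}{{ end }}\n'


def build_config_secret(name, files):
    """Return a Secret manifest whose data holds base64-encoded file contents."""
    data = {
        key: base64.b64encode(content.encode("utf-8")).decode("ascii")
        for key, content in files.items()
    }
    return {"apiVersion": "v1", "kind": "Secret", "metadata": {"name": name}, "data": data}


files = {"alertmanager.yaml": ALERTMANAGER_YAML, "template_1.tmpl": TEMPLATE_1}
secret = build_config_secret("alertmanager-example", files)
print(json.dumps(secret, indent=2))

# Template files end up under CONFIG_DIR, where the templates glob finds them.
for key in files:
    if key.endswith(".tmpl"):
        print(f"{CONFIG_DIR}/{key}")
```

The printed manifest could be saved to a file and applied with kubectl apply -f. Passing --from-file once per file to kubectl create secret generic produces an equivalent Secret without any code.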
Using AlertmanagerConfig Resources
The following example configuration creates an AlertmanagerConfig resource that sends notifications to a fictitious webhook service.
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: config-example
  labels:
    alertmanagerConfig: example
spec:
  route:
    groupBy: ['job']
    groupWait: 30s
    groupInterval: 5m
    repeatInterval: 12h
    receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://example.com/'
Create the AlertmanagerConfig resource in your cluster:
curl -sL https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/example/user-guides/alerting/alertmanager-config-example.yaml | kubectl create -f -
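The AlertmanagerConfig fields above mirror the native configuration from the previous section, with snake_case keys renamed to camelCase (group_by becomes groupBy, webhook_configs becomes webhookConfigs). The Python sketch below applies that renaming to the native example. It only illustrates the naming pattern for these particular fields and is not the operator's conversion logic. Other parts of the AlertmanagerConfig schema, such as matchers, are structured differently, so the sketch should not be used to translate arbitrary configurations.

```python
def snake_to_camel(key):
    """Rename a snake_case key to camelCase, e.g. group_wait -> groupWait."""
    head, *rest = key.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camelize_keys(obj):
    """Recursively rename dictionary keys; list items are walked, scalars kept."""
    if isinstance(obj, dict):
        return {snake_to_camel(key): camelize_keys(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [camelize_keys(item) for item in obj]
    return obj


# The native configuration from the "Using a Kubernetes Secret" section.
native = {
    "route": {
        "group_by": ["job"],
        "group_wait": "30s",
        "group_interval": "5m",
        "repeat_interval": "12h",
        "receiver": "webhook",
    },
    "receivers": [{"name": "webhook", "webhook_configs": [{"url": "http://example.com/"}]}],
}

print(camelize_keys(native))
```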
The spec.alertmanagerConfigSelector field in the Alertmanager resource needs to be updated so that the operator selects AlertmanagerConfig resources. In the previous example, the label alertmanagerConfig: example is added, so the Alertmanager object should be updated like this:
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
  alertmanagerConfigSelector:
    matchLabels:
      alertmanagerConfig: example
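The matchLabels rule is simple: an object is selected when every listed key/value pair appears among its labels. Below is a minimal Python sketch of that rule. The staging-config and unlabeled-config objects are hypothetical, and neither matchExpressions nor spec.alertmanagerConfigNamespaceSelector is modeled.

```python
def matches_labels(match_labels, labels):
    """True if every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


selector = {"alertmanagerConfig": "example"}  # spec.alertmanagerConfigSelector.matchLabels

# AlertmanagerConfig name -> metadata.labels; only config-example comes from this guide.
configs = {
    "config-example": {"alertmanagerConfig": "example"},
    "staging-config": {"alertmanagerConfig": "staging"},
    "unlabeled-config": {},
}

selected = [name for name, labels in configs.items() if matches_labels(selector, labels)]
print(selected)  # ['config-example']
```

Note that an empty matchLabels selects every object in this model, because all() over no conditions is True.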
Using AlertmanagerConfig for global configuration
The following example configuration creates an Alertmanager resource that uses an AlertmanagerConfig resource for the main Alertmanager configuration instead of the alertmanager-example Secret.
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
  namespace: default
spec:
  replicas: 3
  alertmanagerConfiguration:
    name: config-example
The AlertmanagerConfig resource named config-example in namespace default will be a global AlertmanagerConfig. When the operator generates the Alertmanager configuration from it, the namespace label will not be enforced for routes and inhibition rules.
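To make the difference concrete, here is a simplified Python model of namespace enforcement. For a regular AlertmanagerConfig, the generated route only matches alerts whose namespace label equals the AlertmanagerConfig's namespace. For the global one, no such matcher is added. The matcher strings and the effective_matchers helper are hypothetical, and the exact form of what the operator generates may differ between versions.

```python
def effective_matchers(route_matchers, config_namespace, is_global):
    """Simplified model of the matchers on a route generated from an AlertmanagerConfig.

    A regular AlertmanagerConfig is scoped to alerts whose namespace label equals
    the AlertmanagerConfig's namespace; the global one (referenced through
    spec.alertmanagerConfiguration) is left unscoped.
    """
    if is_global:
        return list(route_matchers)
    return [f'namespace="{config_namespace}"'] + list(route_matchers)


route = ['severity="critical"']  # hypothetical route matcher
print(effective_matchers(route, "default", is_global=False))  # ['namespace="default"', 'severity="critical"']
print(effective_matchers(route, "default", is_global=True))   # ['severity="critical"']
```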
Exposing the Alertmanager service
To access the Alertmanager interface, you have to expose the Service outside the cluster. For simplicity, we use a NodePort Service.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: example
Once the Service is created, the Alertmanager web server is available under the node’s IP address on port 30903.
Note: Exposing the Alertmanager web server this way may not be suitable for your environment. Read more about the possible options in the Ingress guide.
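The Service above forwards traffic from nodePort 30903, opened on every node, to the Service port 9093 and on to the container port named web. Below is a small Python helper for building the resulting URL. It assumes the cluster uses the default NodePort range of 30000-32767, although clusters can configure a different one. The node IP is a placeholder from the documentation address range.

```python
# Default value of the API server's --service-node-port-range (30000-32767).
DEFAULT_NODE_PORT_RANGE = range(30000, 32768)


def alertmanager_ui_url(node_ip, node_port=30903):
    """Return the URL of the Alertmanager web UI exposed through a NodePort Service."""
    if node_port not in DEFAULT_NODE_PORT_RANGE:
        raise ValueError(f"nodePort {node_port} is outside the default range 30000-32767")
    return f"http://{node_ip}:{node_port}"


# 192.0.2.10 is a placeholder address from the documentation range (RFC 5737).
print(alertmanager_ui_url("192.0.2.10"))  # http://192.0.2.10:30903
```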
Integrating with Prometheus
Configuring Alertmanager in Prometheus
This Alertmanager cluster is now fully functional and highly available, but no alerts are fired against it.
First, create a Prometheus instance that will send alerts to the Alertmanager cluster:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
The Prometheus resource discovers all of the Alertmanager instances behind the Service created before (pay attention to the name, namespace and port fields, which should match the definition of the Alertmanager Service).
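Those three fields are easy to get wrong, so a quick consistency check can help. The Python sketch below compares one entry of spec.alerting.alertmanagers with the Service manifest, both represented as dictionaries. The helper name is hypothetical. When the Service has no namespace set, the sketch assumes it was created in the default namespace, and it only handles named ports such as web.

```python
def check_alertmanager_endpoint(endpoint, service):
    """Compare a Prometheus spec.alerting.alertmanagers entry with a Service.

    Returns a list of mismatches; an empty list means name, namespace and
    port line up. Only named ports (such as "web") are handled.
    """
    problems = []
    meta = service["metadata"]
    if endpoint["name"] != meta["name"]:
        problems.append(f"name {endpoint['name']!r} does not match Service {meta['name']!r}")
    # Assumption: a Service created without an explicit namespace lives in "default".
    service_ns = meta.get("namespace", "default")
    if endpoint["namespace"] != service_ns:
        problems.append(f"namespace {endpoint['namespace']!r} does not match {service_ns!r}")
    port_names = {port.get("name") for port in service["spec"]["ports"]}
    if endpoint["port"] not in port_names:
        problems.append(f"port {endpoint['port']!r} is not a named port of the Service")
    return problems


service = {
    "metadata": {"name": "alertmanager-example"},
    "spec": {"ports": [{"name": "web", "nodePort": 30903, "port": 9093, "targetPort": "web"}]},
}
endpoint = {"namespace": "default", "name": "alertmanager-example", "port": "web"}

print(check_alertmanager_endpoint(endpoint, service))  # []
# A port number string is flagged because this sketch only handles named ports.
print(check_alertmanager_endpoint({**endpoint, "port": "9093"}, service))
```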
Open the Prometheus web interface, go to the “Status > Runtime & Build Information” page and check that Prometheus has discovered 3 Alertmanager instances.
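The same check can be scripted against the Prometheus HTTP API, whose /api/v1/alertmanagers endpoint lists discovered Alertmanagers under data.activeAlertmanagers. The sketch below only parses a response body, so it runs offline. The sample payload, including the pod IPs, is made up. In practice you could fetch the body with urllib.request.urlopen, for example against a Prometheus made reachable on localhost:9090 through kubectl port-forward.

```python
import json


def count_active_alertmanagers(response_body):
    """Count entries in data.activeAlertmanagers of a /api/v1/alertmanagers reply."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return len(payload["data"]["activeAlertmanagers"])


# Hypothetical reply; real URLs point at the discovered Alertmanager pods.
sample = json.dumps({
    "status": "success",
    "data": {
        "activeAlertmanagers": [
            {"url": "http://10.0.0.11:9093/api/v2/alerts"},
            {"url": "http://10.0.0.12:9093/api/v2/alerts"},
            {"url": "http://10.0.0.13:9093/api/v2/alerts"},
        ],
        "droppedAlertmanagers": [],
    },
})

print(count_active_alertmanagers(sample))  # 3
```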
Deploying Prometheus Rules
The PrometheusRule CRD allows you to define alerting and recording rules. The operator knows which PrometheusRule objects to select for a given Prometheus based on the spec.ruleSelector field.
Note: By default, spec.ruleSelector is nil, meaning that the operator picks up no rules.
By default, the Prometheus resource discovers only PrometheusRule resources in the same namespace. This can be refined with the ruleNamespaceSelector field:
- To discover rules from all namespaces, pass an empty dict (ruleNamespaceSelector: {}).
- To discover rules from all namespaces matching a certain label, use the matchLabels field.
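The interplay of the two selectors can be summarized in a simplified Python model. A nil ruleSelector selects nothing. A nil ruleNamespaceSelector restricts selection to the Prometheus object's own namespace. An empty selector matches everything. Selectors are represented as bare matchLabels dictionaries, matchExpressions are not modeled, and the namespaces and rule objects other than prometheus-example-rules are hypothetical. The Prometheus object is assumed to live in the default namespace.

```python
def matches_labels(match_labels, labels):
    """True if every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_rules(prometheus_ns, rule_selector, rule_ns_selector, namespaces, rules):
    """Return the names of PrometheusRule objects a Prometheus would pick up.

    Simplified model: each selector is None (field unset) or a matchLabels
    dict, where {} matches everything. `namespaces` maps a namespace name to
    its labels; `rules` is a list of (namespace, name, labels) tuples.
    """
    if rule_selector is None:
        return []  # nil ruleSelector: no rules are selected
    if rule_ns_selector is None:
        allowed = {prometheus_ns}  # nil ruleNamespaceSelector: own namespace only
    else:
        allowed = {
            ns for ns, ns_labels in namespaces.items()
            if matches_labels(rule_ns_selector, ns_labels)
        }
    return [
        name for ns, name, labels in rules
        if ns in allowed and matches_labels(rule_selector, labels)
    ]


# Hypothetical cluster contents, mirroring the manifests in this guide.
namespaces = {"default": {}, "frontend": {"team": "frontend"}, "backend": {"team": "backend"}}
rules = [
    ("frontend", "prometheus-example-rules", {"role": "alert-rules", "prometheus": "example"}),
    ("backend", "backend-rules", {"role": "alert-rules", "prometheus": "example"}),
    ("frontend", "unlabeled-rules", {}),
]
selector = {"role": "alert-rules", "prometheus": "example"}

print(select_rules("default", selector, {"team": "frontend"}, namespaces, rules))  # ['prometheus-example-rules']
print(select_rules("default", selector, {}, namespaces, rules))  # ['prometheus-example-rules', 'backend-rules']
print(select_rules("default", None, {}, namespaces, rules))  # []
```

The first call corresponds to the configuration that follows: only the labeled rule in a namespace carrying team=frontend is selected.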
To discover PrometheusRule resources with the role=alert-rules and prometheus=example labels from all namespaces with the team=frontend label:
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
  ruleNamespaceSelector:
    matchLabels:
      team: frontend
If you want to select individual namespaces by name, you can use the kubernetes.io/metadata.name label, which is populated automatically when the NamespaceDefaultLabelName feature gate is enabled.
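Because that label's value is the namespace's own name, a ruleNamespaceSelector for specific namespaces can be generated mechanically. The Python sketch below returns the selector as a dictionary. The frontend and backend namespace names are hypothetical, and the matchExpressions form assumes that ruleNamespaceSelector accepts a standard Kubernetes label selector.

```python
import json

NAME_LABEL = "kubernetes.io/metadata.name"


def namespace_name_selector(*names):
    """Build a ruleNamespaceSelector that picks namespaces by name.

    Relies on the kubernetes.io/metadata.name label, whose value is the
    namespace's own name.
    """
    if not names:
        raise ValueError("at least one namespace name is required")
    if len(names) == 1:
        return {"matchLabels": {NAME_LABEL: names[0]}}
    # Several names: a set-based requirement with the In operator.
    return {"matchExpressions": [{"key": NAME_LABEL, "operator": "In", "values": list(names)}]}


print(json.dumps({"ruleNamespaceSelector": namespace_name_selector("frontend")}))
print(json.dumps({"ruleNamespaceSelector": namespace_name_selector("frontend", "backend")}))
```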
Create a PrometheusRule object from the following manifest. Note that the object’s labels match the spec.ruleSelector of the Prometheus object.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)
For demonstration purposes, the PrometheusRule object always fires the ExampleAlert alert: its expression, vector(1), always returns a sample. To validate that everything is working properly, you can open the Prometheus web interface again and go to the Alerts page.
Next, open the Alertmanager web interface and check that it shows one active alert.
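The same check can be done through the Alertmanager API: GET /api/v2/alerts returns a JSON array of alerts, each with a labels map and a status.state field. Below is a minimal Python sketch that filters such a response for active ExampleAlert alerts. The sample payload is hypothetical and heavily trimmed; real alerts carry more fields, such as annotations and timestamps.

```python
import json


def active_alerts(response_body, alertname):
    """Return alerts named `alertname` whose status.state is "active"."""
    alerts = json.loads(response_body)  # /api/v2/alerts returns a JSON array
    return [
        alert for alert in alerts
        if alert["labels"].get("alertname") == alertname
        and alert["status"]["state"] == "active"
    ]


# Hypothetical, trimmed response body.
sample = json.dumps([
    {"labels": {"alertname": "ExampleAlert"}, "status": {"state": "active"}},
    {"labels": {"alertname": "OtherAlert"}, "status": {"state": "suppressed"}},
])

print(len(active_alerts(sample, "ExampleAlert")))  # 1
```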