Prometheusアラート設定

8195 ワード

AlertManager centos7 prometheus centos7 テキストリンク

環境

PrometheusはDockerコンテナ上で動かしています。
クラウド環境：Azure
Dockerホスト：CentOS7.3
Dockerコンテナ：（prometheusサーバ）CentOS7.3

<監視対象>
Dockerホスト：CentOS7.3
Dockerコンテナ：CentOS7.3 (Webサーバを想定してApacheを起動）

前提条件

・Prometheusサーバのインストールが完了していること
PrometheusをCentOS7.3＆Docker上にインストールしてみた

AlertManagerのインストール

１．AlertMananagerのURLコピー

Prometheus公式サイトからAlertManagerをダウンロードします。

今回の環境では、次のものを選択します。
Operating system: linux
Architecture: amd64

alertmanagerを探して、リンクのアドレスをコピーしてください。

２．ダウンロード

<Promethusサーバ>
## cd /usr/local/src
## wget https://github.com/prometheus/alertmanager/releases/download/v0.5.1/alertmanager-0.5.1.linux-amd64.tar.gz
## tar xfvz alertmanager-0.5.1.linux-amd64.tar.gz
## cd alertmanager-0.5.1.linux-amd64/
## cp -p alertmanager /usr/bin/.

３．設定ファイルの配置

<Promethusサーバ>
## cd /etc/prometheus
## wget https://raw.githubusercontent.com/alerta/prometheus-config/master/alertmanager.yml
(Default状態)
## cat /etc/prometheus/alertmanager.yml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'                  
  smtp_from: '[email protected]'           

route:
  receiver: "alerta"
  group_by: ['alertname']
  group_wait:      30s
  group_interval:  5m
  repeat_interval: 2h

receivers:
- name: "alerta"
  webhook_configs:
  - url: 'http://localhost:8080/webhooks/prometheus'
    send_resolved: true

４．自動起動設定
AlertManagerもしっかりと自動起動するようにしましょう。

<Promethusサーバ>
## vi /etc/default/alertmanager
OPTIONS="-config.file /etc/prometheus/alertmanager.yml"

## vi /usr/lib/systemd/system/alertmanager.service

[Unit]
Description=Prometheus alertmanager Service
After=syslog.target.prometheus.alertmanager.service

[Service]
Type=simple
EnvironmentFile=-/etc/default/alertmanager
ExecStart=/usr/bin/alertmanager $OPTIONS
PrivateTmp=true

[Install]
WantedBy=multi-user.target


## systemctl enable alertmanager.service
Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /usr/lib/systemd/system/alertmanager.service.
## systemctl start alertmanager

５．アラート設定前準備（メール設定）

メール送信の仕組みは、環境に合わせて実施しましょう。
今回は、AzureのVM上で環境を組み立てていることもあり、こちらを参考にメール送信の機能を具備します。
Azureのメール送信はSendGrid

６．アラート設定

「３．設定ファイルの配置」のconfigファイルを編集しましょう。
今回は、Mailアラートの設定を入れます。値はDefault値から変えています。

<Promethusサーバ>
## cat alertmanager.yml
global:
# The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'smtp.sendgrid.net:25'    ★ SendGrid のSMTP接続先
  smtp_from: '****************@******'      ★ SendGrid 登録メールアドレス
  smtp_auth_username: '****@azure.com'      ★ SendGrid で払い出されたUserName
  smtp_auth_password: '*******'             ★ SendGrid で設定したパスワード（平文で記載するのはちょっとね）
  smtp_auth_secret: '*********'             ★ SendGrid で払い出されたAPIキー

route:
  receiver: "mail"
  group_by: ['alertname', 'instance', 'severity']   ★ 同一アラート名、同一インスタンス、同一サービスのアラートに対して
  group_wait: 30s                                   ★ 30秒以内のアラートは同一アラートと見なす
  group_interval: 10m                               ★ 10分毎に通知
  repeat_interval: 1h                               ★ 一度通知したアラートは 1時間後に通知

#  receiver: "slack-notifications"
#  group_by: ['alertname', 'instance']

receivers:
 - name: 'mail'
   email_configs:
   - to: *****@********,####@######        ★ アラート送信先のアドレス（複数あるときは、, カンマ区切り）
                                           ★ ㏄は、頑張ったけどできない。。。
                                           ★ toを分けたいときは、-to: を同じように記載すればOK

inhibit_rules:
 - source_match:
     severity: 'critical'                  ★ アラートの深刻度(severity) が critical の場合、
   target_match:                           ★ 同一のアラート名で warning のものは通知しない。
     severity: 'warning'
   equal: ['alertname']

７．ルール設定

ルール設定は、自分で必要となるルールを検討してみてください。

<Promethusサーバ>
## cat /etc/prometheus/alert.rules
ALERT instance_down
  IF up == 0
  FOR 2m
  LABELS { severity = "critical" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 2 minutes.",
  }

ALERT cpu_threshold_exceeded
  IF (100 * (1 - avg by(instance)(irate(node_cpu{job='node',mode='idle'}[5m])))) > THRESHOLD_CPU
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} CPU usage is dangerously high",
    description = "This device's cpu usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT mem_threshold_exceeded
  IF (node_memory_MemFree{job='node'} + node_memory_Cached{job='node'} + node_memory_Buffers{job='node'})/1000000 < THRESHOLD_MEM
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} memory usage is dangerously high",
    description = "This device's memory usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT filesystem_threshold_exceeded
  IF node_filesystem_avail{job='node',mountpoint='/'} / node_filesystem_size{job='node'} * 100 < THRESHOLD_FS
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} filesystem usage is dangerously high",
    description = "This device's filesystem usage has exceeded the threshold with a value of {{ $value }}.",
  }

ALERT node_high_loadaverage
  IF rate(node_load1[1m]) > 2
  FOR 10s
  LABELS { severity = "warning" }
  ANNOTATIONS {
    summary = "High load average on {{$labels.instance}}",
    description = "{{$labels.instance}} has a high load average above 10s (current value: {{$value}})"
  }

８．prometheus に組み込み

PrometheusにAlertmanagerを組み込みましょう。
/etc/prometheus/prometheus.yml の末尾に追加します。

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets: ['<ホスト名>.japaneast.cloudapp.azure.com:9093']

９．最後に

設定ファイルの記載が正しいかちゃんと確認しましょう。

<Promethusサーバ>
## promtool check-config /etc/prometheus/prometheus.yml
## promtool check-config /etc/prometheus/alertmanager.yml

alertmanager、Prometheusを再起動して完了です。

<Promethusサーバ>
## systemctl restart alertmanager
## systemctl restart prometheus

１０．動作確認

適当に監視対象のサーバを止めてみましょう。
メールが飛んできます。

参考サイト

Tech-Sketch
Prometheus環境構築手順
 Azureのメール送信はSendGrid

Author And Source

この問題について(Prometheusアラート設定), 我々は、より多くの情報をここで見つけました https://qiita.com/miwato/items/b26d82fdf5324936ed8b

著者帰属：元の著者の情報は、元のURLに含まれています。著作権は原作者に属する。

Content is automatically searched and collected through network algorithms . If there is a violation . Please contact us . We will adjust (correct author information ,or delete content ) as soon as possible .