Monitoring metrics and Prometheus rules (continuously being improved)
(1) node_exporter standard performance metrics
1) Monitored items
- CPU usage (%): 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
- Memory usage (%): 100 - ((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100
- Disk usage (%): (1 - (node_filesystem_free_bytes{fstype=~"ext3|ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext3|ext4|xfs"})) * 100
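The three node formulas are plain arithmetic over node_exporter samples. You can recompute them offline to sanity-check a threshold before putting it into a rule, as the Python sketch below does with hypothetical sample values. PromQL's irate() is approximated here as the per-second increase between the last two samples. The function names are illustrative and not part of any library.

```python
from __future__ import annotations


def irate(prev_value: float, prev_ts: float, last_value: float, last_ts: float) -> float:
    """Per-second increase between the last two samples of a counter (no reset assumed)."""
    return (last_value - prev_value) / (last_ts - prev_ts)


def cpu_usage_percent(idle_samples: list[tuple[float, float, float, float]]) -> float:
    """100 - avg over CPUs of irate(node_cpu_seconds_total{mode="idle"}) * 100.

    Each tuple is (prev_value, prev_ts, last_value, last_ts) for one CPU's idle counter.
    """
    idle_rates = [irate(*sample) for sample in idle_samples]
    return 100 - sum(idle_rates) / len(idle_rates) * 100


def memory_usage_percent(free: float, buffers: float, cached: float, total: float) -> float:
    """100 - (MemFree + Buffers + Cached) / MemTotal * 100."""
    return 100 - (free + buffers + cached) / total * 100


def disk_usage_percent(free_bytes: float, size_bytes: float) -> float:
    """(1 - free / size) * 100 for a single filesystem."""
    return (1 - free_bytes / size_bytes) * 100


if __name__ == "__main__":
    # Hypothetical: two CPUs whose idle counters were sampled 15 s apart.
    samples = [(1000.0, 0.0, 1012.0, 15.0), (2000.0, 0.0, 2009.0, 15.0)]
    print(round(cpu_usage_percent(samples), 2))  # idle 0.8 and 0.6 s/s -> 30.0
    gib = 1024**3
    print(memory_usage_percent(2 * gib, 1 * gib, 3 * gib, 16 * gib))  # 62.5
    print(disk_usage_percent(25 * gib, 100 * gib))  # 75.0
```

The idle counter grows by one second per second when a CPU is completely idle. Its irate is therefore the idle fraction, and 100 minus that fraction (as a percentage) is the busy percentage.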
2) Prometheus rules
```yaml
groups:
  - name: alert-rule
    rules:
      - alert: NodeFilesystemUsage-high
        expr: (1 - (node_filesystem_free_bytes{fstype=~"ext3|ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext3|ext4|xfs"})) * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: High node filesystem usage detected"
          description: "{{$labels.instance}}: Node filesystem usage is above 80% (current value: {{ $value }})"
      - alert: NodeMemoryUsage
        expr: (100 - (((node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes) / node_memory_MemTotal_bytes) * 100)) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: High node memory usage detected"
          description: "{{$labels.instance}}: Node memory usage is above 80% (current value: {{ $value }})"
      - alert: NodeCPUUsage
        expr: (100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: High node CPU usage detected"
          description: "{{$labels.instance}}: Node CPU usage is above 80% (current value: {{ $value }})"
```
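Each rule above pairs a threshold with `for: 2m`. When the expression first returns a result for a series, Prometheus marks the alert pending. It fires only if the result keeps appearing for the whole duration, and a single evaluation below the threshold resets it. The Python sketch below models that lifecycle for one series. The 30-second evaluation interval and the sample values are assumptions for illustration, and other Prometheus details (such as state handling across restarts) are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AlertState:
    """Simplified lifecycle of one alert series: inactive -> pending -> firing."""

    for_seconds: float
    active_since: float | None = None

    def evaluate(self, now: float, condition_true: bool) -> str:
        if not condition_true:
            # Expression returned nothing for this series: the alert resets.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            # First evaluation above the threshold starts the `for` timer.
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


def simulate(values: list[float], threshold: float, for_seconds: float, interval: float) -> list[str]:
    """Evaluate `value > threshold` once per interval and record the alert state."""
    state = AlertState(for_seconds)
    return [state.evaluate(i * interval, v > threshold) for i, v in enumerate(values)]


if __name__ == "__main__":
    # Hypothetical CPU usage (%) evaluated every 30 s against `> 80` with `for: 2m`.
    cpu = [85, 90, 70, 85, 88, 91, 95, 93]
    print(simulate(cpu, threshold=80, for_seconds=120, interval=30))
```

In this run the dip to 70% at t = 60 s resets the timer. The alert therefore fires only at the eighth evaluation (t = 210 s), two minutes after the value went back above 80%.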
(2) MySQL monitoring metrics
1) MySQL performance metrics
- MySQL down: mysql_up
- Too many connections: mysql_global_status_threads_connected > 200
- Remaining available connections: mysql_global_variables_max_connections - mysql_global_status_threads_connected < 200
- Slow queries: rate(mysql_global_status_slow_queries[5m])
- Replication SQL thread running: mysql_slave_status_slave_sql_running
- Replication lag (seconds): mysql_slave_status_seconds_behind_master
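Slow_queries, like the Questions counter used by the QPS rule, is a cumulative MySQL status counter that only grows while the server runs, which is why it is wrapped in rate(). The Python sketch below shows, under simplifying assumptions, what rate() does with such a counter over a 5-minute window. It sums the increases between samples and treats a drop as a counter reset, for example after a mysqld restart. Prometheus also extrapolates to the window boundaries, which this sketch deliberately omits. The function names and sample values are hypothetical.

```python
from __future__ import annotations


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp_seconds, value) samples.

    Simplified rate(): a decrease is treated as a counter reset, but the
    extrapolation to the window edges that Prometheus performs is not reproduced.
    """
    if len(samples) < 2:
        raise ValueError("rate needs at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the whole new value counts as increase.
        increase += cur - prev if cur >= prev else cur
    duration = samples[-1][0] - samples[0][0]
    return increase / duration


def remaining_connections(max_connections: float, threads_connected: float) -> float:
    """mysql_global_variables_max_connections - mysql_global_status_threads_connected."""
    return max_connections - threads_connected


if __name__ == "__main__":
    # Hypothetical Slow_queries samples every 60 s; mysqld restarted between t=120 and t=180.
    slow = [(0, 100), (60, 220), (120, 340), (180, 120), (240, 240), (300, 360)]
    print(simple_rate(slow))  # 600 slow queries / 300 s -> 2.0
    print(remaining_connections(1000, 850))  # 150, below the 200 headroom mark
```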
2) Prometheus rules
```yaml
groups:
  - name: MySQLStatsAlert
    rules:
      - alert: MySQL is down
        expr: mysql_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} MySQL is down"
          description: "MySQL database is down. This requires immediate action!"
      - alert: Mysql_High_QPS
        expr: rate(mysql_global_status_questions[5m]) > 500
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: Mysql_High_QPS detected"
          description: "{{$labels.instance}}: MySQL operations exceed 500 per second (current value: {{ $value }})"
      - alert: Mysql_Too_Many_Connections
        # Threads_connected is the current number of open connections (a gauge),
        # so compare it directly; rate() is only meaningful for counters.
        expr: mysql_global_status_threads_connected > 200
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: Mysql Too Many Connections detected"
          description: "{{$labels.instance}}: MySQL has more than 200 open connections (current value: {{ $value }})"
      - alert: Mysql_Too_Many_slow_queries
        expr: rate(mysql_global_status_slow_queries[5m]) > 3
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.instance}}: Mysql_Too_Many_slow_queries detected"
          description: "{{$labels.instance}}: MySQL slow queries exceed 3 per second (current value: {{ $value }})"
      - alert: SQL thread stopped
        expr: mysql_slave_status_slave_sql_running == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} SQL thread stopped"
          description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
      - alert: Slave lagging behind Master
        # Seconds_Behind_Master is already the lag in seconds (a gauge),
        # so alert on its value rather than on its rate of change.
        expr: mysql_slave_status_seconds_behind_master > 30
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
          description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"
```
(3) Pod performance metrics
1) Container performance metrics
- Pod memory usage (% of limit): container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf
- Pod CPU usage (% of one core): sum by (pod_name) (rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100
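Two details of these expressions are easy to miss. First, a container without a memory limit typically reports container_spec_memory_limit_bytes as 0. That is assumed here as cAdvisor's usual behaviour. The division then yields +Inf, which the `!= +Inf` filter drops. Second, sum by (pod_name) adds up the per-container CPU rates, so 100 corresponds to one full core and a multi-container pod can exceed it. The Python sketch below mimics both steps on hypothetical values. The function names are illustrative only.

```python
from __future__ import annotations

import math
from collections import defaultdict


def memory_percent(usage_bytes: float, limit_bytes: float) -> float:
    """usage / limit * 100 with PromQL-style division by zero (x/0 = +Inf, 0/0 = NaN)."""
    if limit_bytes == 0:
        return math.inf if usage_bytes > 0 else math.nan
    return usage_bytes / limit_bytes * 100


def over_threshold_finite(series: dict[str, float], threshold: float) -> dict[str, float]:
    """Mimics `... != +Inf > threshold`: drop +Inf, then keep values above threshold.

    NaN also disappears here because NaN > threshold is False.
    """
    return {name: v for name, v in series.items() if v != math.inf and v > threshold}


def pod_cpu_percent(container_rates: list[tuple[str, float]]) -> dict[str, float]:
    """sum by (pod_name)(per-container CPU rate in cores) * 100."""
    totals: dict[str, float] = defaultdict(float)
    for pod_name, cores in container_rates:
        totals[pod_name] += cores
    return {pod: cores * 100 for pod, cores in totals.items()}


if __name__ == "__main__":
    # Hypothetical containers: "sidecar" has no memory limit (limit reported as 0).
    mem = {
        "web-1/app": memory_percent(900, 1000),
        "web-1/sidecar": memory_percent(50, 0),
        "db-0/mysql": memory_percent(300, 1000),
    }
    print(over_threshold_finite(mem, 80))  # only web-1/app remains
    # Hypothetical per-container CPU rates in cores.
    print(pod_cpu_percent([("web-1", 0.5), ("web-1", 0.375), ("db-0", 0.25)]))
```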
2) Prometheus rules
```yaml
groups:
  - name: noah_pod.rules
    rules:
      - alert: PodMemUsage
        expr: container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "{{$labels.name}}: Pod high memory usage detected"
          description: "{{$labels.name}}: Pod memory usage is above 80% (current value: {{ $value }})"
      - alert: PodCpuUsage
        expr: sum by (pod_name) (rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100 > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          # sum by (pod_name) keeps only the pod_name label, so $labels.name would be empty here.
          summary: "{{$labels.pod_name}}: Pod high CPU usage detected"
          description: "{{$labels.pod_name}}: Pod CPU usage is above 80% (current value: {{ $value }})"
```