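Monitoring HAProxy status information with Zabbix
We use HAProxy + Keepalived to provide load balancing and high availability in front of our game servers, so the state of HAProxy needs to be monitored in real time.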
The HAProxy version used in this article is 1.4.24.
See the official documentation at http://cbonte.github.io/haproxy-dconv/configuration-1.4.html, section "9. Statistics and monitoring".
https://github.com/olindata/tribily-zabbix-templates/tree/master/App_HAProxy
https://github.com/jlyheden/zabbix_scripts/tree/master/haproxy
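1. How the monitoring works
HAProxy provides both an HTTP page and a stats Unix socket. Either one shows HAProxy's status information and can export it in CSV format.
The HTTP page is accessed at a URL such as http://10.10.41.100/status;csv
The Unix socket is queried like this: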
echo "show info;show stat"| sudo socat stdio unix-connect:/tmp/haproxy
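This article mainly uses the second method (the Unix socket) to obtain HAProxy's status information.
First, configure the stats socket in the cfg configuration file: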
stats socket /tmp/haproxy level admin
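The level keyword is followed by one of user, operator, or admin.
user is the lowest privilege level: only some non-sensitive information can be read.
operator can read all information, but can make only non-sensitive changes.
admin can read everything and perform every operation, so it must be used with care.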
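The socket can also be queried without socat. The following is a minimal Python sketch, not the method used by the scripts in this article; the function name query_haproxy and the timeout parameter are assumptions made for illustration, and the calling user still needs permission to open the socket file.

```python
import socket


def query_haproxy(command, socket_path="/tmp/haproxy", timeout=5.0):
    """Send one command to the HAProxy stats socket and return the reply as text."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(socket_path)
        # Commands are terminated by a newline.
        sock.sendall(command.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection: reply is complete
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks).decode("ascii", "replace")
```

For example, query_haproxy("show info") would return the same text as the socat command shown earlier, assuming the socket path matches the stats socket line in the configuration.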
$echo "show help"| sudo socat stdio unix-connect:/tmp/haproxy
Unknown command. Please enter one of the following commands only :
clear counters : clear max statistics counters (add 'all' for all counters)
help : this message
prompt : toggle interactive mode with prompt
quit : disconnect
show info : report information about the running process
show stat : report counters for each proxy and server
show errors : report last request and response errors for each proxy
show sess [id] : report the list of current sessions or dump this session
get weight : report a server's current weight
set weight : change a server's weight
set timeout : change a timeout setting
disable server : set a server in maintenance mode
enable server : re-enable a server that was previously in maintenance mode
show info reports information about the running HAProxy process:
Name: HAProxy
Version: 1.4.24
Release_date: 2013/06/17
Nbproc: 1
Process_num: 1
Pid: 7020
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
Memmax_MB: 0
Ulimit-n: 131101
Maxsock: 131101
Maxconn: 65536
Maxpipes: 0
CurrConns: 14
PipesUsed: 0
PipesFree: 0
Tasks: 26
Run_queue: 1
node: master_loadbalance1
description: lb1
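The "Key: value" layout above is easy to turn into a dictionary. This is a small sketch under the assumption that every useful line contains a colon; parse_show_info is a hypothetical helper name, and the sample text is a shortened copy of the output above.

```python
def parse_show_info(text):
    """Parse the "Key: value" lines of `show info` into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            info[key.strip()] = value.strip()
    return info


sample = """Name: HAProxy
Version: 1.4.24
Uptime: 110d 16h25m55s
Uptime_sec: 9563155
CurrConns: 14
"""
info = parse_show_info(sample)
print(info["Uptime"])      # 110d 16h25m55s
print(info["Uptime_sec"])  # 9563155
```

Looking keys up exactly avoids the ambiguity between names that share a prefix, such as Uptime and Uptime_sec.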
show stat reports the counters for each proxy and server:
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,
login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448,,,,,OPEN,,,,,,,,,1,1,0,,,,0,95,0,628,,,,0,195071390,0,1619236,28338,2034,,93,611,196721000,,,
login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0,3,2211,11,UP,30,1,0,902,0,9558963,0,,1,2,1,,8329209,,2,1,,199,L7OK,200,1,20,7967292,0,361648,7,0,0,,,,136,0,
login_pool,web2_80,0,0,0,63,2000,8333998,2358035705,2826639220,,0,,1,6,2281,13,UP,30,1,0,861,0,9558963
0. pxname: proxy name
1. svname: service name (FRONTEND for frontend, BACKEND for backend, any name for server)
2. qcur: current queued requests
3. qmax: max queued requests
4. scur: current sessions
5. smax: max sessions
6. slim: sessions limit
7. stot: total sessions
8. bin: bytes in
9. bout: bytes out
10. dreq: denied requests
11. dresp: denied responses
12. ereq: request errors
13. econ: connection errors
14. eresp: response errors (among which srv_abrt)
15. wretr: retries (warning)
16. wredis: redispatches (warning)
17. status: status (UP/DOWN/NOLB/MAINT/MAINT(via)...)
18. weight: server weight (server), total weight (backend)
19. act: server is active (server), number of active servers (backend)
20. bck: server is backup (server), number of backup servers (backend)
21. chkfail: number of failed checks
22. chkdown: number of UP->DOWN transitions
23. lastchg: last status change (in seconds)
24. downtime: total downtime (in seconds)
25. qlimit: queue limit
26. pid: process id (0 for first instance, 1 for second, ...)
27. iid: unique proxy id
28. sid: service id (unique inside a proxy)
29. throttle: warm up status
30. lbtot: total number of times a server was selected
31. tracked: id of proxy/server if tracking is enabled
32. type (0=frontend, 1=backend, 2=server, 3=socket)
33. rate: number of sessions per second over last elapsed second
34. rate_lim: limit on new sessions per second
35. rate_max: max number of new sessions per second
36. check_status: status of last health check, one of:
    UNK     -> unknown
    INI     -> initializing
    SOCKERR -> socket error
    L4OK    -> check passed on layer 4, no upper layers testing enabled
    L4TOUT  -> layer 1-4 timeout
    L4CON   -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp)
    L6OK    -> check passed on layer 6
    L6TOUT  -> layer 6 (SSL) timeout
    L6RSP   -> layer 6 invalid response - protocol error
    L7OK    -> check passed on layer 7
    L7OKC   -> check conditionally passed on layer 7, for example 404 with disable-on-404
    L7TOUT  -> layer 7 (HTTP/SMTP) timeout
    L7RSP   -> layer 7 invalid response - protocol error
    L7STS   -> layer 7 response error, for example HTTP 5xx
37. check_code: layer5-7 code, if available
38. check_duration: time in ms took to finish last health check
39. hrsp_1xx: http responses with 1xx code
40. hrsp_2xx: http responses with 2xx code
41. hrsp_3xx: http responses with 3xx code
42. hrsp_4xx: http responses with 4xx code
43. hrsp_5xx: http responses with 5xx code
44. hrsp_other: http responses with other codes (protocol error)
45. hanafail: failed health checks details
46. req_rate: HTTP requests per second over last elapsed second
47. req_rate_max: max number of HTTP requests per second observed
48. req_tot: total number of HTTP requests received
49. cli_abrt: number of data transfers aborted by the client
50. srv_abrt: number of data transfers aborted by the server (inc. in eresp)
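Because the first line of the show stat output names every column, the CSV can be read by field name instead of by position. The sketch below assumes the header starts with "# " and that each line ends with a trailing comma, as in the output above; parse_show_stat is a hypothetical helper, and the sample is cut down to the first eight columns for brevity.

```python
import csv


def parse_show_stat(text):
    """Turn `show stat` CSV output into a list of dicts keyed by field name."""
    lines = [line for line in text.splitlines() if line.strip()]
    # The header line starts with "# "; strip that prefix to get field names.
    header = lines[0].lstrip("# ").split(",")
    rows = []
    for values in csv.reader(lines[1:]):
        # The trailing comma produces one empty column name, which is skipped.
        rows.append({name: value for name, value in zip(header, values) if name})
    return rows


sample = (
    "# pxname,svname,qcur,qmax,scur,smax,slim,stot,\n"
    "login_game_pool,FRONTEND,,,24,868,2000,196721023,\n"
    "login_pool,web1_80,0,0,0,38,2000,8333681,\n"
)
rows = parse_show_stat(sample)
print(rows[0]["scur"])  # 24
print(rows[1]["stot"])  # 8333681
```

Fields that do not apply to a row, such as qcur on a FRONTEND line, come back as empty strings.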
Note that if HAProxy is started in multi-process mode (nbproc is not 1), each process can answer on the socket, so the status information returned may switch from one process to another between queries.
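One way to see which process answered is the Process_num line of show info. The following sketch assumes that Process_num identifies the answering process and that repeated queries eventually reach every process; collect_per_process and its query argument (any callable returning show info text) are hypothetical names.

```python
def collect_per_process(query, nbproc, max_tries=50):
    """Call query() until a reply from every process number has been seen.

    Returns a dict mapping process number -> latest `show info` text.
    The dict may be incomplete if max_tries is reached first.
    """
    seen = {}
    for _ in range(max_tries):
        text = query()
        for line in text.splitlines():
            if line.startswith("Process_num:"):
                seen[int(line.split(":", 1)[1])] = text
        if len(seen) == nbproc:
            break
    return seen


# Simulated replies: process 1 answers twice, then process 2 answers.
replies = iter([
    "Process_num: 1\nCurrConns: 3\n",
    "Process_num: 1\nCurrConns: 4\n",
    "Process_num: 2\nCurrConns: 5\n",
])
result = collect_per_process(lambda: next(replies), nbproc=2)
print(sorted(result))  # [1, 2]
```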
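2. Writing the monitoring scripts
There are three monitoring scripts:
haproxy_info.sh collects basic HAProxy information.
haproxy_pool_discovery.py uses Zabbix's LLD (low-level discovery) feature to discover each pool:server pair, such as login_pool:BACKEND and login_pool:web1_80. With low-level discovery, the state of each backend host is monitored dynamically, based on the backend hosts defined in the configuration file.
haproxy_stat.sh collects the value of each metric by sending the show stat command to the stats socket. Some fields exist only for FRONTEND or BACKEND, and others exist only for entries that are neither FRONTEND nor BACKEND, so the script checks the value of the second field (svname).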
haproxy_info.sh
#!/bin/bash
# Get HAProxy info such as version, uptime and number of processes.
# Usage: haproxy_info.sh <metric>   (for example Version, Uptime_sec, CurrConns)
metric=$1
stats_socket=/tmp/haproxy
info_file=/tmp/haproxy_info.csv
echo "show info"|/usr/bin/sudo /usr/bin/socat unix-connect:"$stats_socket" stdio > "$info_file"
# Match the key exactly so that "Uptime" does not also match "Uptime_sec".
awk -F': ' -v m="$metric" '$1 == m {print $2}' "$info_file"
haproxy_pool_discovery.py
socat must be installed, and the Zabbix agent user must be allowed to run socat through sudo.
Run visudo and change the sudoers file as follows:
#
# Disable "ssh hostname sudo <cmd>", because it will show the password in clear.
# You have to run "ssh -t hostname sudo <cmd>".
#
Defaults !requiretty
zabbixagent ALL=(root) NOPASSWD:/usr/bin/socat
#!/usr/bin/python
# This script discovers HAProxy pool:server pairs for Zabbix low-level discovery.
import subprocess
import json
args = '''echo "show stat"|sudo socat stdio unix-connect:/tmp/haproxy|egrep -v '^#|^$'|awk -F',' '{print $1":"$2}' '''
# universal_newlines=True returns the output as text on both Python 2 and Python 3.
t = subprocess.Popen(args, shell=True, stdout=subprocess.PIPE, universal_newlines=True).communicate()[0]
pools = []
for pool in t.split('\n'):
    if len(pool) != 0:
        pools.append({'{#POOL_NAME}': pool})
print(json.dumps({'data': pools}, indent=4, separators=(',', ':')))
Output:
{
    "data":[
        {
            "{#POOL_NAME}":"login_game_pool:FRONTEND"
        },
        {
            "{#POOL_NAME}":"login_pool:web1_80"
        },
        {
            "{#POOL_NAME}":"login_pool:web2_80"
        },
        {
            "{#POOL_NAME}":"login_pool:BACKEND"
        }
    ]
}
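The discovery logic can also be written without the egrep/awk pipeline, which makes it testable against captured output. This is a sketch only; build_discovery is a hypothetical function that takes the raw show stat text as a string, and the sample is shortened to a few columns.

```python
import json


def build_discovery(stat_text):
    """Build Zabbix LLD JSON with one {#POOL_NAME} entry per pxname:svname pair."""
    pools = []
    for line in stat_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip the header and blank lines
        fields = line.split(",")
        if len(fields) < 2:
            continue  # not a stat row
        pools.append({"{#POOL_NAME}": fields[0] + ":" + fields[1]})
    return json.dumps({"data": pools}, indent=4, separators=(",", ":"))


sample = (
    "# pxname,svname,qcur,qmax,\n"
    "login_game_pool,FRONTEND,,,\n"
    "login_pool,web1_80,0,0,\n"
    "login_pool,BACKEND,0,0,\n"
    "\n"
)
print(build_discovery(sample))
```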
haproxy_stat.sh
#!/bin/bash
# Usage: haproxy_stat.sh <pool:server> <metric>
# Example first argument: login_game_pool:FRONTEND
pool_name=$(echo "$1"|awk -F':' '{print $1}')
server_name=$(echo "$1"|awk -F':' '{print $2}')
metric=$2
stat_socket=/tmp/haproxy
stat_file=/tmp/haproxy_stat.csv
echo "show stat"|sudo socat stdio unix-connect:"$stat_socket" > "$stat_file"
case $metric in
qcur)
#current queued requests
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $3}' $stat_file
else
echo 0
fi
;;
qmax)
#max queued requests
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $4}' $stat_file
else
echo 0
fi
;;
scur)
#current sessions
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $5}' $stat_file
;;
smax)
#max sessions
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $6}' $stat_file
;;
slim)
#sessions limit
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $7}' $stat_file
;;
stot|stol)
#total sessions (the field is named stot; "stol" is kept as an alias for the earlier misspelling)
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $8}' $stat_file
;;
bin)
#bytes in
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $9}' $stat_file
;;
bout)
#bytes out
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $10}' $stat_file
;;
dreq)
#denied requests
#only FRONTEND and BACKEND has this field
if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $11}' $stat_file
else
echo 0
fi
;;
dresp)
#denied responses
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $12}' $stat_file
;;
ereq)
#request errors
#only FRONTEND has this field
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $13}' $stat_file
else
echo 0
fi
;;
econ)
#connection errors
#FRONTEND has not this field
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $14}' $stat_file
else
echo 0
fi
;;
eresp)
#response errors
#FRONTEND has not this field
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $15}' $stat_file
else
echo 0
fi
;;
status)
#status
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $18}' $stat_file
;;
chkfail)
#number of failed checks
#FRONTEND and BACKEND has not this field
if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
echo 0
else
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $22}' $stat_file
fi
;;
chkdown)
#number of UP->DOWN transitions
#FRONTEND has not this field will return 0
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $23}' $stat_file
else
echo 0
fi
;;
lastchg)
#last status change in seconds
#FRONTEND has not this field will return 0
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $24}' $stat_file
else
echo 0
fi
;;
downtime)
#total downtime in seconds
#FRONTEND has not this field will return 0
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $25}' $stat_file
else
echo 0
fi
;;
lbtot)
#total number of times a server was selected
#FRONTEND has not this field
if [ "$server_name" != "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $31}' $stat_file
else
echo 0
fi
;;
rate)
#number of sessions per second over last elapsed second
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $34}' $stat_file
;;
rate_limit)
#limit on new sessions per second
#only FRONTEND has this field
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $35}' $stat_file
else
echo 0
fi
;;
rate_max)
#max number of new sessions per second
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $36}' $stat_file
;;
check_status)
#status of last health check
if [ "$server_name" == "FRONTEND" -o "$server_name" == "BACKEND" ];then
echo "NULL"
else
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $37}' $stat_file
fi
;;
hrsp_1xx)
#http response with 1xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $40}' $stat_file
;;
hrsp_2xx)
#http response with 2xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $41}' $stat_file
;;
hrsp_3xx)
#http response with 3xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $42}' $stat_file
;;
hrsp_4xx)
#http response with 4xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $43}' $stat_file
;;
hrsp_5xx)
#http response with 5xx code
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $44}' $stat_file
;;
req_rate)
#HTTP requests per second over last elapsed second
#only FRONTEND has this field,others will return 0
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $47}' $stat_file
else
echo 0
fi
;;
req_rate_max)
#max number of HTTP requests per second observed
#only FRONTEND has this field,others will return 0
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $48}' $stat_file
else
echo 0
fi
;;
req_tot)
#total number of HTTP requests received
#only FRONTEND has this field,others will return 0
if [ "$server_name" == "FRONTEND" ];then
awk -F"," '$1=="'$pool_name'"&&$2=="'$server_name'"{print $49}' $stat_file
else
echo 0
fi
;;
*)
echo "please input the correct argument"
;;
esac
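The case statement above repeats one pattern: pick a CSV column, and return 0 when the field does not apply to that kind of row. The same rule can be sketched as a lookup table. This is an illustration rather than a replacement for the script; METRICS lists only a few of the metrics, and row_kind and get_metric are hypothetical names. The column numbers and applicability are copied from the shell script.

```python
# metric name -> (1-based CSV column, kinds of row that carry the field)
# "F" = FRONTEND row, "B" = BACKEND row, "S" = individual server row
METRICS = {
    "qcur": (3, "BS"),
    "scur": (5, "FBS"),
    "dreq": (11, "FB"),
    "ereq": (13, "F"),
    "econ": (14, "BS"),
    "chkfail": (22, "S"),
    "req_rate": (47, "F"),
}


def row_kind(svname):
    """Classify a row by its second field (svname)."""
    if svname == "FRONTEND":
        return "F"
    if svname == "BACKEND":
        return "B"
    return "S"


def get_metric(rows, pool, server, metric):
    """Return the metric as a string, or "0" when it does not apply to the row."""
    column, kinds = METRICS[metric]
    if row_kind(server) not in kinds:
        return "0"  # same fallback as the shell script
    for row in rows:
        if row[0] == pool and row[1] == server and len(row) >= column:
            return row[column - 1]
    return ""


# Rows taken from the sample output earlier in the article, truncated on the right.
rows = [
    "login_game_pool,FRONTEND,,,24,868,2000,196721023,87244966860,121969199234,0,0,171448".split(","),
    "login_pool,web1_80,0,0,0,38,2000,8333681,2356031055,2827436427,,0,,0,3,2211,11,UP,30,1,0,902".split(","),
]
print(get_metric(rows, "login_game_pool", "FRONTEND", "ereq"))  # 171448
print(get_metric(rows, "login_pool", "web1_80", "chkfail"))     # 902
print(get_metric(rows, "login_game_pool", "FRONTEND", "qcur"))  # 0
```

Adding a metric then means adding one table entry instead of another case branch.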
3. Modifying the Zabbix agent configuration
Add haproxy_status.conf:
### Option: UserParameter
# User-defined parameter to monitor. There can be several user-defined parameters.
# Format: UserParameter=<key>,<shell command>
# See 'zabbix_agentd' directory for examples.
#
# Mandatory: no
# Default:
# UserParameter=
UserParameter=haproxy.info[*],/usr/local/zabbix/bin/haproxy_info.sh $1
UserParameter=haproxy.discovery,/usr/bin/python /usr/local/zabbix/bin/haproxy_pool_discovery.py
UserParameter=haproxy.stat[*],/usr/local/zabbix/bin/haproxy_stat.sh $1 $2
4. Adding the Zabbix template
For the detailed template, see the attachment.