Hive分析AWS ELBアクセスログ
4050 ワード
AWS ELBアクセスログの保存
AWS ELB-AWS elastic load balancingは、auto scalingを容易にするためにAWS ELB+AutoScalingを選択できます.
ELB設定ページでは、S 3 bucketへのアクセスログの保存を開始できます.
Hive分析ELBログ
Hiveは、S 3 bucketからログファイルを読み出し、SQLクエリを使用することができる.
注意:AWS EMR+Hueを構成でき、後でテストするのに便利です.もちろんS 3 bucketのファイルをローカルに同期してテストすることもできます.
Hive tableの作成
CREATE EXTERNAL TABLE IF NOT EXISTS elb_raw_access_logs (
request_timestamp string,
elb_name string,
request_ip string,
request_port int,
backend_ip string,
backend_port int,
request_processing_time double,
backend_processing_time double,
client_response_time double,
elb_response_code string,
backend_response_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
url string,
protocol string,
user_agent string,
ssl_cipher string,
ssl_protocol string )
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' )
LOCATION 's3://onetouch-test-elb/';
ぶんせき
データの表示:
SELECT * FROM elb_raw_access_logs WHERE elb_response_code = '200' LIMIT 10;
日付:2017-04-17 T 10:11:32.623734 Z
各モジュールの正常な要求に対する平均応答時間:
SELECT elb_name, avg(backend_processing_time)
FROM elb_raw_access_logs
WHERE elb_response_code == '200'
GROUP BY elb_name;
Hive partition
CREATE EXTERNAL TABLE IF NOT EXISTS elb_raw_access_logs_part (
request_timestamp string,
elb_name string,
request_ip string,
request_port int,
backend_ip string,
backend_port int,
request_processing_time double,
backend_processing_time double,
client_response_time double,
elb_response_code string,
backend_response_code string,
received_bytes bigint,
sent_bytes bigint,
request_verb string,
url string,
protocol string,
user_agent string,
ssl_cipher string,
ssl_protocol string )
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'serialization.format' = '1','input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:\-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \\\"([^ ]*) ([^ ]*) (- |[^ ]*)\\\" (\"[^\"]*\") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' )
LOCATION 's3://onetouch-test-elb/';
alert table
ALTER TABLE elb_raw_access_logs_part ADD PARTITION (year='2017',month='05',day='30')
location 's3://onetouch-test-elb/proxy/AWSLogs/677234397898/elasticloadbalancing/us-east-1/2017/05/30/';
show partitions elb_raw_access_logs_part;
ぶんせき
2017-05-30のデータは
elb_response_code
でグループ化される.SELECT elb_response_code, count(url) FROM elb_raw_access_logs_part
WHERE year = '2017' AND month = '05' AND day = '30'
GROUP BY elb_response_code;
モジュールのスループットを日単位で集計するには、次の手順に従います.
select year, month, day, count(*) as total_request_count
from auth_elb_access_logs_part
group by year, month, day;
日単位のエラー率:
select year, month, day, SUM( IF( substr(elb_response_code, 1, 1) != '2', 1 , 0 ) )/ COUNT(*) * 100 as error_rate_pct
from auth_elb_access_logs_part
group by year, month, day;
平均応答時間は、1日単位で計算されます.
select year, month, day, avg(backend_processing_time) as avg_backend_processing_time
from auth_elb_access_logs_part
group by year, month, day;