Making AWS S3 server access logs easier to read with Lambda


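Server access logs

S3 server access logs are written in the format below, which is hard to read. I wanted to view them as key-value pairs, similar to JSON, so I set up a Lambda function that is triggered when a log object is PUT to the bucket, parses it, and writes a more readable version to CloudWatch Logs.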

79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 3E57427F3EXAMPLE REST.GET.VERSIONING - "GET /awsexamplebucket1?versioning HTTP/1.1" 200 - 113 - 7 - "-" "S3Console/0.4" - s9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234= SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 891CE47D2EXAMPLE REST.GET.LOGGING_STATUS - "GET /awsexamplebucket1?logging HTTP/1.1" 200 - 242 - 11 - "-" "S3Console/0.4" - 9vKBE6vMhrNiWHZmb2L0mXOcqPGzQOI5XLnCtZNPxev+Hf+7tpT6sxDwDty4LHBUOZJG96N1234= SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be A1206F460EXAMPLE REST.GET.BUCKETPOLICY - "GET /awsexamplebucket1?policy HTTP/1.1" 404 NoSuchBucketPolicy 297 - 38 - "-" "S3Console/0.4" - BNaBsXZQQDbssi6xMBdBU2sLt+Yf5kZDmeBUP35sFoKa3sLLeMC78iwEIWxs99CRUrbS4n11234= SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:01:00 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 7B4A0FABBEXAMPLE REST.GET.VERSIONING - "GET /awsexamplebucket1?versioning HTTP/1.1" 200 - 113 - 33 - "-" "S3Console/0.4" - Ke1bUcazaN1jWuUlPJaxF64cQVpUEhoZKEG/hmy/gijN/I1DeWqDfFvnpybfEseEME/u7ME1234= SigV2 ECDHE-RSA-AES128-GCM-SHA256 AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be awsexamplebucket1 [06/Feb/2019:00:01:57 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be DD6CC733AEXAMPLE REST.PUT.OBJECT s3-dg.pdf "PUT /awsexamplebucket1/s3-dg.pdf HTTP/1.1" 200 - - 4406583 41754 28 "-" "S3Console/0.4" - 10S62Zv81kBW7BB6SX4XJ48o6kpcl6LPwEoizZQQxJd5qDSCTLX0TgS37kYUBKQW3+bPdrg1234= SigV4 ECDHE-RSA-AES128-SHA AuthHeader awsexamplebucket1.s3.us-west-1.amazonaws.com TLSV1.1

This is the kind of format I want to see.

{
  'Bucket Owner': 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  'Bucket': 'bucket',
  'Time': '[29/May/2021:05:17:36 +0000]',
  'Remote IP': 'xxx.xxx.xxx.xxx',
  'Requester': 'arn:aws:iam::123456789012:user/test-user',
  'Request ID': 'XXXXXXXXXXXXXX',
  'Operation': 'REST.GET.VERSIONING',
  'Key': '-',
  'Request-URI': '"GET /bucket?versioning= HTTP/1.1"',
  'HTTP status': '200',
  'Error Code': '-',
  'Bytes Sent': '113',
  'Object Size': '-',
  'Total Time': '27',
  'Turn-Around Time': '-',
  'Referer': '"-"',
  'User-Agent': '"S3Console/0.4, aws-internal/3 aws-sdk-java/1.11.1002 Linux/5.4.116-64.217.amzn2int.x86_64 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/Oracle_Corporation cfg/retry-mode/legacy"',
  'Version Id': '-',
  'Host Id': 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  'Signature Version': 'SigV4',
  'Cipher Suite': 'ECDHE-RSA-AES128-GCM-SHA256',
  'Authentication Type': 'AuthHeader',
  'Host Header': 's3-ap-northeast-1.amazonaws.com',
  'TLS version': 'TLSv1.2'
}
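The Time value above is the raw bracketed string from the log. If a sortable timestamp is more convenient, it can be converted with the standard library. The following is only a minimal sketch: the helper name `parse_log_time` is my own and is not part of the Lambda function in this article.

```python
from datetime import datetime, timezone


def parse_log_time(raw):
    """Convert a Time field such as '[29/May/2021:05:17:36 +0000]'
    into an ISO 8601 string in UTC (hypothetical helper)."""
    parsed = datetime.strptime(raw, '[%d/%b/%Y:%H:%M:%S %z]')
    return parsed.astimezone(timezone.utc).isoformat()


print(parse_log_time('[29/May/2021:05:17:36 +0000]'))
# prints 2021-05-29T05:17:36+00:00
```

Note that `%b` depends on the locale, so this assumes English month abbreviations, which is what the access logs use.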

Lambda function

Here is the Lambda function that tidies up the logs.

import os
import re
from urllib.parse import unquote_plus

import boto3

# Field names, in the order they appear in a server access log record.
keys = [
    'Bucket Owner',
    'Bucket',
    'Time',
    'Remote IP',
    'Requester',
    'Request ID',
    'Operation',
    'Key',
    'Request-URI',
    'HTTP status',
    'Error Code',
    'Bytes Sent',
    'Object Size',
    'Total Time',
    'Turn-Around Time',
    'Referer',
    'User-Agent',
    'Version Id',
    'Host Id',
    'Signature Version',
    'Cipher Suite',
    'Authentication Type',
    'Host Header',
    'TLS version'
]

# A field is a [bracketed] timestamp, a "quoted" string, or a run of
# non-space characters. A quoted field may be a single token such as "-".
field_pattern = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

s3 = boto3.resource('s3')


def parse_line(line):
    # zip() ignores any trailing fields that have no name in keys.
    return dict(zip(keys, field_pattern.findall(line)))


def lambda_handler(event, context):
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications.
        object_name = unquote_plus(record['s3']['object']['key'])
        # Use the base name so that a key with a prefix still fits under /tmp.
        object_path = os.path.join('/tmp', os.path.basename(object_name))
        s3.Object(bucket_name, object_name).download_file(object_path)
        with open(object_path) as fr:
            for line in fr:
                if line.strip():
                    print('Log Data: ', parse_line(line))
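
The handler above takes only two values from the S3 event notification: the bucket name and the object key. To illustrate the event shape it relies on, here is a rough sketch that pulls those values out of a hand-written event. The event is a trimmed example with made-up names, and `extract_targets` is a hypothetical helper. Keys in S3 event notifications are URL-encoded, so the sketch decodes them with `unquote_plus`.

```python
from urllib.parse import unquote_plus


def extract_targets(event):
    """Return (bucket, key) pairs for every record in an S3 event."""
    targets = []
    for record in event.get('Records', []):
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        targets.append((bucket, key))
    return targets


# Hand-written event for illustration; real events carry many more fields.
sample_event = {
    'Records': [
        {
            's3': {
                'bucket': {'name': 'example-log-bucket'},
                'object': {'key': 'logs/2021-05-29-05-17-36-EXAMPLE%3D1'},
            }
        }
    ]
}

print(extract_targets(sample_event))
# prints [('example-log-bucket', 'logs/2021-05-29-05-17-36-EXAMPLE=1')]
```

Running a check like this locally is a cheap way to confirm the event handling before deploying.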

Output to CloudWatch Logs

Log Data:  {'Bucket Owner': 'xxxxxxxxxxxxxxxxxx', 'Bucket': 'bucket', 'Time': '[29/May/2021:05:17:36 +0000]', 'Remote IP': 'xxx.xxx.xxx.xxx', 'Requester': 'arn:aws:iam::123456789012:user/test-user', 'Request ID': 'XXX', 'Operation': 'REST.GET.VERSIONING', 'Key': '-', 'Request-URI': '"GET /bucket?versioning= HTTP/1.1"', 'HTTP status': '200', 'Error Code': '-', 'Bytes Sent': '113', 'Object Size': '-', 'Total Time': '27', 'Turn-Around Time': '-', 'Referer': '"-"', 'User-Agent': '"S3Console/0.4, aws-internal/3 aws-sdk-java/1.11.1002 Linux/5.4.116-64.217.amzn2int.x86_64 OpenJDK_64-Bit_Server_VM/25.282-b08 java/1.8.0_282 vendor/Oracle_Corporation cfg/retry-mode/legacy"', 'Version Id': '-', 'Host Id': 'yyy', 'Signature Version': 'SigV4', 'Cipher Suite': 'ECDHE-RSA-AES128-GCM-SHA256', 'Authentication Type': 'AuthHeader', 'Host Header': 's3-ap-northeast-1.amazonaws.com', 'TLS version': 'TLSv1.2'}

With named fields, the logs are easy to search in CloudWatch Logs.
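
Printing a Python dict works, but the output uses single quotes, so it is not valid JSON. If real JSON is wanted, one option is to serialize each record with `json.dumps` so that every log event is a single JSON object. A minimal sketch, with a shortened hand-made record standing in for the parsed log:

```python
import json

# Shortened stand-in for one parsed access log record.
log_data = {
    'Bucket': 'bucket',
    'Operation': 'REST.GET.VERSIONING',
    'HTTP status': '200',
}

# One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
line = json.dumps(log_data, ensure_ascii=False)
print(line)
# prints {"Bucket": "bucket", "Operation": "REST.GET.VERSIONING", "HTTP status": "200"}
```

Tooling that understands JSON can then pick the record apart field by field instead of matching on raw text.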
