Elasticsearch + Kibana + Embulk on VirtualBox


This is a record of setting up two virtual machines on VirtualBox. The first runs Elasticsearch and Kibana in Docker. The second has Embulk installed directly and is used to load logs into Elasticsearch.

Item          Value
Hypervisor    VirtualBox
OS            CentOS Linux release 7.4.1708 (Core)
VM 01         Elasticsearch, Kibana (192.168.56.29)
VM 02         Embulk (192.168.56.28)

Note: your OS environment may not match mine exactly. Install anything that is missing with yum or a similar tool.

1. VM 01 Elasticsearch Kibana

Elasticsearch Kibana

Elasticsearch and Kibana will run in containers, so install Docker and Docker Compose.

Command
# yum install -y docker
# systemctl enable docker
# systemctl start docker
# curl -L https://github.com/docker/compose/releases/download/1.9.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
# chmod +x /usr/local/bin/docker-compose

Create docker-compose.yml for Elasticsearch and Kibana. ES_JAVA_OPTS fixes the Elasticsearch heap at 2 GB, so the VM needs more memory than that.

Command
# vim docker-compose.yml
----------------------------------------------------
elasticsearch:
  image: elasticsearch
  container_name: elasticsearch
  ports:
    - "9200:9200"
  environment:
    ES_JAVA_OPTS: '-Xms2048m -Xmx2048m'
kibana:
  image: kibana
  container_name: kibana
  links:
    - elasticsearch:elasticsearch
  ports:
    - "5601:5601"
----------------------------------------------------
:wq
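Depending on the Elasticsearch version, the container may exit shortly after startup with an error saying vm.max_map_count is too low; Elastic's Docker documentation asks for at least 262144 on the host. A minimal sketch that only checks (does not change) the value on VM 01, assuming a Linux host that exposes /proc/sys/vm/max_map_count. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: check whether the Docker host's vm.max_map_count is high enough
# for Elasticsearch (262144 is the minimum documented by Elastic).
REQUIRED_MAP_COUNT=262144

# Returns 0 if the given value meets the minimum, 1 otherwise.
map_count_ok() {
  [ "$1" -ge "$REQUIRED_MAP_COUNT" ]
}

# Read the current value; fall back to 0 if /proc is not available.
current=$(cat /proc/sys/vm/max_map_count 2>/dev/null || echo 0)
if map_count_ok "$current"; then
  echo "vm.max_map_count=$current (OK)"
else
  echo "vm.max_map_count=$current is below $REQUIRED_MAP_COUNT"
  echo "To raise it (as root): sysctl -w vm.max_map_count=$REQUIRED_MAP_COUNT"
  echo "To persist it: echo 'vm.max_map_count=$REQUIRED_MAP_COUNT' >> /etc/sysctl.conf"
fi
```

The sketch only prints the commands, so nothing on the host changes until you run them yourself.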

Start the Elasticsearch and Kibana containers.

Command
# docker-compose up -d
Pulling elasticsearch (elasticsearch:latest)...
Trying to pull repository docker.io/library/elasticsearch ...
latest: Pulling from docker.io/library/elasticsearch
723254a2c089: Pull complete
abe15a44e12f: Pull complete
409a28e3cc3d: Pull complete
a9511c68044a: Pull complete
9d1b16e30bc8: Pull complete
0fc5a09c9242: Pull complete
d34976006493: Pull complete
3b70003f0c10: Pull complete
c85e66a46c7c: Pull complete
c1d6383769d6: Pull complete
da8d73630b44: Pull complete
5f0e52287884: Pull complete
770995441948: Pull complete
a5b2e358a5e0: Pull complete
7ab1d4a5e3eb: Pull complete
Digest: sha256:04f7cfc825b2951f928be7eb74defa5ac8687c990ba70319dae1d6119488ae9e
Pulling kibana (kibana:latest)...
Trying to pull repository docker.io/library/kibana ...
latest: Pulling from docker.io/library/kibana
f49cf87b52c1: Pull complete
9e8acb2289dd: Pull complete
d495c79e5bf4: Pull complete
81c8b3679622: Pull complete
2a4eff393768: Pull complete
5fa4e981b17d: Pull complete
e23852241c5b: Pull complete
411a85463ec1: Pull complete
8206f115bd3e: Pull complete
Digest: sha256:fe3ffbd866108f9c98a76fdf51db2c6c9cc937fb8ba153d4474acff72265d86a
Creating elasticsearch
Creating kibana
# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                              NAMES
3fd18e23c9da        kibana              "/docker-entrypoint.s"   About a minute ago   Up About a minute   0.0.0.0:5601->5601/tcp             kibana
88665a0b7aa5        elasticsearch       "/docker-entrypoint.s"   About a minute ago   Up About a minute   0.0.0.0:9200->9200/tcp, 9300/tcp   elasticsearch
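A container showing "Up" does not mean Elasticsearch already answers HTTP requests; it usually needs a little more time. A small polling sketch, assuming curl is installed and Elasticsearch is reachable at http://192.168.56.29:9200 (override with ES_URL). The variable and function names are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: poll an Elasticsearch HTTP endpoint until it answers or a timeout
# (in seconds) expires. The default URL assumes this article's setup.
ES_URL=${ES_URL:-http://192.168.56.29:9200}

wait_for_es() {
  local url="$1" timeout="${2:-60}" waited=0
  # -s: silent, -f: non-zero exit on HTTP errors, --max-time: don't hang
  until curl -sf --max-time 5 "$url" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Elasticsearch did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Elasticsearch is up at $url"
}

# Usage: wait_for_es "$ES_URL" 120 && curl -s "$ES_URL"
```

If it times out, docker logs elasticsearch shows why the container is not answering.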

2. VM 02 Embulk

Embulk

Install Embulk (Java is installed first, since Embulk runs on the JVM).

Command
# yum install -y java
# curl --create-dirs -o ~/.embulk/bin/embulk -L "http://dl.embulk.org/embulk-latest.jar"
# chmod +x ~/.embulk/bin/embulk
# echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
# source ~/.bashrc
# embulk
Embulk v0.8.39
Usage: embulk [-vm-options] <command> [--options]
Commands:
   mkbundle   <directory>                             # create a new plugin bundle environment.
   bundle     [directory]                             # update a plugin bundle environment.
   run        <config.yml>                            # run a bulk load transaction.
   cleanup    <config.yml>                            # cleanup resume state.
   preview    <config.yml>                            # dry-run the bulk load without output and show preview.
   guess      <partial-config.yml> -o <output.yml>    # guess missing parameters to create a complete configuration file.
   gem        <install | list | help>                 # install a plugin or show installed plugins.
   new        <category> <name>                       # generates new plugin template
   migrate    <path>                                  # modify plugin code to use the latest Embulk plugin API
   example    [path]                                  # creates an example config file and csv file to try embulk.
   selfupdate [version]                               # upgrades embulk to the latest released version or to the specified version.

VM options:
   -E...                            Run an external script to configure environment variables in JVM
                                    (Operations not just setting envs are not recommended nor guaranteed.
                                     Expect side effects by running your external script at your own risk.)
   -J-O                             Disable JVM optimizations to speed up startup time (enabled by default if command is 'run')
   -J+O                             Enable JVM optimizations to speed up throughput
   -J...                            Set JVM options (use -J-help to see available options)
   -R...                            Set JRuby options (use -R--help to see available options)

Use `<command> --help` to see description of the commands.
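Before wiring up Elasticsearch, Embulk's bundled sample is a quick way to confirm the installation works. A sketch using only the example, guess, preview and run subcommands from the help output above; it assumes the generated directory contains seed.yml, as in Embulk's quick start. The function name and temporary directory are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: run Embulk's bundled example in a throwaway directory.
# The function body is a subshell, so the cd does not leak to the caller.
run_embulk_example() (
  if ! command -v embulk >/dev/null 2>&1; then
    echo "embulk not found on PATH; skipping the example"
    exit 0
  fi
  cd "$(mktemp -d)" || exit 1
  embulk example ./try1 || exit 1                       # sample CSV and seed.yml
  embulk guess ./try1/seed.yml -o config.yml || exit 1  # guess parser settings
  embulk preview config.yml || exit 1                   # dry run, shows parsed rows
  embulk run config.yml                                 # the actual load
)

# Usage: run_embulk_example
```

The same embulk preview step also works on import.yml later and is a cheap way to check how the columns are split before loading anything.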

Install the Elasticsearch output plugin.

Command
# embulk gem install embulk-output-elasticsearch_ruby
2017-12-19 20:03:16.257 +0900: Embulk v0.8.39

********************************** INFORMATION **********************************
  Join us! Embulk-announce mailing list is up for IMPORTANT annoucement such as
    compatibility-breaking changes and key feature updates.
  https://groups.google.com/forum/#!forum/embulk-announce
*********************************************************************************


Gem plugin path is: /root/.embulk/jruby/2.3.0

Fetching: multi_json-1.12.2.gem (100%)
Successfully installed multi_json-1.12.2
Fetching: multipart-post-2.0.0.gem (100%)
Successfully installed multipart-post-2.0.0
Fetching: faraday-0.13.1.gem (100%)
Successfully installed faraday-0.13.1
Fetching: elasticsearch-transport-6.0.0.gem (100%)
Successfully installed elasticsearch-transport-6.0.0
Fetching: elasticsearch-api-6.0.0.gem (100%)
Successfully installed elasticsearch-api-6.0.0
Fetching: elasticsearch-6.0.0.gem (100%)
Successfully installed elasticsearch-6.0.0
Fetching: excon-0.60.0.gem (100%)
Successfully installed excon-0.60.0
Fetching: embulk-output-elasticsearch_ruby-0.1.6.gem (100%)
Successfully installed embulk-output-elasticsearch_ruby-0.1.6
8 gems installed

Also install a parser plugin that creates the columns dynamically.

Command
# embulk gem install embulk-parser-csv_guessable

2017-12-20 22:52:10.568 +0900: Embulk v0.8.34

Gem plugin path is: /root/.embulk/jruby/2.3.0

Fetching: embulk-parser-csv_guessable-0.1.5.gem (100%)
Successfully installed embulk-parser-csv_guessable-0.1.5
1 gem installed
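To confirm that both plugins are in place, the output of embulk gem list (from the gem subcommand in the help above) can be checked. A sketch assuming the listing uses RubyGems' usual "name (version)" line format; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check a gem listing (e.g. from `embulk gem list`) for the two
# plugins used in this article.
has_plugin() {
  # $1 = gem name; the listing is read from stdin
  grep -q "^$1 ("
}

check_plugins() {
  local listing="$1" missing=0 gem
  for gem in embulk-output-elasticsearch_ruby embulk-parser-csv_guessable; do
    if printf '%s\n' "$listing" | has_plugin "$gem"; then
      echo "installed: $gem"
    else
      echo "MISSING:   $gem" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_plugins "$(embulk gem list 2>/dev/null)"
```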

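Create the mapping file import_log.json. If you don't yet know what each column of the log contains, declare every column with type string and index not_analyzed, one entry per column, so that the log can at least be loaded. The log used here has 31 columns. (Admittedly a brute-force approach.) Note that string with not_analyzed is the pre-6.0 mapping syntax; Elasticsearch 6.x and later expect "type": "keyword" instead.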

Command
# vim import_log.json
-------------------------------------------------------
{
    "mappings": {
        "LOG_FILE_NAME": {
            "properties": {
                "1": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "2": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "3": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "4": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "5": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "6": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "7": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "8": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "9": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "10": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "11": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "12": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "13": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "14": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "15": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "16": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "17": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "18": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "19": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "20": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "21": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "22": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "23": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "24": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "25": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "26": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "27": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "28": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "29": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "30": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "31": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}
-------------------------------------------------------
:wq
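Typing 31 nearly identical entries by hand is error-prone. A sketch that generates the same structure with a loop; the function name is hypothetical, and the type name is passed as an argument so it can be set to whatever index_type import.yml uses.

```shell
#!/usr/bin/env bash
# Sketch: generate a mapping where fields "1".."N" are all
# string/not_analyzed (pre-6.0 syntax, as in the hand-written file).
# Usage: gen_mapping <column-count> <type-name> > import_log.json
gen_mapping() {
  local n="$1" type_name="$2" i=1 sep
  printf '{\n  "mappings": {\n    "%s": {\n      "properties": {\n' "$type_name"
  while [ "$i" -le "$n" ]; do
    # comma after every entry except the last one
    if [ "$i" -lt "$n" ]; then sep=','; else sep=''; fi
    printf '        "%d": {"type": "string", "index": "not_analyzed"}%s\n' "$i" "$sep"
    i=$((i + 1))
  done
  printf '      }\n    }\n  }\n}\n'
}

# Example: gen_mapping 31 import > import_log.json
```

The output puts each field on one line, but it describes the same mapping as the expanded form.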

Create a directory for the logs to be loaded and copy the logs into it with WinSCP or a similar tool.
Then create import.yml.

Command
# mkdir /var/log/import
# vim import.yml
-------------------------------------------------------
in:
  type: file
  path_prefix: /var/log/import/import.log
  parser:
    type: csv_guessable
    schema_file: /var/log/import/import.log
    columns:
      - {name: 1, type: string}
      - {name: 2, type: string}
      - {name: 3, type: string}
      - {name: 4, type: string}
      - {name: 5, type: string}
      - {name: 6, type: string}
      - {name: 7, type: string}
      - {name: 8, type: string}
      - {name: 9, type: string}
      - {name: 10, type: string}
      - {name: 11, type: string}
      - {name: 12, type: string}
      - {name: 13, type: string}
      - {name: 14, type: string}
      - {name: 15, type: string}
      - {name: 16, type: string}
      - {name: 17, type: string}
      - {name: 18, type: string}
      - {name: 19, type: string}
      - {name: 20, type: string}
      - {name: 21, type: string}
      - {name: 22, type: string}
      - {name: 23, type: string}
      - {name: 24, type: string}
      - {name: 25, type: string}
      - {name: 26, type: string}
      - {name: 27, type: string}
      - {name: 28, type: string}
      - {name: 29, type: string}
      - {name: 30, type: string}
      - {name: 31, type: string}
exec: {}
out:
    type: elasticsearch_ruby
    nodes:
    - {host: 192.168.56.29, port: 9200}
    index: import
    index_type: import
-------------------------------------------------------
:wq
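The columns list has to match the real number of fields in the log. A sketch that counts the fields in the first line and prints a matching list with the same indentation as import.yml; it splits naively on commas (a quoted field containing a comma would break the count), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a `columns:` block sized to the first line of a CSV log.
# Naive comma split: quoted fields containing commas are NOT handled.
# Usage: gen_columns /var/log/import/import.log
gen_columns() {
  local file="$1" n i=1
  n=$(head -n 1 "$file" | awk -F',' '{print NF}')
  n=${n:-0}
  echo "    columns:"
  while [ "$i" -le "$n" ]; do
    echo "      - {name: $i, type: string}"
    i=$((i + 1))
  done
}

# Print "<field count> <number of lines>" for each distinct field count,
# to see whether every line has the same number of fields.
field_counts() {
  awk -F',' '{c[NF]++} END {for (k in c) print k, c[k]}' "$1"
}
```

If field_counts prints more than one row, the file has lines with different field counts, and the column list (and the mapping) should be sized after checking those lines.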

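Register the mapping with curl. embulk_access_log is the name of the index the mapping is registered to. Note that import.yml above sends the data to the index import with type import, so the mapping only takes effect if these names match: either use import here and as the type name in import_log.json, or change index and index_type in import.yml.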

Command
# curl -XPUT '192.168.56.29:9200/embulk_access_log' -H 'Content-Type: application/json' -d @import_log.json
{"acknowledged":true,"shards_acknowledged":true,"index":"embulk_access_log"}
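Running the PUT a second time fails because the index already exists, and the mapping of existing fields cannot be changed in place. A sketch that deletes and recreates the index and checks for "acknowledged":true in the response; it assumes the Elasticsearch address used in this article, and the function names are hypothetical. Be aware that the DELETE discards every document already loaded into that index.

```shell
#!/usr/bin/env bash
# Sketch: drop and recreate an index with a mapping file, failing if
# Elasticsearch does not acknowledge the creation.
# WARNING: the DELETE removes all documents in the index.
ES_URL=${ES_URL:-http://192.168.56.29:9200}

# Returns 0 if a JSON response contains "acknowledged":true.
is_acknowledged() {
  printf '%s' "$1" | grep -q '"acknowledged":true'
}

recreate_index() {
  local index="$1" mapping_file="$2" resp
  curl -s -XDELETE "$ES_URL/$index" >/dev/null
  resp=$(curl -s -XPUT "$ES_URL/$index" \
    -H 'Content-Type: application/json' -d @"$mapping_file")
  echo "$resp"
  is_acknowledged "$resp"
}

# Usage: recreate_index embulk_access_log import_log.json
```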

Run Embulk.

Command
# embulk run import.yml
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
2017-12-21 17:23:18.984 +0900: Embulk v0.8.39

********************************** INFORMATION **********************************
  Join us! Embulk-announce mailing list is up for IMPORTANT annoucement such as
    compatibility-breaking changes and key feature updates.
  https://groups.google.com/forum/#!forum/embulk-announce
*********************************************************************************

2017-12-21 17:23:30.984 +0900 [INFO] (0001:transaction): Loaded plugin embulk-output-elasticsearch_ruby (0.1.6)
2017-12-21 17:23:31.130 +0900 [INFO] (0001:transaction): Loaded plugin embulk-parser-csv_guessable (0.1.5)
2017-12-21 17:23:31.202 +0900 [INFO] (0001:transaction): Listing local files at directory '/var/log/import' filtering filename by prefix 'import.log'
2017-12-21 17:23:31.203 +0900 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2017-12-21 17:23:31.206 +0900 [INFO] (0001:transaction): Loading files [/var/log/import/import.log]
2017-12-21 17:23:31.375 +0900 [INFO] (0001:transaction): Using local thread executor with max_threads=2 / tasks=1
2017-12-21 17:23:31.405 +0900 [INFO] (0001:transaction): mode => normal
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): nodes => [{"host"=>"192.168.56.29", "port"=>9200}]
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): index => import
2017-12-21 17:23:31.436 +0900 [INFO] (0001:transaction): index_type => import
2017-12-21 17:23:31.437 +0900 [INFO] (0001:transaction): alias =>
2017-12-21 17:23:31.619 +0900 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2017-12-21 17:23:34.129 +0900 [INFO] (0014:task-0000): bulk: 287 success.
2017-12-21 17:23:34.130 +0900 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
2017-12-21 17:23:34.139 +0900 [INFO] (main): Committed.
2017-12-21 17:23:34.139 +0900 [INFO] (main): Next config diff: {"in":{"last_path":"/var/log/import/import.log"},"out":{}}
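To confirm the load, the number of documents in the index can be compared with the "bulk: 287 success." line above. A sketch using Elasticsearch's _count endpoint, assuming the index import that the log shows Embulk writing to; it parses the JSON with sed so jq is not required, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fetch the document count of an index via the _count API.
ES_URL=${ES_URL:-http://192.168.56.29:9200}

# Pull the "count" value out of a _count response (empty if absent).
extract_count() {
  printf '%s' "$1" | sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

doc_count() {
  extract_count "$(curl -s "$ES_URL/$1/_count")"
}

# Usage: doc_count import   # compare with "bulk: 287 success."
```

Elasticsearch refreshes indices about once a second by default, so a count taken immediately after the run may briefly be lower.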

Open a browser and check the data in Kibana (http://192.168.56.29:5601, the port published in docker-compose.yml).

Enter embulk_access_log in the "Index pattern" field and click the "Create" button.

Kibana can visualize and analyze the loaded logs, but getting the most out of it takes study of its own.

3. References