Periodically transferring S3 data to Treasure Data (data-connector)


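Apparently, once you register a directory and a schedule, the data connector imports the data periodically, so I set it up while following the official documentation.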

Prepare seed.yml

in:
  type: s3
  access_key_id: XXXXXXXXXX
  secret_access_key: YYYYYYYYYY
  bucket: sample_bucket
  # path to the *.json or *.csv or *.tsv file on your s3 bucket
  path_prefix: path/to/sample_file
out:
  mode: append
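
As a minimal sketch, the same seed.yml can be generated from environment variables so the keys are not typed into the file by hand. The variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are an assumption (they are the names the AWS tools conventionally use); the bucket and path are the sample values from above.

```shell
# Assumption: credentials come from the conventional AWS environment variables.
# The fallback values are the masked placeholders used in this article.
: "${AWS_ACCESS_KEY_ID:=XXXXXXXXXX}"
: "${AWS_SECRET_ACCESS_KEY:=YYYYYYYYYY}"

# The heredoc delimiter is unquoted so the shell expands the variables.
cat > seed.yml <<EOF
in:
  type: s3
  access_key_id: ${AWS_ACCESS_KEY_ID}
  secret_access_key: ${AWS_SECRET_ACCESS_KEY}
  bucket: sample_bucket
  path_prefix: path/to/sample_file
out:
  mode: append
EOF

# The file now contains the secret key, so keep it readable by the owner only.
chmod 600 seed.yml
```

Note that the load.yml generated in the next step also contains the keys, so it probably deserves the same permissions.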

Prepare load.yml

Generate the template

td connector:guess seed.yml -o load.yml

load.yml is written out

in:
  type: s3
  access_key_id: XXXXXXXXXX
  secret_access_key: YYYYYYYYYY
  bucket: sample_bucket
  path_prefix: path/to/sample_file
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: ''
    skip_header_lines: 1
    columns:
    - name: id
      type: long
    - name: company
      type: string
    - name: customer
      type: string
    - name: created_at
      type: timestamp
      format: '%Y-%m-%d %H:%M:%S'
out:
  mode: append
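
To make the guessed parser settings concrete, here is a rough local check against a file of the shape they describe: CRLF line endings, one header line, comma-delimited, and four columns ending in a created_at timestamp. The sample rows and the file name sample_file.csv are hypothetical (the actual S3 file is not shown in this post), and the naive comma split does not handle quoted fields that contain commas.

```shell
# Hypothetical sample rows; the real file contents are not shown in this post.
printf 'id,company,customer,created_at\r\n' > sample_file.csv
printf '1,Example Inc.,Alice,2016-12-14 11:29:00\r\n' >> sample_file.csv
printf '2,Example LLC,Bob,2016-12-14 11:42:30\r\n' >> sample_file.csv

# Strip the CR of each CRLF, skip the header (skip_header_lines: 1), then count
# rows that do not have 4 fields or whose 4th field is not in the
# '%Y-%m-%d %H:%M:%S' shape.
bad_rows=$(tr -d '\r' < sample_file.csv | awk -F',' '
  NR > 1 && (NF != 4 || $4 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) { n++ }
  END { print n + 0 }')
echo "rows not matching the guessed schema: $bad_rows"
```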

Preview

td connector:preview load.yml

Register the scheduled job

td connector:create schedule_import "*/10 * * * *" \
  tsuru_analyzer data_connector_test load.yml \
  --time-column time
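
For reference, the cron expression above only sets a step in the minute field. Assuming the usual five-field cron semantics, a small sketch listing the minute values it matches within each hour:

```shell
# "*/10" in the minute field matches every minute from 0 to 59 that is a
# multiple of 10; the other four fields are "*", so this repeats every hour.
minutes=$(echo $(seq 0 10 59))
echo "schedule_import is due at minutes: $minutes"
# prints: schedule_import is due at minutes: 0 10 20 30 40 50
```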

Check the schedule

td connector:list

+-----------------+--------------+----------+-------+----------------+---------------------+
| Name            | Cron         | Timezone | Delay | Database       | Table               |
+-----------------+--------------+----------+-------+----------------+---------------------+
| schedule_import | */10 * * * * | UTC      | 0     | tsuru_analyzer | data_connector_test |
+-----------------+--------------+----------+-------+----------------+---------------------+

Check the execution log

td connector:history schedule_import

+-----------+---------+---------+----------------+---------------------+----------+-----------------------+----------+
| JobID     | Status  | Records | Database       | Table               | Priority | Started               | Duration |
+-----------+---------+---------+----------------+---------------------+----------+-----------------------+----------+
| 108792166 | success | 0       | tsuru_analyzer | data_connector_test | 0        | 2016-12-14 11:49:1... | 9        |
| 108791473 | success | 23      | tsuru_analyzer | data_connector_test | 0        | 2016-12-14 11:42:3... | 9        |
| 108789810 | success | 2225228 | tsuru_analyzer | data_connector_test | 0        | 2016-12-14 11:29:2... | 792      |
+-----------+---------+---------+----------------+---------------------+----------+-----------------------+----------+
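
If you want to check this table mechanically rather than by eye, a sketch like the following totals the Records column and counts jobs whose status is not success. The helper name summarize_history is hypothetical, and it assumes the table layout stays exactly as shown above; the rows fed in here are copied from that output.

```shell
# Hypothetical helper: reads the history table on stdin.
summarize_history() {
  awk -F'|' '
    $2 ~ /^ *[0-9]+ *$/ {   # data rows only: the JobID column is numeric
      total += $4           # Records column
      if ($3 !~ /success/) failed++
    }
    END { printf "total_records=%d failed_jobs=%d\n", total, failed }'
}

summary=$(summarize_history <<'EOF'
| JobID     | Status  | Records | Database       | Table               | Priority | Started               | Duration |
| 108792166 | success | 0       | tsuru_analyzer | data_connector_test | 0        | 2016-12-14 11:49:1... | 9        |
| 108791473 | success | 23      | tsuru_analyzer | data_connector_test | 0        | 2016-12-14 11:42:3... | 9        |
| 108789810 | success | 2225228 | tsuru_analyzer | data_connector_test | 0        | 2016-12-14 11:29:2... | 792      |
EOF
)
echo "$summary"
# prints: total_records=2225251 failed_jobs=0
```

In practice the input would be piped in, e.g. `td connector:history schedule_import | summarize_history`.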

Finally, I checked the table and confirmed that the data had actually been imported, so that's it.

Addendum:
If you need to map JSON, see here