Trying out the Treasure Data Toolbelt (Mac)


Setting up the td command

Check the Ruby version

Terminal
$ ruby --version
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin16]
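td is installed as a Ruby gem, so both ruby and gem need to be available before you go further. The sketch below runs that prerequisite check. The function name check_ruby_tools is made up for illustration, and the only real command it calls is the ruby --version shown above.

```shell
# Hypothetical prerequisite check: td is distributed as a Ruby gem,
# so both the ruby and gem commands must be found.
check_ruby_tools() {
  for cmd in ruby gem; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      return 1
    fi
  done
  # Print the Ruby version, as in the step above
  ruby --version
}
```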

Install the td command (Ruby gem)

Terminal
$ gem install td

Check that the command is on your PATH

Terminal
$ which td
/Users/hoge/.rbenv/shims/td

Check the version

Terminal
$ td --version
0.15.8

Update td

Terminal
$ gem update td
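The install and update steps can be combined into one function, sketched below. install_or_update_td is a hypothetical name. The sketch assumes td is on PATH once the gem is installed (with rbenv, through the shim shown earlier).

```shell
# Hypothetical helper combining the install and update steps above:
# install the td gem if the td command is not found, otherwise update it.
install_or_update_td() {
  if command -v td >/dev/null 2>&1; then
    gem update td
  else
    gem install td
  fi
}
```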

Account setup (Google SSO users)

Terminal
$ td apikey:set <your_apikey>

Check the account settings

Terminal
$ less ~/.td/td.conf
[account]
  apikey = **********************************************
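Opening td.conf with less displays the API key on screen. If you only need to know whether a key is configured, a check like the one below avoids printing it. td_apikey_is_set is a hypothetical helper, and it assumes the config layout shown above (an "apikey = ..." line).

```shell
# Hypothetical helper: succeed if the td config file has a non-empty
# "apikey = ..." entry, without printing the key itself.
# Defaults to ~/.td/td.conf; pass another path as $1 to check a different file.
td_apikey_is_set() {
  conf=${1:-"$HOME/.td/td.conf"}
  [ -r "$conf" ] && grep -Eq '^[[:space:]]*apikey[[:space:]]*=[[:space:]]*[^[:space:]]' "$conf"
}

# Usage:
#   td_apikey_is_set && echo "API key is configured"
```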

Command help

The td command

Terminal
$ td
usage: td [options] COMMAND [args]

options:
  -c, --config PATH                path to the configuration file (default: ~/.td/td.conf)
  -k, --apikey KEY                 use this API key instead of reading the config file
  -e, --endpoint API_SERVER        specify the URL for API server to use (default: https://api.treasuredata.com).
                                     The URL must contain a scheme (http:// or https:// prefix) to be valid.
                                     Valid IPv4 addresses are accepted as well in place of the host name.
      --insecure                   Insecure access: disable SSL (enabled by default)
  -v, --verbose                    verbose mode
  -h, --help                       show help
  -r, --retry-post-requests        retry on failed post requests.
                                   Warning: can cause resource duplication, such as duplicated job submissions.
      --version                    show version

Basic commands:

  db             # create/delete/list databases
  table          # create/delete/list/import/export/tail tables
  query          # issue a query
  job            # show/kill/list jobs
  import         # manage bulk import sessions (Java based fast processing)
  bulk_import    # manage bulk import sessions (Old Ruby-based implementation)
  result         # create/delete/list result URLs
  sched          # create/delete/list schedules that run a query periodically
  schema         # create/delete/modify schemas of tables
  connector      # manage connectors
  workflow       # manage workflows

Additional commands:

  status         # show scheds, jobs, tables and results
  apikey         # show/set API key
  server         # show status of the Treasure Data server
  sample         # create a sample log file
  help           # show help messages

The td query command

Terminal
$ td query --help
usage:
  $ td query [sql]

example:
  $ td query -d example_db -w -r rset1 "select count(*) from table1"
  $ td query -d example_db -w -r rset1 -q query.txt

description:
  Issue a query

options:
  -d, --database DB_NAME           use the database (required)
  -w, --wait[=SECONDS]             wait for finishing the job (for seconds)
  -G, --vertical                   use vertical table to show results
  -o, --output PATH                write result to the file
  -f, --format FORMAT              format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz)
  -r, --result RESULT_URL          write result to the URL (see also result:create subcommand)
                                    It is suggested for this option to be used with the -x / --exclude option to suppress printing
                                    of the query result to stdout or -o / --output to dump the query result into a file.
  -u, --user NAME                  set user name for the result URL
  -p, --password                   ask password for the result URL
  -P, --priority PRIORITY          set priority
  -R, --retry COUNT                automatic retrying count
  -q, --query PATH                 use file instead of inline query
  -T, --type TYPE                  set query type (hive, presto)
      --sampling DENOMINATOR       OBSOLETE - enable random sampling to reduce records 1/DENOMINATOR
  -l, --limit ROWS                 limit the number of result rows shown when not outputting to file
  -c, --column-header              output of the columns' header when the schema is available for the table (only applies to json, tsv and csv formats)
  -x, --exclude                    do not automatically retrieve the job result
  -O, --pool-name NAME             specify resource pool by name
      --domain-key DOMAIN_KEY      optional user-provided unique ID. You can include this ID with your `create` request to ensure idempotence
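You can combine the options above to save a query result to a file. This sketch wraps td query with -T presto, -w, -f csv, -c and -o, all of which appear in the help output. The function name td_query_to_csv and the choice of Presto are assumptions made for illustration.

```shell
# Hypothetical helper: run a Presto query, wait for it to finish (-w),
# and write the result to a CSV file (-f csv, -o) with a header row (-c).
td_query_to_csv() {
  # $1: database name, $2: output CSV path, $3: SQL string
  td query -d "$1" -T presto -w -f csv -c -o "$2" "$3"
}

# Usage (needs td and a configured API key):
#   td_query_to_csv hoge huga.csv "SELECT * FROM huga LIMIT 10"
```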

Running a query from the command line

Run a SELECT on the huga table in the hoge database.

Terminal
$ td query -d qa_hdsp -T presto "SELECT * FROM hoge.huga LIMIT 10"

Job 23437**** is queued.
Use 'td job:show 23437****' to show the status.
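Without -w, td query returns as soon as the job is queued. The sketch below pulls the job ID out of the "Job ... is queued." line so you can pass it to td job:show. td_job_id is a hypothetical name, and the sketch assumes the message format shown above. The usage line merges stderr into stdout with 2>&1 because the source does not show which stream that line is written to.

```shell
# Hypothetical helper: read td query output on stdin and print the job ID
# from the "Job <id> is queued." line (prints nothing if there is no match).
td_job_id() {
  sed -n 's/^Job \([0-9][0-9]*\) is queued\.$/\1/p'
}

# Usage (needs td and a configured API key):
#   job_id=$(td query -d qa_hdsp -T presto "SELECT * FROM hoge.huga LIMIT 10" 2>&1 | td_job_id)
#   td job:show "$job_id"
```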

Running a query with an explicitly specified API key

Run a SELECT on the huga table in the hoge database.

Terminal
$ td -k ********************** query -w -T hive -d hoge -q hoge.huga.sql
hoge.huga.sql
SELECT time FROM hoge.huga LIMIT 10;

Note: replace "**********************" with the API key you want to use.
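You can wrap the same command so that the key, database and query file are parameters. td_query_file_with_key and MY_TD_APIKEY are hypothetical names. The options are the ones used above, with the query type given as -T, as listed in td query --help.

```shell
# Hypothetical helper mirroring the command above: run the SQL stored in a
# file (-q) as a Hive query (-T hive), wait for it (-w), and use an API key
# passed with -k instead of the one in ~/.td/td.conf.
td_query_file_with_key() {
  # $1: API key, $2: database name, $3: path to the .sql file
  td -k "$1" query -w -T hive -d "$2" -q "$3"
}

# Usage (MY_TD_APIKEY is a made-up variable holding the key):
#   td_query_file_with_key "$MY_TD_APIKEY" hoge hoge.huga.sql
```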
