Treasure Dataの「INSERT into 'v' field is not supported」エラー


※PrestoでなくHiveについての記事

テーブル作成時、必ずtimeカラムとvカラムが出来る → ドキュメント

vカラムにinsertしようとすると以下のエラーになる

$ td db:create db1
Database 'db1' is created.
Use 'td table:create db1 <table_name>' to create a table.
$ td table:create db1 table1
Table 'db1.table1' is created.
$ td table:create db1 table2
Table 'db1.table2' is created.
$ td query -w -d db1 "INSERT INTO TABLE table1 SELECT 'val1' AS col1"
~略~
Status      : success
Result      :
0 rows in set
$ td query -w -d db1 "SELECT * FROM table1" # SELECT * でvカラムも返る
~略~
+------+-------------------------------------+------------+
| col1 | v                                   | time       |
+------+-------------------------------------+------------+
| val1 | {"col1":"val1","time":"9999999999"} | 9999999999 |
+------+-------------------------------------+------------+
1 row in set
$ td query -w -d db1 "INSERT INTO TABLE table2 SELECT * FROM table1"
~略~
  17/06/02 01:23:44 ERROR ql.Driver: FAILED: SemanticException java.lang.RuntimeException: INSERT into 'v' field is not supported
  org.apache.hadoop.hive.ql.parse.SemanticException: java.lang.RuntimeException: INSERT into 'v' field is not supported
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:6009)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:5876)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:8296)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8187)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9019)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9285)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:427)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:323)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:980)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1045)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
        at com.treasure_data.hadoop.hive.runner.QueryRunner.processQueryCmd(QueryRunner.java:460)
        at com.treasure_data.hadoop.hive.runner.QueryRunner.processCmd(QueryRunner.java:401)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
        at com.treasure_data.hadoop.hive.runner.QueryRunner.run(QueryRunner.java:326)
        at com.treasure_data.hadoop.hive.runner.QueryRunner$1.run(QueryRunner.java:198)
        at com.treasure_data.hadoop.hive.runner.QueryRunner$1.run(QueryRunner.java:196)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
        at com.treasure_data.hadoop.util.TDUtil.doAs(TDUtil.java:272)
        at com.treasure_data.hadoop.hive.runner.QueryRunner.main(QueryRunner.java:196)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
  Caused by: java.lang.RuntimeException: INSERT into 'v' field is not supported
        at com.treasure_data.hadoop.hive.mapred.TDHiveOutputFormat.modifyTablePropertiesForOutput(TDHiveOutputFormat.java:280)
        at com.treasure_data.hadoop.hive.mapred.TDHiveOutputFormat.modifyTablePropertiesForOutput(TDHiveOutputFormat.java:157)
        at com.treasure_data.hadoop.hive.serde2.MessagePackSerDe.modifyTablePropertiesForOutput(MessagePackSerDe.java:95)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:5989)
        ... 29 more
  FAILED: SemanticException java.lang.RuntimeException: INSERT into 'v' field is not supported
~略~
Status      : error

ドキュメントに以下の記述がある

Caution: If you got “java.lang.RuntimeException: INSERT into ‘v’ field is not supported”, then disable V-column of the table through Web console. Or, avoid using ‘*’ in the query.

また、旧バージョンのWebコンソールで当該テーブルの詳細画面でSettingsタブを見ると「Include "v" in schema?」チェックボックスがある

当該チェックボックスをOFFにすると「vカラムをなくす」ことができるように思えるがそうではない

新バージョンのWebコンソールで当該テーブルの詳細画面に入り、右上の「...」から「Table Settings...」を選ぶと、「Edit Settings」ダイアログが表示され、そこに「Include v column in 'SELECT *' Hive queries?」とのチェックボックスがある
つまり、「vカラムをなくす」わけでなく「SELECT *でvを返さない」という設定が出来る

INSERT元のテーブル(table1)について当該チェックボックスをOFFにするとエラーは解消する

$ td query -w -d db1 "SELECT * FROM table1" # SELECT * でvカラムは返らなくなる
~略~
+------+------------+
| col1 | time       |
+------+------------+
| val1 | 9999999999 |
+------+------------+
1 row in set
$ td query -w -d db1 "SELECT v['col1'] FROM table1" # vカラムはなくなっていない
~略~
+------+
| col1 |
+------+
| val1 |
+------+
1 row in set
$ td query -w -d db1 "INSERT INTO TABLE table2 SELECT * FROM table1"
~略~
Status      : success
Result      :
0 rows in set
$ td query -w -d db1 "SELECT * FROM table2" # INSERT先のvカラムは有効
+------+-------------------------------------+------------+
| col1 | v                                   | time       |
+------+-------------------------------------+------------+
| val1 | {"col1":"val1","time":"9999999999"} | 9999999999 |
+------+-------------------------------------+------------+
1 row in set

当該チェックボックスはWebコンソールからテーブルを作成した際はOFFになっている(新バージョンのWebコンソールでしか試してないけど)
Webコンソール以外(Embulk/API/td table:create/Digdagのtd>:/Digdagのtd_ddl>:等)で作成した場合はONになっていそう(全部は調べてないけど)

ちなみに、INSERT元(table1)は当該チェックボックスをONにしてINSERT先(table2)は当該チェックボックスをOFFにすると以下の動きになる
(これを見ると「SELECT *でvを返さない」というだけでもないような)

$ td query -w -d db1 "INSERT INTO TABLE table2 SELECT * FROM table1"
~略~
  It failed to commit. It's retrying 1/8 time.
  It failed to commit. It's retrying 2/8 time.
  It failed to commit. It's retrying 3/8 time.
  It failed to commit. It's retrying 4/8 time.
  It failed to commit. It's retrying 5/8 time.
  It failed to commit. It's retrying 6/8 time.
  It failed to commit. It's retrying 7/8 time.
  It failed to commit. It's retrying 8/8 time.

  INSERT INTO failed:
  TD API returned HTTP code 422
  {"error":"[\"Schema cannot manually set reserved column name \\\"v\\\"\"]","text":"[\"Schema cannot manually set reserved column name \\\"v\\\"\"]","severity":"error"}
~略~
  Job failed with exit code: 12 so won't retry
Status      : error