Hiveクエリでハマりやすいエラーの処方箋


HIVEクエリを書いていてハマったエラーと、その対処法を記載していきます。

WINDOW関数で集計範囲が異なる時のエラー

ROWS BETWEENかの指定が異なる物が混じってるときに発生するエラーです。
他と記述を合わせることで、エラーは解消しました。

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. 
Underlying error: Expecting right window frame boundary for function lag((TOK_TABLE_OR_COL weight), 12) 

エラーが発生するクエリの例

SELECT
  last_value(weight) over(PARTITION BY pref_name,city_name ORDER BY year_month ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_weight,
  lag(weight, 12) over(PARTITION BY pref_name,city_name ORDER BY year_month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM
  ...

WINDOW関数のLAG/LEAD関数のハマリどころ

1つ前の値を次のように取ろうとすると、Prestoでは動きますがHiveではエラーとなります。

SELECT
  --1つ前の値を取得するLAG関数
  LAG(weight) over(partition by pref_name, city_name order by year_month ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM
 ...

解決法は簡単です。次のようなエラーが起きたら、ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWINGと指定しましょう。

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Expecting left window frame boundary for function LAG((TOK_TABLE_OR_COL weight)) org.apache.hadoop.hive.ql.parse.WindowingSpec$WindowSpec@75a39d3e as _wcol0 to be unbounded. Found : 1

ここでもし、1 PRECEDINGUNBOUNDED PRECEDING に書き換えても、エラー出力にrightの差はあるが同様にエラーとなります。

SELECT
  --1つ前の値を取得するLAG関数
  LAG(weight) over(partition by pref_name, city_name order by year_month ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING) 
FROM
 ...
FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: Expecting right window frame boundary for function LAG((TOK_TABLE_OR_COL weight))