Hive Function Notes: Common Functions Explained

Preface
This post is updated from time to time and records the Hive functions I have come across and used at work.
Common functions

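get_json_object(string json_string, string path): the first parameter is a string holding a JSON object. The second parameter is a path expression in which $ stands for the root of the JSON object, followed by . to read a field of an object or [] to read an element of an array.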

select get_json_object(pricecount,'$.buyoutRoomRequest') new_id,pricecount
  from table_sample a
 where d='2018-08-31' limit 100
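
As a small sketch of the path syntax described above, the query below reads one top-level key and one field nested inside an array element. The JSON literal and the column name js are made up for illustration and do not come from table_sample.

```sql
-- Hypothetical JSON document, used only to illustrate the path syntax
with t as (
select '{"owner":"amy","store":{"fruit":[{"type":"apple"},{"type":"pear"}]}}' js
)
select get_json_object(t.js,'$.owner') owner,                     -- "." reads a field of the root object
       get_json_object(t.js,'$.store.fruit[1].type') second_type  -- "[1]" reads the second array element
  from t
```

If the path does not exist in the JSON string, get_json_object returns NULL rather than raising an error.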

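json_tuple(string json_string, string k1, string k2 ...): the first parameter is a string holding a JSON object, and the remaining parameters are a variable-length list of keys k1, k2, .... The return value is a tuple. It is more efficient than get_json_object because a single call can extract the values of several keys.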

select m.*,n.pricecount
  from (select *
              from table_sample a 
            where d='2018-08-31' limit 100)n
  lateral view json_tuple(pricecount,'paymentType','complete') m as f1,f2

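split(str, regex): splits str around occurrences that match regex. The first parameter is a string and the second is the delimiter, which is interpreted as a regular expression. The function splits the first parameter on the second and returns an array.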

select split('123,3455,2568',',');
select split('sfas:sdfs:sf',':');

explode(): explode takes an array (or a map) as input and outputs the elements of the array (map) as separate rows. In other words, it accepts a parameter of type array or map and turns each element into its own row.

select explode(split('123,3455,2568',','))

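lateral view: the syntax is LATERAL VIEW udtf(expression) tableAlias AS columnAlias (',' columnAlias)*. Lateral view is generally used together with table-generating functions (UDTFs) such as explode(). A UDTF produces zero or more output rows for each input row. Lateral view first applies the UDTF to each row of the base table, then joins the resulting output rows to the input row, forming a virtual table with the supplied table alias.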

select j.nf,p.* from (
select m.*,n.pricecount
  from (select * from ods_htl_htlinfogoverndb.buyout_appraise a where d = '${zdt.format("yyyy-MM-dd")}' limit 100)n
 lateral view json_tuple(pricecount,'paymentType','complete') m as f1,f2 )p
 lateral view explode(split(regexp_replace(p.f1,'\\[|\\]',''),',')) j as nf

from_unixtime(int/bigint timestamp,string format)

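The first parameter is a 10-digit Unix timestamp (in seconds) of type int/bigint. A 13-digit timestamp that includes milliseconds must be truncated to seconds first. The second parameter is the format of the returned date and is optional. The default format is yyyy-MM-dd HH:mm:ss.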

select from_unixtime(1000000000);
select from_unixtime(1000000000,'yyyy-MM-dd HH');
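
The note above says that 13-digit millisecond timestamps have to be cut down to seconds. A minimal sketch of two ways to do that is shown below. The column name ms_ts and its value are made up for illustration, and the rendered date depends on the session time zone.

```sql
-- ms_ts is a hypothetical column holding a 13-digit millisecond timestamp
with t as (
select 1536114276000 ms_ts
)
select from_unixtime(cast(t.ms_ts / 1000 as bigint)) by_division,                        -- drop the milliseconds arithmetically
       from_unixtime(cast(substr(cast(t.ms_ts as string),1,10) as bigint)) by_substring  -- keep only the first 10 digits
  from t
```

The division form is usually preferable when the column is already numeric; the substring form is handy when the timestamp arrives as a string.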

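unix_timestamp(string date, string format): this function has two parameters, both optional. The forms differ as follows.
unix_timestamp(): with no parameters, returns the current Unix timestamp in seconds. current_timestamp() serves a similar purpose, but it returns a TIMESTAMP value rather than epoch seconds.
unix_timestamp(string date): with only the first parameter, returns the timestamp corresponding to date. date must be in the format yyyy-MM-dd HH:mm:ss.
unix_timestamp(string date, string format): returns the timestamp corresponding to date, where the format of date is given by format.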

select unix_timestamp();
select unix_timestamp('2018-09-05 10:24:36');
select unix_timestamp('2018-09-05 10','yyyy-MM-dd HH');

str_to_map(String text, String delimiter1, String delimiter2): splits text into key-value pairs using two delimiters. delimiter1 separates text into k-v pairs, and delimiter2 splits each k-v pair into key and value. The default for delimiter1 is ',' and the default for delimiter2 is ':'.

select str_to_map('abc:11&bcd:22', '&', ':')

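collect_set(): this function accepts only primitive data types. Its main purpose is to aggregate the values of a field with duplicates removed, and the return value is a field of type array.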

with t as (
select 1 id,123 value
  union all
select 1 id,234 value
  union all
select 2 id,124 value
)
select t.id,collect_set(t.value)
  from t
 group by t.id

collect_list(): this function does the same job as collect_set. The only difference is that collect_set removes duplicate elements while collect_list keeps them. An example SQL statement follows.

with t as (
select 1 id,123 value
  union all
select 1 id,234 value
  union all
select 2 id,124 value
  union all
select 2 id,124 value
)
select t.id,collect_set(t.value),collect_list(t.value)
  from t
 group by t.id

concat_ws(separator, String s1, String s2 ...): this function joins strings using the separator. It is usually used together with group by and collect_set.
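
As a sketch of the group by plus collect_set combination mentioned above, the query below reuses the sample rows from the collect_set example. Since concat_ws works on strings (or an array of strings), the integer value is cast to string before it is collected; the alias value_list is only illustrative.

```sql
with t as (
select 1 id,123 value
  union all
select 1 id,234 value
  union all
select 2 id,124 value
)
-- one row per id, with its distinct values joined by commas
select t.id,concat_ws(',',collect_set(cast(t.value as string))) value_list
  from t
 group by t.id
```

The order of elements inside the joined string is not guaranteed, because collect_set does not define an ordering.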

array_contains(Array, value): this function checks whether Array contains the element value. The return value is a boolean.

select array_contains(array(1,2,3,4,5),3)
true

percentile(expr, pc): this function computes a percentile of expr. expr: the field must be of an integer type, otherwise an error is raised. pc: the percentile to compute, given as a number between 0 and 1.
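
A minimal sketch of percentile on integer data follows. The inline rows and the aliases are made up for illustration. The second call passes the percentiles as an array, which percentile also accepts.

```sql
-- hypothetical integer sample: 1..5
with t as (
select 1 value
  union all
select 2 value
  union all
select 3 value
  union all
select 4 value
  union all
select 5 value
)
select percentile(t.value,0.5) median,                 -- a single percentile
       percentile(t.value,array(0.25,0.75)) quartiles  -- several percentiles in one call
  from t
```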

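percentile_approx(expr, pc, [nb]): this function also computes a percentile of expr, but its data type requirement is looser than that of percentile, and any numeric type is accepted. pc: the percentile to compute. It can be given as an array, so several percentiles can be returned in one call. [nb]: optional. Controls the accuracy of the approximation at the cost of memory.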
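
The sketch below shows percentile_approx on non-integer data, with and without the optional accuracy parameter. The inline rows, the aliases, and the value 1000 for nb are assumptions chosen only for illustration.

```sql
-- hypothetical double-typed sample
with t as (
select cast(1.5 as double) value
  union all
select cast(2.5 as double) value
  union all
select cast(10.0 as double) value
)
select percentile_approx(t.value,0.5) approx_median,                          -- a single percentile, default accuracy
       percentile_approx(t.value,array(0.25,0.5,0.75),1000) approx_quartiles  -- several percentiles, explicit nb
  from t
```

A larger nb gives a closer approximation but uses more memory, so it is worth tuning only when the default result is not precise enough.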

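regexp_replace: this function is used for string replacement. The following example uses it to strip certain special characters (newline, tab, and carriage return).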

select regexp_replace('string','\n|\t|\r','')
