hiveではunion allとmulti insertを合理的に使う

3337 ワード

もっと読む
同じ表のunion allは多重insertよりもずっと速いです.その原因はhive自身がこのunion allを最適化したことがあります.つまり一回のソース表だけをスキャンします.http://www.apacheserver.net/How-is-Union-All-optimized-in-Hive-at229466.htm 多重insertも一回だけスキャンしますが、insertを複数のパーティションにするために、他のことをたくさんしましたので、消費時間が非常に長いです.みんなが開発する時多く測って、多く試してください.
 
lxw_test 3 12億ぐらいの記録数ユニオンall:7分ぐらいかかります.
 
create table lxw_test5 as 
select type,popt_id,login_date 
from (
        select 'm3_login' as type,popt_id,login_date  
        from lxw_test3 
        where login_date>='2012-02-01' and login_date='2012-05-01' and login_date<='2012-05-09' 
        union all 
        select 'm3_g_login' as type,popt_id,login_date 
        from lxw_test3 
        where login_date>='2012-02-01' and login_date='2012-02-01' and login_date='2012-02-01' and login_date='2012-02-01' and login_date='2012-05-01' and login_date<='2012-05-09' and apptypeid='1' 
        union all 
        select 'mn_l_login' as type,popt_id,login_date 
        from lxw_test3 
        where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='2' 
        union all 
        select 'mn_s_login' as type,popt_id,login_date 
        from lxw_test3 
        where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='3' 
        union all 
        select 'mn_o_login' as type,popt_id,login_date 
        from lxw_test3 
        where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='4' 
) x
 
多重insertは25分ぐらいかかります.
 
from lxw_test3 
insert overwrite table lxw_test6 partition (flag = '1') 
select 'm3_login' as type,popt_id,login_date  
where login_date>='2012-02-01' and login_date='2012-05-01' and login_date<='2012-05-09' 
insert overwrite table lxw_test6 partition (flag = '3') 
select 'm3_g_login' as type,popt_id,login_date 
where login_date>='2012-02-01' and login_date='2012-02-01' and login_date='2012-02-01' and login_date='2012-02-01' and login_date='2012-05-01' and login_date<='2012-05-09' and apptypeid='1' 
insert overwrite table lxw_test6 partition (flag = '8') 
select 'mn_l_login' as type,popt_id,login_date 
where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='2' 
insert overwrite table lxw_test6 partition (flag = '9') 
select 'mn_s_login' as type,popt_id,login_date 
where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='3' 
insert overwrite table lxw_test6 partition (flag = '10') 
select 'mn_o_login' as type,popt_id,login_date 
where login_date>='2012-05-01' and login_date<='2012-05-09' and apptypeid='4'
もっと大きなデータHadoop、Spark、Hiveの記事があります.私のブログに注目してください.