hive:カスケード・レポート・クエリー
4970 ワード
カスケード・レポート・クエリー
次のデータがある場合(ファイル名:accumulate.dat):A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
C,2015-01,10
C,2015-01,20
A,2015-02,4
A,2015-02,6
C,2015-02,30
C,2015-02,10
B,2015-02,10
B,2015-02,5
A,2015-03,14
A,2015-03,6
B,2015-03,20
B,2015-03,25
C,2015-03,10
C,2015-03,20
需要:各ユーザーの月の合計、当月に累計された合計を統計します。
------集計結果は次のように集計されます.
ユーザー
月
月合計
当月累計総額
A
2015-01
33
33
A
2015-02
10
43
A
2015-03
30
73
B
2015-01
30
30
B
2015-02
15
45
...
...
...
...
解決方法1:複雑なsql--レポートの累計
構築テーブルのマッピング:create table t_access_times(username string,month string,counts int)
row format delimited fields terminated by ',';
データのインポート(accumulate.dat):
load data local inpath '/root/accumulate.dat' into table t_access_times;
1、第一歩、まずユーザーの月額総額を求める
select username,month,sum(salary) as salary from t_access_times group by username,month
出力結果+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+
2、第二歩、月額総額表を自分で接続する
select A.*,B.* FROM
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
出力結果+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
ステップ3:最終結果を得る
select auname,amonth,acnts,sum(bcnts)
from t_tmp2
group by auname,amonth,acnts;
もちろん、論理プロセス全体をSQL文に書くこともできます.select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;
解決2:ウィンドウ分析関数の使用--レポートの累計
--ウィンドウ解析関数sum()over():ウィンドウ内での逐行加算が可能
0: jdbc:hive2://localhost:10000> select * from t_access_amount; +----------------------+------------------------+-------------------------+--+
| t_access_amount.uid | t_access_amount.month | t_access_amount.amount |
+----------------------+------------------------+-------------------------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| A | 2015-03 | 20 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
| B | 2015-03 | 45 |
| C | 2015-01 | 30 |
| C | 2015-02 | 40 |
| C | 2015-03 | 30 |
+----------------------+------------------------+-------------------------+--+
各人の毎月までの総額を求めるselect uid,month,amount,
sum(amount) over(partition by uid order by month rows between unbounded preceding and current row) as accumulate
from t_access_amount;
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
C,2015-01,10
C,2015-01,20
A,2015-02,4
A,2015-02,6
C,2015-02,30
C,2015-02,10
B,2015-02,10
B,2015-02,5
A,2015-03,14
A,2015-03,6
B,2015-03,20
B,2015-03,25
C,2015-03,10
C,2015-03,20
構築テーブルのマッピング:
create table t_access_times(username string,month string,counts int)
row format delimited fields terminated by ',';
データのインポート(accumulate.dat):
load data local inpath '/root/accumulate.dat' into table t_access_times;
1、第一歩、まずユーザーの月額総額を求める
select username,month,sum(salary) as salary from t_access_times group by username,month
出力結果
+-----------+----------+---------+--+
| username | month | salary |
+-----------+----------+---------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
+-----------+----------+---------+--+
2、第二歩、月額総額表を自分で接続する
select A.*,B.* FROM
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
出力結果
+-------------+----------+-----------+-------------+----------+-----------+--+
| a.username | a.month | a.salary | b.username | b.month | b.salary |
+-------------+----------+-----------+-------------+----------+-----------+--+
| A | 2015-01 | 33 | A | 2015-01 | 33 |
| A | 2015-01 | 33 | A | 2015-02 | 10 |
| A | 2015-02 | 10 | A | 2015-01 | 33 |
| A | 2015-02 | 10 | A | 2015-02 | 10 |
| B | 2015-01 | 30 | B | 2015-01 | 30 |
| B | 2015-01 | 30 | B | 2015-02 | 15 |
| B | 2015-02 | 15 | B | 2015-01 | 30 |
| B | 2015-02 | 15 | B | 2015-02 | 15 |
+-------------+----------+-----------+-------------+----------+-----------+--+
ステップ3:最終結果を得る
select auname,amonth,acnts,sum(bcnts)
from t_tmp2
group by auname,amonth,acnts;
もちろん、論理プロセス全体をSQL文に書くこともできます.
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate
from
(select username,month,sum(salary) as salary from t_access_times group by username,month) A
inner join
(select username,month,sum(salary) as salary from t_access_times group by username,month) B
on
A.username=B.username
where B.month <= A.month
group by A.username,A.month
order by A.username,A.month;
解決2:ウィンドウ分析関数の使用--レポートの累計
--ウィンドウ解析関数sum()over():ウィンドウ内での逐行加算が可能
0: jdbc:hive2://localhost:10000> select * from t_access_amount; +----------------------+------------------------+-------------------------+--+
| t_access_amount.uid | t_access_amount.month | t_access_amount.amount |
+----------------------+------------------------+-------------------------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| A | 2015-03 | 20 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
| B | 2015-03 | 45 |
| C | 2015-01 | 30 |
| C | 2015-02 | 40 |
| C | 2015-03 | 30 |
+----------------------+------------------------+-------------------------+--+
各人の毎月までの総額を求めるselect uid,month,amount,
sum(amount) over(partition by uid order by month rows between unbounded preceding and current row) as accumulate
from t_access_amount;
+----------------------+------------------------+-------------------------+--+
| t_access_amount.uid | t_access_amount.month | t_access_amount.amount |
+----------------------+------------------------+-------------------------+--+
| A | 2015-01 | 33 |
| A | 2015-02 | 10 |
| A | 2015-03 | 20 |
| B | 2015-01 | 30 |
| B | 2015-02 | 15 |
| B | 2015-03 | 45 |
| C | 2015-01 | 30 |
| C | 2015-02 | 40 |
| C | 2015-03 | 30 |
+----------------------+------------------------+-------------------------+--+
select uid,month,amount,
sum(amount) over(partition by uid order by month rows between unbounded preceding and current row) as accumulate
from t_access_amount;