hive hql一般クエリー練習

15124 ワード

一般クエリー
1、
hiveのhqlを使用してユーザーの部署をクエリーする

dpt 
dpt_id dpt_name
1   
2   

user_dpt 
user_id dpt_id
1 1
2 1
3 2
4 2
5 3



1	1	  
2	1	  
3	2	  
4	2	  
5	3

SELECT t1.user_id 
,t1.dpt_id 
,nvl(t2.dpt_name,"    ")
FROM dpt 
LEFT JOIN user_dpt 
ON dpt.dpt_id = user_dpt.user_id
;

2、
学期ごとのカリキュラムの最高得点記録を調べる

course_score :
id,name,course,score,term
1,zhangsan,  ,80,2015
2,lisi,  ,90,2016
3,lisi,  ,70,2016
4,wangwu,  ,80,2017
5,zhangsan,  ,85,2015
6,zhangsan,  ,80,2015

  sql      ，       ，          : 
1、              (    5   ) 
2、           90             (    5   )

--(1)
SELECT t.* 
FROM course_score t, 
( 
SELECT course 
,term 
,MAX(score) score 
FROM course_score 
GROUP BY course 
,term 
) tmp 
WHERE tmp.course = t.course 
AND tmp.term = t.term 
AND tmp.score = t.score 
;
--(2)
SELECT t.* 
FROM course_score t, 
( 
SELECT id 
FROM course_score 
WHERE course = "  " 
AND score > 90 
) tmp 
WHERE t.id = tmp.id 
AND t.cource = "  " 
;

3、
学生の基本情報、カリキュラム情報、学生のカリキュラムと成績を格納するためのデータベース表を設計し、sql文を与え、平均成績が85より大きいすべての学生を検索します.

stu_1
id,name,age,addr
1,zs1,22,bj
2,zs2,22,bj
3,zs3,22,bj
4,zs4,22,bj
5,zs5,22,bj

course_1
cid,cname
1,  
2,  
3,  
4,  
5,  

course_sc
id,cid,score
1,1,87
1,2,92
1,3,69
2,2,83
2,3,92
2,4,87
2,5,83

SELECT * 
FROM stu_1, 
( 
SELECT id 
,AVG(score) a 
FROM course_sc 
GROUP BY id 
HAVING AVG(score) > 85 
) tmp 
WHERE stu_1.id = tmp.id 
;

4、
各チャネルの注文ユーザー数、注文総額

oid,uid,amount,channel,otime
1,100,19,a,2019-08-06 19:00:00
2,101,19,b,2019-08-06 19:00:01
3,100,19,a,2019-08-05 19:00:00
4,101,19,b,2019-08-05 19:00:01
5,102,19,a,2019-08-06 19:00:00
6,102,19,a,2019-08-06 19:00:01

SELECT channel 
,COUNT(distinct user_id) 
,SUM(amount) 
FROM order 
GROUP BY channel 
ON dt = "2019-08-06"
;

5、
登録して読むユーザーの数、すでに読んでいる書籍の数、その他

 A(   ):
ds user_id
2019-08-06 1
2019-08-06 2
2019-08-06 3
2019-08-06 4

 B(   ):
ds user_id read_num
2019-08-06 1 2
2019-08-06 2 3
2019-08-06 3 6

 C(   ):
ds user_id price
2019-08-06 1 55.6
2019-08-06 2 55.8

       ，   hive hql        :
(1)、                ，        
(2)、        ，          
(3)、        ，

--(1)
SELECT COUNT(DISTINCT b.user_id) 
,SUM(b.read_num) 
FROM tableb b 
LEFT JOIN tablea a 
ON a.ds = b.ds AND a.user_id = b.user_id 
GROUP BY ds
;
--(2)
SELECT c.ds 
,COUNT(DISTINCT c.user_id) 
FROM tablec c 
LEFT JOIN tableb b 
ON b.user_id = c.user_id AND b.ds = c.ds 
LEFT JOIN tablea a 
ON a.user_id = c.user_id AND a.ds = c.ds 
WHERE c.user_id is null 
GROUP BY c.ds
;
--(3)
SELECT c.user_id 
,c.price 
FROM tablec c join 
ON tablea a 
ON a.ds = c.ds AND a.user_id = c.user_id
;

6、
高消費者レポート

district :
disid   disname 
1     
2     

city : 
cityid  disid 
1   1 
2   1
3   2
4   2
5   2

order ： 
oid userid  cityid  amount 
1   1   1   1223.9 
2   1   1   9999.9
3   2   2   2322
4   2   2   8909
5   2   3   6789
6   2   3   798
7   3   4   56786
8   4   5   78890

           1W   ，  hive hql      :

SELECT b.disid 
,c.disname 
,a.userid 
,SUM(a.amount) 
FROM order_29 a 
JOIN city b 
ON a.cityid=b.cityid 
JOIN district c 
ON b.disid = c.disid 
GROUP BY b.disid 
,c.disname 
,a.userid
;

7、
hiveのhqlを使って商品3を買ったユーザーと昨日の消費を実現する

  
orderid userid  productid   price   timestamp   date
123,00101,3,200,1535945356,2019-08-28 
124,00100,1,200,1535945356,2019-08-28 
125,00101,3,200,1535945356,2019-08-29
126,00101,2,200,1535945356,2019-08-29
127,00102,5,200,1535945356,2019-08-29 
128,00103,3,200,1535945356,2019-08-29 
129,00103,3,200,1535945356,2019-08-29

SELECT
userid,
cur+lastday
from
(SELECT userid 
,sum(case when dt=current_date then price else 0 end) cur,
sum(case when dt=date_sub(current_date,1) then price else 0 end) lastday
FROM 
( 
SELECT userid 
,price 
,dt 
FROM buy 
WHERE productid=3 
AND (dt = current_date or dt = date_sub(current_date,1)) 
) tmp
group by userid) tt
where cur != 0 and lastday != 0
;

Babelでカスタムpresetを作る

FlowType 最小構成のビルド環境 Babel WebPack