sql:hive:mysql:group byとdistinct脱重

4502 ワード

mysql:
ソースデータ(mysql)
id    userid    subject    score
1	001	  	90.0
2	001	  	92.0
3	001	  	80.0
4	002	  	88.0
5	002	  	90.0
6	002	  	75.5
7	003	  	70.0
8	003	  	85.0
9	003	  	90.0
10	003	  	82.0
11	001	  	91.0
12	001	  	92.0

注意:mysqlは後ろのgroupbyに2つのフィールドがありますが、前のselectの後に複数のフィールドが付いていて、デフォルトでは最初のレコードが表示されます.
しかしhiveの後ろのgroupbyのいくつかのフィールドは、前にいくつかのフィールドしか検出できず、固定されています.
group by:
 select userid,subject,score from tb_score
 group by userid,subject
001	  	92.0
001	  	80.0
001	  	90.0
002	  	90.0
002	  	75.5
002	  	88.0
003	  	82.0
003	  	85.0
003	  	90.0
003	  	70.0

distinct:
select distinct userid,subject from tb_score
001	  
001	  
001	  
002	  
002	  
002	  
003	  
003	  
003	  
003	  

groupbyとは異なりmysqlではgroupbyの2つのフィールドがありますが、3つのフィールドを調べることができますが、distinctの2つのフィールドは、この2つのフィールドしかクエリーできません.
ではdistinctにかっこをつけますか?
select distinct(userid) u,subject from tb_score
u    subject
001	  
001	  
001	  
002	  
002	  
002	  
003	  
003	  
003	  
003	  
 select distinct(userid) u,subject,score from tb_score
u    subject    score
001	  	90.0
001	  	92.0
001	  	80.0
002	  	88.0
002	  	90.0
002	  	75.5
003	  	70.0
003	  	85.0
003	  	90.0
003	  	82.0
001	  	91.0
001	  	92.0
select distinct(userid,subject),score from tb_score 
select distinct(userid,subject,score) from tb_score
       :
 [Error Code: 1241, SQL State: 21000]  Operand should contain 1 column(s)

集約テスト:
 select distinct userid,subject ,sum(score)  from tb_score

  :
001	  	1025.5
 select  userid,subject ,sum(score)  from tb_score group by userid,subject

  :
001	  	92.0
001	  	80.0
001	  	273.0
002	  	90.0
002	  	75.5
002	  	88.0
003	  	82.0
003	  	85.0
003	  	90.0
003	  	70.0

 
hive:
ソースデータ(hive)
C1     C2      C3      C4
A	B	D	D
C	D	A	B
C	D	A	B
A	B	C	C
B	B	C	C
B	B	D	D

group by:
 select C1,C2 from tmp.distinctt group by C1,C2 
C1    C2
B	B
A	B
C	D

distinct:
 select distinct c1,c2,c3,c4 from tmp.distinctt
C1     C2      C3      C4
B	B	D	D
B	B	C	C
A	B	D	D
A	B	C	C
C	D	A	B
select distinct c1,c2 from tmp.distinctt
   :
select distinct(c1),c2 from tmp.distinctt
C1    C2
B	B
A	B
C	D

ではdistinctに括弧をつけますか?
select distinct(c1,c2) from tmp.distinctt
select distinct(c1,c2),c3 from tmp.distinctt
         :

SQL    [40000] [42000]: Error while compiling statement: FAILED: ParseException line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
  Error while compiling statement: FAILED: ParseException line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
  Error while compiling statement: FAILED: ParseException line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'

集約テスト:
 select distinct c1,c2,sum(case(c4 as int)) from tmp.distinctt

エラー:
SQL    [40000] [42000]: Error while compiling statement: FAILED: ParseException line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
  Error while compiling statement: FAILED: ParseException line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
  Error while compiling statement: FAILED: ParseException line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification