sql:hive:mysql:group byとdistinct脱重

4502 ワード

mysql:
ソースデータ(mysql)

id    userid    subject    score
1	001	  	90.0
2	001	  	92.0
3	001	  	80.0
4	002	  	88.0
5	002	  	90.0
6	002	  	75.5
7	003	  	70.0
8	003	  	85.0
9	003	  	90.0
10	003	  	82.0
11	001	  	91.0
12	001	  	92.0

注意:mysqlは後ろのgroupbyに2つのフィールドがありますが、前のselectの後に複数のフィールドが付いていて、デフォルトでは最初のレコードが表示されます.
しかしhiveの後ろのgroupbyのいくつかのフィールドは、前にいくつかのフィールドしか検出できず、固定されています.
group by:

 select userid,subject,score from tb_score
 group by userid,subject

001	  	92.0
001	  	80.0
001	  	90.0
002	  	90.0
002	  	75.5
002	  	88.0
003	  	82.0
003	  	85.0
003	  	90.0
003	  	70.0

distinct:

select distinct userid,subject from tb_score

groupbyとは異なりmysqlではgroupbyの2つのフィールドがありますが、3つのフィールドを調べることができますが、distinctの2つのフィールドは、この2つのフィールドしかクエリーできません.
ではdistinctにかっこをつけますか?

select distinct(userid) u,subject from tb_score

u    subject
001	  
001	  
001	  
002	  
002	  
002	  
003	  
003	  
003	  
003

 select distinct(userid) u,subject,score from tb_score

u    subject    score
001	  	90.0
001	  	92.0
001	  	80.0
002	  	88.0
002	  	90.0
002	  	75.5
003	  	70.0
003	  	85.0
003	  	90.0
003	  	82.0
001	  	91.0
001	  	92.0

select distinct(userid,subject),score from tb_score 
select distinct(userid,subject,score) from tb_score
       ：
 [Error Code: 1241, SQL State: 21000]  Operand should contain 1 column(s)

集約テスト:

 select distinct userid,subject ,sum(score)  from tb_score

  ：
001	  	1025.5

 select  userid,subject ,sum(score)  from tb_score group by userid,subject

  ：
001	  	92.0
001	  	80.0
001	  	273.0
002	  	90.0
002	  	75.5
002	  	88.0
003	  	82.0
003	  	85.0
003	  	90.0
003	  	70.0

hive:
ソースデータ(hive)

C1     C2      C3      C4
A	B	D	D
C	D	A	B
C	D	A	B
A	B	C	C
B	B	C	C
B	B	D	D

group by:

 select C1,C2 from tmp.distinctt group by C1,C2

C1    C2
B	B
A	B
C	D

distinct:

 select distinct c1,c2,c3,c4 from tmp.distinctt

C1     C2      C3      C4
B	B	D	D
B	B	C	C
A	B	D	D
A	B	C	C
C	D	A	B

select distinct c1,c2 from tmp.distinctt
   ：
select distinct(c1),c2 from tmp.distinctt

C1    C2
B	B
A	B
C	D

ではdistinctに括弧をつけますか?

select distinct(c1,c2) from tmp.distinctt
select distinct(c1,c2),c3 from tmp.distinctt
         ：

SQL    [40000] [42000]: Error while compiling statement: FAILED: ParseException line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
  Error while compiling statement: FAILED: ParseException line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
  Error while compiling statement: FAILED: ParseException line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:19 missing ) at ',' near 'c2'
line 1:22 missing EOF at ')' near 'c2'

集約テスト:

 select distinct c1,c2,sum(case(c4 as int)) from tmp.distinctt

エラー:

SQL    [40000] [42000]: Error while compiling statement: FAILED: ParseException line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
  Error while compiling statement: FAILED: ParseException line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
  Error while compiling statement: FAILED: ParseException line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification
    org.apache.hadoop.hive.ql.parse.ParseException:line 1:35 mismatched input 'as' expecting ) near 'c4' in expression specification

反応ネイティブ

あなたがすぐに採用することができる単純であるが、強力なコーディング慣行:クラスと機能