python pandas集約とパケット関数
1主な内容 DataFrame.groupby().sum() DataFrame.groupby().agg() pandas.concat([DataFrame1,DataFrame2]) pandas.merge(DataFrame1,DataFrame2,parameters….) DataFrame1.join(DataFrame 2,lsuffix=’列名on DataFrame 1’,rsuffix=’列名on DataFrame 2’) ヘルプ文書の取得 2例構造dataframeは以下の通り:
実装手順は次のとおりです.
2 groupby関数アプリケーションコード
結果
3 groupbyとsumなどの関数を組み合わせてコードを使用
結果
4 agg関数コード
結果
5 concat()コード
結果
6 merge()とjoin()コード
結果
6ヘルプドキュメントの入手方法
7参考文献pythonによるデータ分析ノートpythonデータ分析,Ivan Idris著
food food_id number price user_id weather
0 soup 4 6 1.818250 3 cold
1 soup 8 6 1.834045 4 hot
2 iceream 8 7 3.042422 2 cold
3 chocolate 3 6 5.247564 4 hot
4 iceream 6 3 4.319450 4 cold
5 iceream 5 4 2.912291 1 cold
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
9 chocolate 2 1 3.663618 4 hot
実装手順は次のとおりです.
import pandas as pd
from numpy import random
from numpy.random import rand
import numpy as np
random.seed(42)
df = pd.DataFrame({'user_id':random.randint(0,6,10),'food_id':random.randint(1,10,10),
'weather':['cold','hot','cold','hot','cold','cold','cold','hot','hot','hot'],
'food':['soup','soup','iceream','chocolate','iceream','iceream','iceream','soup','soup','chocolate'],
'price':10 * rand(10),'number':random.randint(1,9,10)})
print df
2 groupby関数アプリケーションコード
groupby1 = df.groupby(['user_id'])
i = 0
for user_id,group in groupby1:
i = i + 1
print "group", i , user_id
print group
結果
group 1 1
food food_id number price user_id weather
5 iceream 5 4 2.912291 1 cold
group 2 2
food food_id number price user_id weather
2 iceream 8 7 3.042422 2 cold
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
group 3 3
food food_id number price user_id weather
0 soup 4 6 1.81825 3 cold
group 4 4
food food_id number price user_id weather
1 soup 8 6 1.834045 4 hot
3 chocolate 3 6 5.247564 4 hot
4 iceream 6 3 4.319450 4 cold
9 chocolate 2 1 3.663618 4 hot
3 groupbyとsumなどの関数を組み合わせてコードを使用
print groupby1.sum()# groupby
print groupby1['food_id','number'].sum()# groupby
print df.groupby(['user_id'],as_index=False).sum()# as_index=True
# sum, mean,min,max,median,mode,std,mad ,
#groupby() help(df.groupby)
# axis=0, , ;axis=1,
結果
output[1]:
food_id number price
user_id
1 5 4 2.912291
2 24 26 13.477336
3 4 6 1.818250
4 19 16 15.064678
output[2]:
food_id number
user_id
1 5 4
2 24 26
3 4 6
4 19 16
output[3]:
user_id food_id number price
0 1 5 4 2.912291
1 2 24 26 13.477336
2 3 4 6 1.818250
3 4 19 16 15.064678
4 agg関数コード
print df.groupby(['weather','food']).agg([np.mean,np.median])
結果
output[4]:
food_id number price \
mean median mean median mean median
weather food
cold iceream 5.250000 5.5 5.25 5.5 4.098173 3.680936
soup 4.000000 4.0 6.00 6.0 1.818250 1.818250
hot chocolate 2.500000 2.5 3.50 3.5 4.455591 4.455591
soup 7.333333 8.0 6.00 6.0 2.050143 1.834045
user_id
mean median
weather food
cold iceream 2.250000 2
soup 3.000000 3
hot chocolate 4.000000 4
soup 2.666667 2
5 concat()コード
print "df :3
",df[:3]
print "df :4
",df[6:]
print pd.concat([df[:3],df[6:]],axis=0)
結果
df :3
food food_id number price user_id weather
0 soup 4 6 1.818250 3 cold
1 soup 8 6 1.834045 4 hot
2 iceream 8 7 3.042422 2 cold
df :4
food food_id number price user_id weather
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
9 chocolate 2 1 3.663618 4 hot
df.concat
food food_id number price user_id weather
0 soup 4 6 1.818250 3 cold
1 soup 8 6 1.834045 4 hot
2 iceream 8 7 3.042422 2 cold
6 iceream 2 7 6.118529 2 cold
7 soup 8 4 1.394939 2 hot
8 soup 6 8 2.921446 2 hot
9 chocolate 2 1 3.663618 4 hot
6 merge()とjoin()コード
df1=pd.DataFrame({'EmpNr':[5,3,9],'Dest':['The Hague','Amsterdam','Rotterdam']})
df2=pd.DataFrame({'EmpNr':[5,9,7],'Amount':[10,5,2.5]})
print "df1
",df1
print "df2
",df2
print "Merge() on Key
",pd.merge(df1,df2,on='EmpNr')
print "inner join with Merge()
",pd.merge(df1,df2,how='inner')
print "Dests join tips
",df1.join(df2,lsuffix='Dest',rsuffix='Tips')
結果
df1
Dest EmpNr
0 The Hague 5
1 Amsterdam 3
2 Rotterdam 9
df2
Amount EmpNr
0 10.0 5
1 5.0 9
2 2.5 7
Merge() on Key
Dest EmpNr Amount
0 The Hague 5 10.0
1 Rotterdam 9 5.0
inner join with Merge()
Dest EmpNr Amount
0 The Hague 5 10.0
1 Rotterdam 9 5.0
Dests join tips
Dest EmpNrDest Amount EmpNrTips
0 The Hague 5 10.0 5
1 Amsterdam 3 5.0 9
2 Rotterdam 9 2.5 7
6ヘルプドキュメントの入手方法
1.help(pd.concat)
2.dir(pd.concat)
3.pd.concat?
...
7参考文献pythonによるデータ分析ノートpythonデータ分析,Ivan Idris著