python pandas集約とパケット関数


1主な内容
  • DataFrame.groupby().sum()
  • DataFrame.groupby().agg()
  • pandas.concat([DataFrame1,DataFrame2])
  • pandas.merge(DataFrame1,DataFrame2,parameters….)
  • DataFrame1.join(DataFrame 2,lsuffix=’列名on DataFrame 1’,rsuffix=’列名on DataFrame 2’)
  • ヘルプ文書の取得
  • 2例
  • 構造dataframeは以下の通り:
  •   food  food_id  number     price  user_id weather
    0       soup        4       6  1.818250        3    cold
    1       soup        8       6  1.834045        4     hot
    2    iceream        8       7  3.042422        2    cold
    3  chocolate        3       6  5.247564        4     hot
    4    iceream        6       3  4.319450        4    cold
    5    iceream        5       4  2.912291        1    cold
    6    iceream        2       7  6.118529        2    cold
    7       soup        8       4  1.394939        2     hot
    8       soup        6       8  2.921446        2     hot
    9  chocolate        2       1  3.663618        4     hot

    実装手順は次のとおりです.
    import pandas as pd
    from numpy import random
    from numpy.random import rand
    import numpy as np
    
    random.seed(42)
    
    df = pd.DataFrame({'user_id':random.randint(0,6,10),'food_id':random.randint(1,10,10),
    'weather':['cold','hot','cold','hot','cold','cold','cold','hot','hot','hot'],
    'food':['soup','soup','iceream','chocolate','iceream','iceream','iceream','soup','soup','chocolate'],
    'price':10 * rand(10),'number':random.randint(1,9,10)}) 
    
    print df

    2 groupby関数アプリケーションコード
    groupby1 = df.groupby(['user_id'])
    i = 0
    for user_id,group in groupby1:
        i = i + 1
        print "group", i , user_id
        print group

    結果
    group 1 1
          food  food_id  number     price  user_id weather
    5  iceream        5       4  2.912291        1    cold
    group 2 2
          food  food_id  number     price  user_id weather
    2  iceream        8       7  3.042422        2    cold
    6  iceream        2       7  6.118529        2    cold
    7     soup        8       4  1.394939        2     hot
    8     soup        6       8  2.921446        2     hot
    group 3 3
       food  food_id  number    price  user_id weather
    0  soup        4       6  1.81825        3    cold
    group 4 4
            food  food_id  number     price  user_id weather
    1       soup        8       6  1.834045        4     hot
    3  chocolate        3       6  5.247564        4     hot
    4    iceream        6       3  4.319450        4    cold
    9  chocolate        2       1  3.663618        4     hot

    3 groupbyとsumなどの関数を組み合わせてコードを使用
    print groupby1.sum()#   groupby               
    print groupby1['food_id','number'].sum()#   groupby               
    print df.groupby(['user_id'],as_index=False).sum()#  as_index=True
    #    sum,  mean,min,max,median,mode,std,mad  ,      
    #groupby()      help(df.groupby)   
    #     axis=0,        ,            ;axis=1,        

    結果
    output[1]:
      food_id  number      price
    user_id                            
    1              5       4   2.912291
    2             24      26  13.477336
    3              4       6   1.818250
    4             19      16  15.064678
    output[2]:
             food_id  number
    user_id                 
    1              5       4
    2             24      26
    3              4       6
    4             19      16
    output[3]:
       user_id  food_id  number      price
    0        1        5       4   2.912291
    1        2       24      26  13.477336
    2        3        4       6   1.818250
    3        4       19      16  15.064678

    4 agg関数コード
    print df.groupby(['weather','food']).agg([np.mean,np.median])

    結果
     output[4]: 
                       food_id        number            price            \
                           mean median   mean median      mean    median   
    weather food                                                           
    cold    iceream    5.250000    5.5   5.25    5.5  4.098173  3.680936   
            soup       4.000000    4.0   6.00    6.0  1.818250  1.818250   
    hot     chocolate  2.500000    2.5   3.50    3.5  4.455591  4.455591   
            soup       7.333333    8.0   6.00    6.0  2.050143  1.834045   
    
                        user_id         
                           mean median  
    weather food                        
    cold    iceream    2.250000      2  
            soup       3.000000      3  
    hot     chocolate  4.000000      4  
            soup       2.666667      2  

    5 concat()コード
    print "df :3
    "
    ,df[:3] print "df :4
    "
    ,df[6:] print pd.concat([df[:3],df[6:]],axis=0)

    結果
    df :3
          food  food_id  number     price  user_id weather
    0     soup        4       6  1.818250        3    cold
    1     soup        8       6  1.834045        4     hot
    2  iceream        8       7  3.042422        2    cold
    df :4
            food  food_id  number     price  user_id weather
    6    iceream        2       7  6.118529        2    cold
    7       soup        8       4  1.394939        2     hot
    8       soup        6       8  2.921446        2     hot
    9  chocolate        2       1  3.663618        4     hot
    df.concat
            food  food_id  number     price  user_id weather
    0       soup        4       6  1.818250        3    cold
    1       soup        8       6  1.834045        4     hot
    2    iceream        8       7  3.042422        2    cold
    6    iceream        2       7  6.118529        2    cold
    7       soup        8       4  1.394939        2     hot
    8       soup        6       8  2.921446        2     hot
    9  chocolate        2       1  3.663618        4     hot

    6 merge()とjoin()コード
    df1=pd.DataFrame({'EmpNr':[5,3,9],'Dest':['The Hague','Amsterdam','Rotterdam']})
    df2=pd.DataFrame({'EmpNr':[5,9,7],'Amount':[10,5,2.5]})
    
    print "df1
    "
    ,df1 print "df2
    "
    ,df2 print "Merge() on Key
    "
    ,pd.merge(df1,df2,on='EmpNr') print "inner join with Merge()
    "
    ,pd.merge(df1,df2,how='inner') print "Dests join tips
    "
    ,df1.join(df2,lsuffix='Dest',rsuffix='Tips')

    結果
    df1
            Dest  EmpNr
    0  The Hague      5
    1  Amsterdam      3
    2  Rotterdam      9
    df2
       Amount  EmpNr
    0    10.0      5
    1     5.0      9
    2     2.5      7
    Merge() on Key
            Dest  EmpNr  Amount
    0  The Hague      5    10.0
    1  Rotterdam      9     5.0
    inner join with Merge()
            Dest  EmpNr  Amount
    0  The Hague      5    10.0
    1  Rotterdam      9     5.0
    Dests join tips
            Dest  EmpNrDest  Amount  EmpNrTips
    0  The Hague          5    10.0          5
    1  Amsterdam          3     5.0          9
    2  Rotterdam          9     2.5          7

    6ヘルプドキュメントの入手方法
    1.help(pd.concat)
    2.dir(pd.concat)
    3.pd.concat?
    ...

    7参考文献pythonによるデータ分析ノートpythonデータ分析,Ivan Idris著