Python Pandas Basics

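  • Pandas is a Python data analysis package created to solve data analysis tasks.
  • Pandas incorporates a large number of libraries and several standard data models, and provides the tools needed to operate on data sets efficiently.
  • Pandas provides many functions and methods for processing data quickly and conveniently.
  • Pandas is dictionary-like in form and is built on NumPy, which makes NumPy-centric applications simpler.

    Installing Pandas
    pip3 install pandas
    

    Importing Pandas
    import pandas as pd  # import pandas under the alias pd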
    

    Data structures
  • Series
  • DataFrame

    Series
    import numpy as np
    import pandas as pd
    s=pd.Series([1,2,3,np.nan,5,6])
    print(s)  # print the Series together with its default integer index
    
    0    1.0
    1    2.0
    2    3.0
    3    NaN
    4    5.0
    5    6.0
    dtype: float64
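Since Pandas is described above as dictionary-like, a small sketch of a Series with explicit labels may help. The labels 'a' to 'd' are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical labels 'a'..'d'; any hashable labels would work
s = pd.Series([1, 2, np.nan, 4], index=['a', 'b', 'c', 'd'])

print(s['b'])            # look up a value by its label, much like a dict key
print(s.index.tolist())  # the labels of the Series
print(s.isna().sum())    # number of missing values
```

Because of the NaN entry, the whole Series is stored as float64, which matches the output shown above.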
    

    DataFrame
    dates = pd.date_range('20180310', periods=6)
    # 6 rows x 4 columns of random values, indexed by date
    df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])
    print(df)       # print the 6x4 table
    print(df['B'])  # select column B
    print("----------------\n----------------")
    # Build a DataFrame from a dict; scalar values are repeated for every row
    df_1 = pd.DataFrame({'A': 1.,
                         'B': pd.Timestamp('20180310'),
                         'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                         'D': np.array([3] * 4, dtype='int32'),
                         'E': pd.Categorical(["test", "train", "test", "train"]),
                         'F': 'foo'})
    print(df_1)
    print(df_1.dtypes)   # data type of each column
    print(df_1.index)    # row labels
    print(df_1.columns)  # column labels
    print("----------------\n----------------")
    print(df_1.values)      # the values as a NumPy array
    print(df_1.describe())  # summary statistics of the numeric columns
    print(df_1.T)           # transpose
    print("----------------\n----------------")
    print(df_1.sort_index(axis=1, ascending=False))  # sort by column label (axis=1), descending
    print(df_1.sort_values(by='E'))                  # sort the rows by the values in column E
                       A         B         C         D
    2018-03-10  0.872767  2.188739  0.766781 -0.001429
    2018-03-11  0.218740 -0.556263 -0.047700  0.470347
    2018-03-12 -0.816785  0.479690  1.722349  1.116260
    2018-03-13  0.988138 -0.025760 -0.971384 -0.558211
    2018-03-14 -0.581776  1.021027 -1.280569  1.022587
    2018-03-15  0.061455 -1.647589 -1.568288 -0.467407
    2018-03-10    2.188739
    2018-03-11   -0.556263
    2018-03-12    0.479690
    2018-03-13   -0.025760
    2018-03-14    1.021027
    2018-03-15   -1.647589
    Freq: D, Name: B, dtype: float64
    ----------------
    ----------------
         A          B    C  D      E    F
    0  1.0 2018-03-10  1.0  3   test  foo
    1  1.0 2018-03-10  1.0  3  train  foo
    2  1.0 2018-03-10  1.0  3   test  foo
    3  1.0 2018-03-10  1.0  3  train  foo
    A           float64
    B    datetime64[ns]
    C           float32
    D             int32
    E          category
    F            object
    dtype: object
    Int64Index([0, 1, 2, 3], dtype='int64')
    Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
    ----------------
    ----------------
    [[1.0 Timestamp('2018-03-10 00:00:00') 1.0 3 'test' 'foo']
     [1.0 Timestamp('2018-03-10 00:00:00') 1.0 3 'train' 'foo']
     [1.0 Timestamp('2018-03-10 00:00:00') 1.0 3 'test' 'foo']
     [1.0 Timestamp('2018-03-10 00:00:00') 1.0 3 'train' 'foo']]
             A    C    D
    count  4.0  4.0  4.0
    mean   1.0  1.0  3.0
    std    0.0  0.0  0.0
    min    1.0  1.0  3.0
    25%    1.0  1.0  3.0
    50%    1.0  1.0  3.0
    75%    1.0  1.0  3.0
    max    1.0  1.0  3.0
                         0                    1                    2  \
    A                    1                    1                    1   
    B  2018-03-10 00:00:00  2018-03-10 00:00:00  2018-03-10 00:00:00   
    C                    1                    1                    1   
    D                    3                    3                    3   
    E                 test                train                 test   
    F                  foo                  foo                  foo   
    
                         3  
    A                    1  
    B  2018-03-10 00:00:00  
    C                    1  
    D                    3  
    E                train  
    F                  foo  
    ----------------
    ----------------
         F      E  D    C          B    A
    0  foo   test  3  1.0 2018-03-10  1.0
    1  foo  train  3  1.0 2018-03-10  1.0
    2  foo   test  3  1.0 2018-03-10  1.0
    3  foo  train  3  1.0 2018-03-10  1.0
         A          B    C  D      E    F
    0  1.0 2018-03-10  1.0  3   test  foo
    2  1.0 2018-03-10  1.0  3   test  foo
    1  1.0 2018-03-10  1.0  3  train  foo
    3  1.0 2018-03-10  1.0  3  train  foo
    

    Selecting data in Pandas
  • Selecting data from specific columns
  • Selecting data from specific rows
  • Selecting data from specific rows and columns
  • iloc: selecting data by integer position
  • Filtering by condition
  • Multiple indexing

    df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                       index = ['one','two','three','four'],
                       columns = ['a','b','c','d'])
    df
    

                   a          b          c          d
    one    73.506341  75.662735  74.675325   7.697207
    two    73.055825  83.222481   4.777599  82.534340
    three  89.156683  85.001712  47.443443  73.379189
    four   95.648043  64.162408  26.731916  73.839172

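    Selecting data from specific columns
    # a single column
    print(df["a"])
    print("----------------\n----------------")
    # several columns
    print(df[["a","b"]])
    print("----------------\n----------------")
    # a range of columns (label slices include both ends)
    print(df.loc[:,"b":"d"])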
    one      73.506341
    two      73.055825
    three    89.156683
    four     95.648043
    Name: a, dtype: float64
    ----------------
    ----------------
                   a          b
    one    73.506341  75.662735
    two    73.055825  83.222481
    three  89.156683  85.001712
    four   95.648043  64.162408
    ----------------
    ----------------
                   b          c          d
    one    75.662735  74.675325   7.697207
    two    83.222481   4.777599  82.534340
    three  85.001712  47.443443  73.379189
    four   64.162408  26.731916  73.839172
    

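    Selecting data from specific rows
    # a single row
    print(df.loc["one"])
    print("----------------\n----------------")
    # several rows
    print(df.loc[["one","two"]])
    print("----------------\n----------------")
    # a range of rows (slice)
    print(df[0:3])            # by position; the stop is excluded
    print(df['one':'three'])  # by label; both ends are included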
    a    73.506341
    b    75.662735
    c    74.675325
    d     7.697207
    Name: one, dtype: float64
    ----------------
    ----------------
                 a          b          c          d
    one  73.506341  75.662735  74.675325   7.697207
    two  73.055825  83.222481   4.777599  82.534340
    ----------------
    ----------------
                   a          b          c          d
    one    73.506341  75.662735  74.675325   7.697207
    two    73.055825  83.222481   4.777599  82.534340
    three  89.156683  85.001712  47.443443  73.379189
                   a          b          c          d
    one    73.506341  75.662735  74.675325   7.697207
    two    73.055825  83.222481   4.777599  82.534340
    three  89.156683  85.001712  47.443443  73.379189
    

    Selecting data from specific rows and columns
    # a single row and a single column
    print(df.loc["one","a"])
    print("----------------\n----------------")
    # rows and several columns
    print(df.loc['one', ['a','c']])
    print(df.loc[['one','three'],["a","b","c"]])
    print("----------------\n----------------")
    # a range of rows and a range of columns (slices)
    print(df.loc["one":"three","b":"c"])
    73.50634055308014
    ----------------
    ----------------
    a    73.506341
    c    74.675325
    Name: one, dtype: float64
                   a          b          c
    one    73.506341  75.662735  74.675325
    three  89.156683  85.001712  47.443443
    ----------------
    ----------------
                   b          c
    one    75.662735  74.675325
    two    83.222481   4.777599
    three  85.001712  47.443443
    

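    iloc: selecting data by integer position
    # a single row
    print(df.iloc[0])
    print("----------------\n----------------")
    # several rows
    print(df.iloc[[0,3]])
    print("----------------\n----------------")
    # a range of rows (slice)
    print(df.iloc[1:3])
    print("----------------\n----------------")
    # a single row and a single column
    print(df.iloc[3,1])  # row 3, column 1
    print("----------------\n----------------")
    # several rows and several columns
    print(df.iloc[[1,2,3],[0,2]])  # rows 1, 2, 3 and columns 0, 2
    print("----------------\n----------------")
    # a range of rows and a range of columns
    print(df.iloc[2:4,0:2])  # rows 2 to 3 and columns 0 to 1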
    a    73.506341
    b    75.662735
    c    74.675325
    d     7.697207
    Name: one, dtype: float64
    ----------------
    ----------------
                  a          b          c          d
    one   73.506341  75.662735  74.675325   7.697207
    four  95.648043  64.162408  26.731916  73.839172
    ----------------
    ----------------
                   a          b          c          d
    two    73.055825  83.222481   4.777599  82.534340
    three  89.156683  85.001712  47.443443  73.379189
    ----------------
    ----------------
    64.1624082303679
    ----------------
    ----------------
                   a          c
    two    73.055825   4.777599
    three  89.156683  47.443443
    four   95.648043  26.731916
    ----------------
    ----------------
                   a          b
    three  89.156683  85.001712
    four   95.648043  64.162408
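The loc and iloc examples above differ in how slices are bounded: label slices include the end label, while position slices exclude the stop. A minimal sketch, using fixed values from np.arange instead of the random data so the result is reproducible:

```python
import numpy as np
import pandas as pd

# Fixed values (an assumption for reproducibility; the text above uses random data)
df = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=['one', 'two', 'three', 'four'],
                  columns=['a', 'b', 'c', 'd'])

by_label = df.loc['one':'three', 'b':'c']  # label slices include both end points
by_position = df.iloc[0:3, 1:3]            # position slices exclude the stop

print(by_label.equals(by_position))  # both select rows one..three, columns b..c
```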
    

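    Filtering by condition
    # filter rows with a boolean condition
    print(df[df["a"] > 0])  # keep the rows where column a is greater than 0
    print("----------------\n----------------")
    # with a condition on a subset of columns, the columns that were not tested become NaN
    print(df[df[["a","b"]]>0])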
                   a          b          c          d
    one    73.506341  75.662735  74.675325   7.697207
    two    73.055825  83.222481   4.777599  82.534340
    three  89.156683  85.001712  47.443443  73.379189
    four   95.648043  64.162408  26.731916  73.839172
                   a          b   c   d
    one    73.506341  75.662735 NaN NaN
    two    73.055825  83.222481 NaN NaN
    three  89.156683  85.001712 NaN NaN
    four   95.648043  64.162408 NaN NaN
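Conditions like the ones above can also be combined. The sketch below uses small hypothetical values rather than the random data, and shows `&`, `|`, and the equivalent `where` call; each condition needs its own parentheses.

```python
import pandas as pd

# Hypothetical fixed values so the result is reproducible
df = pd.DataFrame({'a': [10, 60, 80, 95], 'b': [75, 20, 85, 64]},
                  index=['one', 'two', 'three', 'four'])

both = df[(df['a'] > 50) & (df['b'] > 50)]    # rows satisfying both conditions
either = df[(df['a'] > 90) | (df['b'] < 30)]  # rows satisfying at least one
masked = df.where(df > 50)                    # keeps the shape; failing cells become NaN

print(both.index.tolist())
print(either.index.tolist())
print(masked)
```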
    

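    Multiple indexing
    print(df['a'].loc[['one','three']])  # column a, then rows one and three
    print("----------------\n----------------")
    print(df[['b','c','d']].iloc[::2])   # columns b, c, d, then every second row (one, three)
    print("----------------\n----------------")
    print(df[df['a'] < 50].iloc[:2])     # rows where a < 50, then the first two of them
    print("----------------\n----------------")
    print(df[df < 50][['a','b']])        # values >= 50 become NaN, then columns a and b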
    one      73.506341
    three    89.156683
    Name: a, dtype: float64
    ----------------
    ----------------
                   b          c          d
    one    75.662735  74.675325   7.697207
    three  85.001712  47.443443  73.379189
    ----------------
    ----------------
    Empty DataFrame
    Columns: [a, b, c, d]
    Index: []
    ----------------
    ----------------
            a   b
    one   NaN NaN
    two   NaN NaN
    three NaN NaN
    four  NaN NaN
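Chaining indexers as above is fine for reading. For assignment, pandas may apply the change to a temporary copy, so a single `.loc` call with a row mask and a column label is the safer form. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical fixed values for illustration
df = pd.DataFrame({'a': [73, 20, 89, 40], 'b': [75, 83, 85, 64]},
                  index=['one', 'two', 'three', 'four'])

chained = df[df['a'] < 50]['b']     # two steps: filter the rows, then pick column b
single = df.loc[df['a'] < 50, 'b']  # one step: row mask and column label together
print(chained.equals(single))       # both read the same values

# Assign through one .loc call so the original frame is modified
df.loc[df['a'] < 50, 'b'] = 0
print(df)
```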
    

    Setting values in Pandas
    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A', 'B', 'C', 'D'])
    print(df)
    '''
                 A   B   C   D
    2018-03-10   0   1   2   3
    2018-03-11   4   5   6   7
    2018-03-12   8   9  10  11
    2018-03-13  12  13  14  15
    2018-03-14  16  17  18  19
    2018-03-15  20  21  22  23
    '''
    
    df.iloc[2,2] = 999  # set a value by integer position
    df.loc['2018-03-13', 'D'] = 999  # set a value by label
    print(df)
    
                 A   B   C   D
    2018-03-10   0   1   2   3
    2018-03-11   4   5   6   7
    2018-03-12   8   9  10  11
    2018-03-13  12  13  14  15
    2018-03-14  16  17  18  19
    2018-03-15  20  21  22  23
                 A   B    C    D
    2018-03-10   0   1    2    3
    2018-03-11   4   5    6    7
    2018-03-12   8   9  999   11
    2018-03-13  12  13   14  999
    2018-03-14  16  17   18   19
    2018-03-15  20  21   22   23
    
    df[df.A>10]=999  # set every value in the rows where column A is greater than 10
    print(df)
    
                  A    B    C    D
    2018-03-10    0    1    2    3
    2018-03-11    4    5    6    7
    2018-03-12    8    9  999   11
    2018-03-13  999  999  999  999
    2018-03-14  999  999  999  999
    2018-03-15  999  999  999  999
    
    df['F']=np.nan
    print(df)
    
                  A    B    C    D   F
    2018-03-10    0    1    2    3 NaN
    2018-03-11    4    5    6    7 NaN
    2018-03-12    8    9  999   11 NaN
    2018-03-13  999  999  999  999 NaN
    2018-03-14  999  999  999  999 NaN
    2018-03-15  999  999  999  999 NaN
    
    df['E'] = pd.Series([1,2,3,4,5,6], index=pd.date_range('20180310', periods=6))  # add a column from a Series aligned on the date index
    print(df)
    
                  A    B    C    D   F  E
    2018-03-10    0    1    2    3 NaN  1
    2018-03-11    4    5    6    7 NaN  2
    2018-03-12    8    9  999   11 NaN  3
    2018-03-13  999  999  999  999 NaN  4
    2018-03-14  999  999  999  999 NaN  5
    2018-03-15  999  999  999  999 NaN  6
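Assigning a Series as a column aligns it on the index rather than on position. The sketch below assumes a Series (the hypothetical name `partial`) that covers only some of the dates: labels missing from the Series become NaN, and labels missing from the frame are dropped.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=4)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=dates, columns=['A', 'B'])

# 'partial' starts one day later and ends one day after the frame
partial = pd.Series([10, 20, 30, 40], index=pd.date_range('20180311', periods=4))
df['E'] = partial  # aligned by date, not by position

print(df)
```

The first row has no matching date, so its E value is NaN and the column is stored as float.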
    

    Handling missing data in Pandas
  • NaN values in the data
  • Using dropna() to delete rows or columns that contain NaN
  • Using fillna() to replace NaN values
  • Using isnull() to check whether data is missing

    NaN values in the data
    dates = pd.date_range('20180310', periods=6)
    df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A', 'B', 'C', 'D'])
    df.iloc[0,1]=np.nan
    df.iloc[1]=np.nan
    print(df)
    
                   A     B     C     D
    2018-03-10   0.0   NaN   2.0   3.0
    2018-03-11   NaN   NaN   NaN   NaN
    2018-03-12   8.0   9.0  10.0  11.0
    2018-03-13  12.0  13.0  14.0  15.0
    2018-03-14  16.0  17.0  18.0  19.0
    2018-03-15  20.0  21.0  22.0  23.0
    

    Using dropna() to delete rows or columns that contain NaN
    # axis=0 drops rows, axis=1 drops columns
    # how='any': drop when at least one value is NaN
    # how='all': drop only when every value is NaN
    print(df.dropna(axis=0,how='any'))
    print(df.dropna(axis=0,how='all'))
    
                   A     B     C     D
    2018-03-12   8.0   9.0  10.0  11.0
    2018-03-13  12.0  13.0  14.0  15.0
    2018-03-14  16.0  17.0  18.0  19.0
    2018-03-15  20.0  21.0  22.0  23.0
                   A     B     C     D
    2018-03-10   0.0   NaN   2.0   3.0
    2018-03-12   8.0   9.0  10.0  11.0
    2018-03-13  12.0  13.0  14.0  15.0
    2018-03-14  16.0  17.0  18.0  19.0
    2018-03-15  20.0  21.0  22.0  23.0
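Besides `how`, dropna() accepts `subset` and `thresh`, which give finer control over what is dropped. A short sketch on the same frame as above:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1] = np.nan

by_subset = df.dropna(subset=['A'])       # look only at column A: drops the all-NaN row
by_thresh = df.dropna(thresh=4)           # keep rows with at least 4 non-NaN values
by_column = df.dropna(axis=1, how='any')  # every column has a NaN, so none survive

print(by_subset.shape, by_thresh.shape, by_column.shape)
```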
    

    Using fillna() to replace NaN values
    print(df.fillna(value=233))  # replace every NaN with 233
    
                    A      B      C      D
    2018-03-10    0.0  233.0    2.0    3.0
    2018-03-11  233.0  233.0  233.0  233.0
    2018-03-12    8.0    9.0   10.0   11.0
    2018-03-13   12.0   13.0   14.0   15.0
    2018-03-14   16.0   17.0   18.0   19.0
    2018-03-15   20.0   21.0   22.0   23.0
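A single constant is not the only option for fillna(). The sketch below, on the same frame, shows a per-column dict, the column means, and forward filling with ffill(); which one is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1] = np.nan

per_column = df.fillna({'A': 0, 'B': -1})  # only A and B are filled; C and D keep their NaN
by_mean = df.fillna(df.mean())             # each NaN gets the mean of its column
forward = df.ffill()                       # copy the previous row's value downwards

print(per_column)
print(by_mean)
print(forward)
```

Forward filling cannot fill column B in the first two rows, because there is no earlier value to copy.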
    

    Using isnull() to check whether data is missing
    print(pd.isnull(df))  # True where the value is NaN, False otherwise
    print("----------------\n----------------")
    print(np.any(df.isnull()))  # True if the data contains at least one NaN
                    A      B      C      D
    2018-03-10  False   True  False  False
    2018-03-11   True   True   True   True
    2018-03-12  False  False  False  False
    2018-03-13  False  False  False  False
    2018-03-14  False  False  False  False
    2018-03-15  False  False  False  False
    ----------------
    ----------------
    True
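The boolean frame returned by isnull() can also be summed to count missing values, which is often more useful than a single True or False. A minimal sketch on the same frame:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20180310', periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=['A', 'B', 'C', 'D'])
df.iloc[0, 1] = np.nan
df.iloc[1] = np.nan

per_column = df.isnull().sum()                # NaN count in each column
rows_with_nan = df[df.isnull().any(axis=1)]   # rows that contain at least one NaN
total = int(df.isnull().sum().sum())          # NaN count in the whole frame

print(per_column)
print(rows_with_nan)
print(total)
```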
    

    Importing and exporting data with Pandas
    data = pd.read_csv('test1.csv')  # read a CSV file
    data.to_pickle('test2.pickle')   # save the data in pickle format
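The two calls above need an existing test1.csv. As a self-contained sketch, the round trip below writes a small hypothetical frame to a temporary directory first; the file names and column names are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data; any small frame would do
df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'example.csv')
    pkl_path = os.path.join(tmp, 'example.pickle')

    df.to_csv(csv_path, index=False)        # write CSV without the index column
    from_csv = pd.read_csv(csv_path)        # read it back
    from_csv.to_pickle(pkl_path)            # save in pickle format
    from_pickle = pd.read_pickle(pkl_path)  # load the pickle again

print(from_pickle)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.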
    

    Concatenating data in Pandas
  • axis: the direction of concatenation
  • join: how labels are combined
  • append: adding data

    axis: the direction of concatenation
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'])
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])
    df3 = pd.DataFrame(np.ones((3,4))*2, columns=['a','b','c','d'])
    res = pd.concat([df1, df2, df3], axis=0, ignore_index=True)
    # axis=0 stacks the frames vertically, axis=1 places them side by side; ignore_index=True renumbers the index from 0 to 8
    print(res)
    
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  1.0  1.0  1.0  1.0
    4  1.0  1.0  1.0  1.0
    5  1.0  1.0  1.0  1.0
    6  2.0  2.0  2.0  2.0
    7  2.0  2.0  2.0  2.0
    8  2.0  2.0  2.0  2.0
    

    join: how labels are combined
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'], index=[1,2,3])
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['b','c','d', 'e'], index=[2,3,4])
    print(df1)
    print(df2)
    print("----------------\n----------------")
    # join='outer' keeps the union of the row labels and fills the gaps with NaN
    res = pd.concat([df1,df2],axis=1,join='outer')
    print(res)
    print("----------------\n----------------")
    # join='inner' keeps only the row labels present in both frames
    res = pd.concat([df1,df2],axis=1,join='inner')
    print(res)
    print("----------------\n----------------")
    # keep df1's row labels; rows missing from df2 are filled with NaN
    # (the join_axes argument was removed in pandas 1.0, so reindex is used instead)
    res = pd.concat([df1,df2],axis=1).reindex(df1.index)
    print(res)
         a    b    c    d
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  0.0  0.0  0.0  0.0
         b    c    d    e
    2  1.0  1.0  1.0  1.0
    3  1.0  1.0  1.0  1.0
    4  1.0  1.0  1.0  1.0
    ----------------
    ----------------
         a    b    c    d    b    c    d    e
    1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
    2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    4  NaN  NaN  NaN  NaN  1.0  1.0  1.0  1.0
    ----------------
    ----------------
         a    b    c    d    b    c    d    e
    2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    ----------------
    ----------------
         a    b    c    d    b    c    d    e
    1  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN
    2  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
    3  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0
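The join examples above concatenate along axis=1, where `join` decides which row labels survive. Along axis=0 the same parameter applies to the column labels instead, as this sketch with the same two frames suggests:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.ones((3, 4)) * 0, columns=['a', 'b', 'c', 'd'], index=[1, 2, 3])
df2 = pd.DataFrame(np.ones((3, 4)) * 1, columns=['b', 'c', 'd', 'e'], index=[2, 3, 4])

# Stack vertically; 'outer' keeps every column, 'inner' only the shared ones
outer = pd.concat([df1, df2], axis=0, join='outer', ignore_index=True)
inner = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)

print(outer)
print(inner)
```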
    

    append: adding data
    df1 = pd.DataFrame(np.ones((3,4))*0, columns=['a','b','c','d'])
    df2 = pd.DataFrame(np.ones((3,4))*1, columns=['a','b','c','d'])
    s1 = pd.Series([1,2,3,4], index=['a','b','c','d'])
    print(s1)
    print("----------------\n----------------")
    # add df2 below df1 and renumber the index
    # (DataFrame.append was removed in pandas 2.0; pd.concat does the same job)
    res = pd.concat([df1, df2], ignore_index=True)
    print(res)
    print("----------------\n----------------")
    # add s1 as a new row below df1 and renumber the index
    res = pd.concat([df1, s1.to_frame().T], ignore_index=True)
    print(res)
    a    1
    b    2
    c    3
    d    4
    dtype: int64
    ----------------
    ----------------
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  1.0  1.0  1.0  1.0
    4  1.0  1.0  1.0  1.0
    5  1.0  1.0  1.0  1.0
    ----------------
    ----------------
         a    b    c    d
    0  0.0  0.0  0.0  0.0
    1  0.0  0.0  0.0  0.0
    2  0.0  0.0  0.0  0.0
    3  1.0  2.0  3.0  4.0
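When several rows have to be added, collecting them first and concatenating once avoids rebuilding the frame on every step. A minimal sketch; the row values in `new_rows` are hypothetical:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.zeros((3, 4)), columns=['a', 'b', 'c', 'd'])

# Hypothetical rows collected as dicts keyed by column name
new_rows = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
            {'a': 5, 'b': 6, 'c': 7, 'd': 8}]

# Build one frame from the collected rows and concatenate a single time
res = pd.concat([df1, pd.DataFrame(new_rows)], ignore_index=True)
print(res)
```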
    

    Merging in Pandas
  • Merging on one key
  • Merging on two keys
  • Merging with an indicator
  • Merging on the index

    Merging on one key
    left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                         'A': ['A0', 'A1', 'A2', 'A3'],
                         'B': ['B0', 'B1', 'B2', 'B3']})
    print(left)
    print("----------------\n----------------")
    right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})
    print(right)
    print("----------------\n----------------")
    # merge the two frames on the shared 'key' column
    res = pd.merge(left, right, on='key')
    print(res)
      key   A   B
    0  K0  A0  B0
    1  K1  A1  B1
    2  K2  A2  B2
    3  K3  A3  B3
    ----------------
    ----------------
      key   C   D
    0  K0  C0  D0
    1  K1  C1  D1
    2  K2  C2  D2
    3  K3  C3  D3
    ----------------
    ----------------
      key   A   B   C   D
    0  K0  A0  B0  C0  D0
    1  K1  A1  B1  C1  D1
    2  K2  A2  B2  C2  D2
    3  K3  A3  B3  C3  D3
    

    Merging on two keys
    left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                         'key2': ['K0', 'K1', 'K0', 'K1'],
                         'A': ['A0', 'A1', 'A2', 'A3'],
                         'B': ['B0', 'B1', 'B2', 'B3']})
    print(left)
    print("----------------\n----------------")
    right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                          'key2': ['K0', 'K0', 'K0', 'K0'],
                          'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']})
    print(right)
    print("----------------\n----------------")
    # how='inner': keep only the key pairs present in both frames
    res = pd.merge(left, right, on=['key1', 'key2'], how='inner')
    print(res)
    print("----------------\n----------------")
    # how='outer': keep every key pair from either frame
    res = pd.merge(left, right, on=['key1', 'key2'], how='outer')
    print(res)
    print("----------------\n----------------")
    # how='left': keep every key pair from the left frame
    res = pd.merge(left, right, on=['key1', 'key2'], how='left')
    print(res)
    print("----------------\n----------------")
    # how='right': keep every key pair from the right frame
    res = pd.merge(left, right, on=['key1', 'key2'], how='right')
    print(res)
      key1 key2   A   B
    0   K0   K0  A0  B0
    1   K0   K1  A1  B1
    2   K1   K0  A2  B2
    3   K2   K1  A3  B3
    ----------------
    ----------------
      key1 key2   C   D
    0   K0   K0  C0  D0
    1   K1   K0  C1  D1
    2   K1   K0  C2  D2
    3   K2   K0  C3  D3
    ----------------
    ----------------
      key1 key2   A   B   C   D
    0   K0   K0  A0  B0  C0  D0
    1   K1   K0  A2  B2  C1  D1
    2   K1   K0  A2  B2  C2  D2
    ----------------
    ----------------
      key1 key2    A    B    C    D
    0   K0   K0   A0   B0   C0   D0
    1   K0   K1   A1   B1  NaN  NaN
    2   K1   K0   A2   B2   C1   D1
    3   K1   K0   A2   B2   C2   D2
    4   K2   K1   A3   B3  NaN  NaN
    5   K2   K0  NaN  NaN   C3   D3
    ----------------
    ----------------
      key1 key2   A   B    C    D
    0   K0   K0  A0  B0   C0   D0
    1   K0   K1  A1  B1  NaN  NaN
    2   K1   K0  A2  B2   C1   D1
    3   K1   K0  A2  B2   C2   D2
    4   K2   K1  A3  B3  NaN  NaN
    ----------------
    ----------------
      key1 key2    A    B   C   D
    0   K0   K0   A0   B0  C0  D0
    1   K1   K0   A2   B2  C1  D1
    2   K1   K0   A2   B2  C2  D2
    3   K2   K0  NaN  NaN  C3  D3
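In the merges above the non-key columns have different names. When both frames share a non-key column name, `suffixes` controls how the two copies are renamed. A sketch with hypothetical frames that both carry a `value` column:

```python
import pandas as pd

# Hypothetical frames sharing the non-key column 'value'
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'value': [10, 20, 30]})

res = pd.merge(left, right, on='key', how='inner',
               suffixes=('_left', '_right'))  # rename the clashing columns
print(res)
```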
    

    Merging with an indicator
    df1 = pd.DataFrame({'col1':[0,1], 'col_left':['a','b']})
    print(df1)
    
    df2 = pd.DataFrame({'col1':[1,2,2],'col_right':[2,2,2]})
    print(df2)
    print("----------------\n----------------")
    # merge on col1; indicator=True adds a _merge column showing where each row came from
    res = pd.merge(df1, df2, on='col1', how='outer', indicator=True)
    print(res)
    print("----------------\n----------------")
    # pass a string to give the indicator column a custom name
    res = pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')
    print(res)
       col1 col_left
    0     0        a
    1     1        b
       col1  col_right
    0     1          2
    1     2          2
    2     2          2
    ----------------
    ----------------
       col1 col_left  col_right      _merge
    0     0        a        NaN   left_only
    1     1        b        2.0        both
    2     2      NaN        2.0  right_only
    3     2      NaN        2.0  right_only
    ----------------
    ----------------
       col1 col_left  col_right indicator_column
    0     0        a        NaN        left_only
    1     1        b        2.0             both
    2     2      NaN        2.0       right_only
    3     2      NaN        2.0       right_only
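One common use of the indicator column is an anti-join: keeping the rows of one frame that have no match in the other. A minimal sketch using the same two frames:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [0, 1], 'col_left': ['a', 'b']})
df2 = pd.DataFrame({'col1': [1, 2, 2], 'col_right': [2, 2, 2]})

res = pd.merge(df1, df2, on='col1', how='outer', indicator=True)

# Keep rows found only in df1, then drop the helper column
left_only = res[res['_merge'] == 'left_only'].drop(columns='_merge')
print(left_only)
```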
    

    Merging on the index
    left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                         'B': ['B0', 'B1', 'B2']},
                        index=['K0', 'K1', 'K2'])
    print(left)
    
    right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                          'D': ['D0', 'D2', 'D3']},
                         index=['K0', 'K2', 'K3'])
    print(right)
    print("----------------\n----------------")
    # merge on the index of both frames, keeping every label
    res = pd.merge(left, right, left_index=True, right_index=True, how='outer')
    print(res)
    print("----------------\n----------------")
    # merge on the index of both frames, keeping only the shared labels
    res = pd.merge(left, right, left_index=True, right_index=True, how='inner')
    print(res)
         A   B
    K0  A0  B0
    K1  A1  B1
    K2  A2  B2
         C   D
    K0  C0  D0
    K2  C2  D2
    K3  C3  D3
    ----------------
    ----------------
          A    B    C    D
    K0   A0   B0   C0   D0
    K1   A1   B1  NaN  NaN
    K2   A2   B2   C2   D2
    K3  NaN  NaN   C3   D3
    ----------------
    ----------------
         A   B   C   D
    K0  A0  B0  C0  D0
    K2  A2  B2  C2  D2
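For index-based merges like the ones above, DataFrame.join is a shorter spelling. Note that join defaults to how='left', whereas pd.merge defaults to how='inner'. A minimal sketch with the same frames:

```python
import pandas as pd

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

joined = left.join(right, how='outer')  # joins on the index of both frames
merged = pd.merge(left, right, left_index=True, right_index=True, how='outer')
default_join = left.join(right)         # how='left' by default

print(joined.equals(merged))
print(default_join)
```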