Python: 0-1 matrix, the prod() product function, and string/list conversion

(1) Building a 0-1 matrix from DataFrame data
The DataFrame data is as follows.
data: 
    0  1    2    3
0  a  c    e  NaN
1  b  d  NaN  NaN
2  b  c  NaN  NaN
3  a  b    c    d
4  a  b  NaN  NaN
5  b  c  NaN  NaN
6  a  b  NaN  NaN
7  a  b    c    e
8  a  b    c  NaN
9  a  c    e  NaN

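Conversion code:
import pandas as pd

input_path = 'F:/DataMining/chapter5/menu_orders.xls'
data = pd.read_excel(input_path, header=None)

# ct takes one row x, drops its NaN entries, and marks every remaining item with 1.
# pd.Series(1, index=...) builds a Series whose index is the items found in the row.
ct = lambda x: pd.Series(1, index=x[pd.notna(x)])
# data.to_numpy() yields the rows as arrays, and map(ct, ...) applies ct to each row.
# (DataFrame.as_matrix() was removed in pandas 1.0, so to_numpy() is used instead.)
# In Python 3, map() returns a one-shot iterator, so it is turned into a list once;
# calling list() on the same iterator a second time would give an empty list.
b = list(map(ct, data.to_numpy()))
print('list(b): \n', b)

# b is a list of Series; build a DataFrame from it and replace NaN with 0.
data_01matrix = pd.DataFrame(b).fillna(0)
print('data_01matrix: \n', data_01matrix)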

The result is as follows.
list(b): 
 [a    1
c    1
e    1
dtype: int64, 
b    1
d    1
dtype: int64, 
b    1
c    1
dtype: int64, 
a    1
b    1
c    1
d    1
dtype: int64, 
a    1
b    1
dtype: int64, 
b    1
c    1
dtype: int64, 
a    1
b    1
dtype: int64, 
a    1
b    1
c    1
e    1
dtype: int64, 
a    1
b    1
c    1
dtype: int64, 
a    1
c    1
e    1
dtype: int64]

data_01matrix: 
      a    c    e    b    d
0  1.0  1.0  1.0  0.0  0.0
1  0.0  0.0  0.0  1.0  1.0
2  0.0  1.0  0.0  1.0  0.0
3  1.0  1.0  0.0  1.0  1.0
4  1.0  0.0  0.0  1.0  0.0
5  0.0  1.0  0.0  1.0  0.0
6  1.0  0.0  0.0  1.0  0.0
7  1.0  1.0  1.0  1.0  0.0
8  1.0  1.0  0.0  1.0  0.0
9  1.0  1.0  1.0  0.0  0.0
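
The conversion above depends on an Excel file that may not be at hand. The following is a minimal sketch of the same idea on in-memory data, assuming only the first four rows of the table shown earlier. The names `rows`, `to_indicator`, and `matrix` are made up for this illustration and are not part of the original code.

```python
import pandas as pd

# First four rows of the example data, padded with None where a row is shorter.
rows = [
    ['a', 'c', 'e', None],
    ['b', 'd', None, None],
    ['b', 'c', None, None],
    ['a', 'b', 'c', 'd'],
]
data = pd.DataFrame(rows)


def to_indicator(row):
    """Return a Series of 1s indexed by the non-missing items of one row."""
    items = [item for item in row if pd.notna(item)]
    return pd.Series(1, index=items)


# One indicator Series per row; the list is built once and reused.
indicator_rows = [to_indicator(row) for row in data.to_numpy()]

# Items missing from a row become NaN when the Series are aligned, so fill with 0.
matrix = pd.DataFrame(indicator_rows).fillna(0).astype(int)
print(matrix)
```

Casting with `astype(int)` is optional. It only turns the 1.0/0.0 values into 1/0, which can be easier to read.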


(2) The prod() product function
numeric_only is not implemented for a Series.
Example 1: a Series containing no strings
# prod() product function
import pandas as pd

series1 = pd.Series([1, 2, 3, 4])
print('series1: \n', series1)
print('series1_prod: \n', series1.prod())

Result:
series1: 
 0    1
1    2
2    3
3    4
dtype: int64
series1_prod: 
 24

Example 2: a Series containing a string
series2 = pd.Series([1, 'a', 3, 4])
print('series2: \n', series2)
# prod() on a Series that contains a string
print('series2_prod: \n', series2.prod())

Result:
series2: 
 0    1
1    a
2    3
3    4
dtype: object
series2_prod: 
 aaaaaaaaaaaa

A DataFrame can take the product along rows or along columns, and it supports numeric_only.
Example 3: a DataFrame containing no strings
dataframe1 = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]])
print('dataframe1: \n', dataframe1)
print('dataframe1_prod0: \n', dataframe1.prod(axis=0))
print('dataframe1_prod1: \n', dataframe1.prod(axis=1))

Result:
dataframe1: 
    0  1  2  3
0  1  2  3  4
1  5  6  7  8
dataframe1_prod0: 
 0     5
1    12
2    21
3    32
dtype: int64
dataframe1_prod1: 
 0      24
1    1680
dtype: int64

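Example 4: a DataFrame containing strings (numeric_only=True)
dataframe2 = pd.DataFrame([[1, 'a', 's', 4], [5, 6, 7, 8]])
print('dataframe2: \n', dataframe2)
# With numeric_only=True, columns that contain strings are left out of the product.
# axis=0: only the numeric columns (0 and 3) appear in the result.
# axis=1: each row is multiplied over the numeric columns only,
#         as if the cells in the excluded columns were 1.
print('dataframe2_prod01: \n', dataframe2.prod(axis=0, numeric_only=True))
print('dataframe2_prod11: \n', dataframe2.prod(axis=1, numeric_only=True))

Result:
dataframe2: 
    0  1  2  3
0  1  a  s  4
1  5  6  7  8
dataframe2_prod01: 
 0     5
3    32
dtype: int64
dataframe2_prod11: 
 0     4
1    40
dtype: int64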
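
As a rough cross-check of Example 4, the numeric columns can be selected explicitly before calling prod(). This is a sketch that assumes numeric_only=True keeps a column based on its dtype. The variable names `numeric_part`, `col_prod`, and `row_prod` are introduced here for illustration only.

```python
import pandas as pd

dataframe2 = pd.DataFrame([[1, 'a', 's', 4], [5, 6, 7, 8]])

# Columns 1 and 2 hold a string, so pandas stores them with dtype object.
# select_dtypes(include='number') keeps only the integer columns 0 and 3.
numeric_part = dataframe2.select_dtypes(include='number')

col_prod = numeric_part.prod(axis=0)  # product down each remaining column
row_prod = numeric_part.prod(axis=1)  # product across each row

print(col_prod)
print(row_prod)
```

Note that the 6 and 7 in the second row are dropped as well, because the whole column is excluded and not just the string cell.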

Example 5: a DataFrame containing strings (numeric_only=False)
dataframe2 = pd.DataFrame([[1, 'a', 's', 4], [5, 6, 7, 8]])
print('dataframe2: \n', dataframe2)
# With numeric_only=False, the string columns take part in the product.
print('dataframe2_prod02: \n', dataframe2.prod(axis=0, numeric_only=False))
# The axis=1 call fails (see the result), because row 0 contains two strings.
print('dataframe2_prod12: \n', dataframe2.prod(axis=1, numeric_only=False))

Result:
dataframe2: 
    0  1  2  3
0  1  a  s  4
1  5  6  7  8
dataframe2_prod02: 
 0          5
1     aaaaaa
2    sssssss
3         32
dtype: object
dataframe2_prod12: Error
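
The string results in Examples 2 and 5 follow from how Python's `*` operator treats strings: a string times an integer repeats the string, while a string times another string is not defined. The sketch below reproduces this with plain Python lists and `functools.reduce`. It illustrates the arithmetic only and is not a description of how pandas implements prod() internally.

```python
from functools import reduce
import operator

# Example 2: 1 * 'a' -> 'a', then 'a' * 3 -> 'aaa', then 'aaa' * 4 -> twelve a's.
series_values = [1, 'a', 3, 4]
series_result = reduce(operator.mul, series_values)

# Example 5, column 1 (axis=0): 'a' * 6 repeats the string six times.
column_values = ['a', 6]
column_result = reduce(operator.mul, column_values)

# Example 5, row 0 (axis=1): 'a' * 's' has no meaning, so Python raises TypeError.
row_values = [1, 'a', 's', 4]
row_error = None
try:
    reduce(operator.mul, row_values)
except TypeError as exc:
    row_error = exc

print(series_result)
print(column_result)
print(type(row_error).__name__)
```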

(3) Converting between strings and lists
# list -> string
ms = '---'
list1 = ['a', 'b', 'c']
s = ms.join(list1)
print('s: \n', s)

Result:
s: 
 a---b---c

# string -> list
list2 = s.split('---')
print('list2: \n', list2)

Result:
list2: 
 ['a', 'b', 'c']
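
A few related cases are worth keeping in mind when converting in either direction. The sketch below uses only built-in string methods; the variable names are chosen for illustration.

```python
# join() accepts only strings, so other values must be converted first.
numbers = [1, 2, 3]
joined = ', '.join(str(n) for n in numbers)

# split() always returns strings; convert back if numbers are needed.
parts = joined.split(', ')
restored = [int(p) for p in parts]

# list() on a string splits it into single characters.
chars = list('abc')

# split() with no argument splits on any run of whitespace.
words = 'a  b   c'.split()

print(joined)
print(parts)
print(restored)
print(chars)
print(words)
```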