Python Study Notes: Basic pandas Operations (Tables) and Plotting

Install pandas with pip:

pip install pandas

Import the pandas package under the alias pd (the module name is all lowercase):

import pandas as pd

Creating a DataFrame:

scores = {
	'name' : ['johe', 'mike', 'tom', 'jeck'],
	'sex' : ['male', 'female', 'male', 'male'],
	'chinese' : [88, 83, 90, 78],
	'math' : [96, 88, 86, 90],
	'english' : [85, 80, 90, 75]
}
df = pd.DataFrame(scores)    # optionally pass index=[...] to set the row labels
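
To make the index=[...] option concrete, here is a minimal sketch. It reuses a trimmed version of the scores table; the row labels 'a' to 'd' are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Same layout as the scores table, trimmed to two subjects.
scores = {
    'name': ['johe', 'mike', 'tom', 'jeck'],
    'math': [96, 88, 86, 90],
    'english': [85, 80, 90, 75],
}

# Hypothetical row labels; without index=, pandas uses 0, 1, 2, 3.
df = pd.DataFrame(scores, index=['a', 'b', 'c', 'd'])

print(df.shape)               # (rows, columns)
print(list(df.index))         # the custom row labels
print(df.loc['b', 'math'])    # look up one cell by row label and column name
```

Each dict key becomes a column, and every list has to have the same length as the index.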

Series: an indexed, array-like data structure that can be used as a DataFrame column.

df['computer'] = pd.Series([65, 70, 85, 90])
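
One detail worth seeing in action: when a Series is assigned as a column, pandas matches values by index label, not by position. The sketch below assumes a small frame with the default 0..3 index and a Series that deliberately skips label 2.

```python
import pandas as pd

df = pd.DataFrame({'name': ['johe', 'mike', 'tom', 'jeck']})

# The Series only has labels 0, 1 and 3, so row 2 gets no value.
computer = pd.Series([65, 70, 90], index=[0, 1, 3])
df['computer'] = computer

print(df)    # row 2 shows NaN in the 'computer' column
```

Assigning a plain list instead would require exactly one value per row.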

Reading a CSV file (a tabular file with a fixed format; xlsx files work as well, through pd.read_excel):

df = pd.read_csv('./form_test.csv')    # read the csv file; the result is a DataFrame
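
A minimal sketch of reading CSV data without needing a file on disk. The CSV text here is invented for illustration; io.StringIO stands in for a real path such as the one used in these notes.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the contents are made up for illustration.
csv_text = "name,math,english\njohe,96,85\nmike,88,80\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)    # column types are inferred from the text

# Round trip back to CSV text; index=False leaves out the row labels.
out = df.to_csv(index=False)
print(out)
```

The first line of the CSV is used as the column names by default.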

Working with DataFrame data:

df.index    # row index
df.columns    # column labels
df.head(i)    # first i rows; defaults to 5
df.tail(i)    # last i rows

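df.loc[i]    # label-based: the row whose index label is i
df.iloc[i]    # position-based: the row at integer position i
# df.ix mixed loc and iloc behavior; it was removed in pandas 1.0, so use loc or iloc instead
df.iloc[:i]    # the first i rows (positional slice, end excluded)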
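
The difference between label-based and position-based access only shows up when the index labels are not 0, 1, 2, .... The sketch below uses a hypothetical index of 10, 20, 30, 40 to make that visible.

```python
import pandas as pd

# Index labels deliberately differ from positions.
df = pd.DataFrame({'math': [96, 88, 86, 90]}, index=[10, 20, 30, 40])

by_label = df.loc[20, 'math']      # row labelled 20
by_position = df.iloc[1, 0]        # second row, first column
label_slice = df.loc[10:30]        # label slices include the end label
position_slice = df.iloc[0:2]      # position slices exclude the end

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

With the default integer index the two look interchangeable, which is why the old df.ix was a frequent source of confusion.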

df[df.math>90]    # rows where math is greater than 90
df[(df.english>80)&(df.math>90)]    # combined condition: english above 80 and math above 90
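
A short sketch of how these filters work underneath: the comparison produces a boolean Series, and conditions are combined with & (and), | (or) and ~ (not). Each condition needs its own parentheses because these operators bind more tightly than the comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['johe', 'mike', 'tom', 'jeck'],
    'math': [96, 88, 86, 90],
    'english': [85, 80, 90, 75],
})

mask = df.math > 90                                    # boolean Series, one entry per row
either = df[(df.english > 80) | (df.math > 90)]        # | means "or"
neither = df[~((df.english > 80) | (df.math > 90))]    # ~ negates the mask

print(mask.tolist())
print(either['name'].tolist())
print(neither['name'].tolist())
```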

df.drop(['english'], axis=1)    # drop a column; returns a new DataFrame, df itself is unchanged
df['english'] = [85, 80, 90, 75]    # add (or overwrite) a column

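df.sort_values(['math'])    # sort by the math column; ascending by default, returns a new DataFrame
df.values    # the data as a NumPy array
df.math.values    # one column's values as an array
df.T    # transpose (swap rows and columns)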
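
Both drop and sort_values return new objects rather than modifying df. A minimal sketch, using a trimmed copy of the scores table, that checks this and also shows a descending sort:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['johe', 'mike', 'tom', 'jeck'],
    'math': [96, 88, 86, 90],
    'english': [85, 80, 90, 75],
})

trimmed = df.drop(['english'], axis=1)                 # df itself still has 'english'
ranked = df.sort_values(['math'], ascending=False)     # highest math score first

print(list(df.columns))
print(list(trimmed.columns))
print(ranked['name'].tolist())
```

To keep a result, assign it back, for example df = df.sort_values(['math']).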

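Statistics:

df.describe()    # summary statistics (count, mean, std, min/max, quartiles)
df.mean(numeric_only=True)    # mean of each numeric column
df.var(numeric_only=True)    # variance of each numeric column
df.sex.value_counts()    # number of rows per category (frequency count)
df.groupby('chinese').sum()    # group rows by the chinese column and sum each group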
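
A small worked example of the counting and grouping calls. Grouping by 'sex' rather than by a score column is my own choice here, since a categorical column gives groups with more than one row.

```python
import pandas as pd

df = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'male'],
    'math': [96, 88, 86, 90],
    'english': [85, 80, 90, 75],
})

counts = df.sex.value_counts()                            # rows per category
means = df.mean(numeric_only=True)                        # skips the text column
by_sex = df.groupby('sex')[['math', 'english']].mean()    # per-group column means

print(counts)
print(means)
print(by_sex)
```

Passing numeric_only=True matters on frames with text columns, because recent pandas versions no longer drop them silently.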

Handling missing values:

df.dropna()    # drop rows that contain missing values
df.fillna(value=0)    # fill missing values with the given value
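
A minimal sketch of the missing-value calls on a frame with two invented gaps. Filling with the column mean is an extra variation added here, not something from the notes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'math': [96, np.nan, 86, 90],
    'english': [85, 80, np.nan, 75],
})

missing_per_column = df.isna().sum()    # count the gaps first
dropped = df.dropna()                   # keeps only complete rows
filled = df.fillna(value=0)             # gaps become 0
mean_filled = df.fillna(df.mean())      # gaps become the column mean

print(missing_per_column)
print(dropped)
print(mean_filled)
```

Neither dropna nor fillna changes df itself; both return a new DataFrame.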

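Using functions (key point):

def f(score):
    # Turn a numeric score into a grade label.
    if score >= 90:
        return 'excellent'
    elif score >= 60:
        return 'pass'
    else:
        return 'fail'

df['math_grade'] = df['math'].map(f)     # apply f to each element of one column

df['total'] = df.apply(lambda x: x.chinese+x.math+x.english, axis=1)    # apply a function to each row

df.applymap(lambda x: str(x)+' * ')    # apply a function to every element (renamed DataFrame.map in pandas 2.1)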
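
Two variations that may be useful alongside the calls above, sketched under my own assumptions: map also accepts a dict for simple lookups, and a row-wise apply that only adds columns can usually be replaced by a vectorized sum. The 'sex_code' column and its 'M'/'F' values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'male'],
    'chinese': [88, 83, 90, 78],
    'math': [96, 88, 86, 90],
    'english': [85, 80, 90, 75],
})

# map also accepts a dict; values without a matching key would become NaN.
df['sex_code'] = df['sex'].map({'male': 'M', 'female': 'F'})

# Row-wise apply and the vectorized sum give the same totals.
total_apply = df.apply(lambda x: x.chinese + x.math + x.english, axis=1)
total_vector = df[['chinese', 'math', 'english']].sum(axis=1)

print(df['sex_code'].tolist())
print(total_apply.tolist())
print(total_vector.tolist())
```

On large tables the vectorized form is generally faster, because apply with axis=1 calls the Python function once per row.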

Converting DataFrame data to a matrix (a NumPy array):

df.to_numpy()    # as_matrix() was removed in pandas 1.0
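
A short sketch of the conversion to a NumPy array with to_numpy(), showing how the result's dtype depends on which columns are included. The two-row frame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['johe', 'mike'],
    'math': [96, 88],
    'english': [85, 80],
})

numeric = df[['math', 'english']].to_numpy()    # integer 2-D array
mixed = df.to_numpy()                           # text plus numbers falls back to dtype object

print(numeric.shape, numeric.dtype)
print(mixed.dtype)
```

Selecting only the numeric columns first keeps the array usable for arithmetic.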

pandas comes with built-in plotting (based on matplotlib).

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(10, 4).cumsum(0), columns=['A', 'B', 'C', 'D'])    # cumulative sums of random data
df.plot()    # pandas line plot, one line per column
plt.show()    # display the figure with matplotlib

df = pd.DataFrame(np.random.randint(10, 50, (3, 4)), columns=['A', 'B', 'C', 'D'])    # random integer data
df.plot.bar()    # pandas bar chart
plt.show()

df = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])    # normally distributed random data
df.hist()    # pandas histogram, one subplot per column
plt.show()
