Pandas Overview and Series Basics

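1. Introduction to Pandas

2. Pandas data structures

2.1 Series overview

2.2 Creating a Series

2.3 Series index

2.4 Adding, deleting, modifying, and querying Series

2.4.1 Adding

2.4.2 Deleting

2.4.3 Modifying

2.4.4 Querying

2.5 Series statistics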

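1. Introduction to Pandas
Pandas (the name comes from "panel data" and "Python data analysis") is a powerful Python data analysis package built on NumPy. It supports fast statistical analysis, handles missing data gracefully, and works flexibly with data in formats such as CSV, Excel, and plain text. It also has dedicated time-series functionality, and for many data-processing tasks it is more convenient and more capable than Excel.
Learning path for pandas: see the [official pandas documentation]; it is a good idea to learn [NumPy] first.
Installing the pandas library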

pip install pandas

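2. Pandas data structures
Pandas has two commonly used data structures: Series and DataFrame. Both are built on NumPy arrays, which makes them efficient. My own understanding is that a Series is a single-column array, that is, it holds just one column of data. A DataFrame is a two-dimensional structure made up of rows and columns, much like an Excel sheet. Unlike Excel, both the rows and the columns carry an index, and either index can have more than one level. These indexes make data processing and analysis more convenient and flexible.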
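To make the difference between the two structures concrete, here is a minimal sketch. The fruit labels and the column names `price` and `stock` are made up for illustration and do not come from the rest of this article.

```python
import pandas as pd

# A Series is a single labelled column of values.
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"], name="price")

# A DataFrame is a table: several columns that share one row index.
table = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "stock": [10, 0, 7]},
    index=["apple", "banana", "cherry"],
)

print(prices.ndim, prices.shape)  # 1 (3,)
print(table.ndim, table.shape)    # 2 (3, 2)

# Selecting one column of a DataFrame gives back a Series.
column = table["price"]
print(type(column).__name__)      # Series
```

Seen this way, a DataFrame behaves like a collection of Series that share the same row labels.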
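2.1 Series overview
A Series is a one-dimensional array object with a name and an index. The values it holds can be integers, floats, strings, or arbitrary Python objects such as lists and ndarrays.
Creating a Series instance with pandas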

# Import pandas
import pandas as pd
data = [1, 2]
# The optional parameters are written out here with their default values
pd.Series(data=data, index=None, dtype=None, name=None, copy=False)

0    1
1    2
dtype: int64

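Parameters:

| No. | Parameter | Description | Default |
|-----|-----------|-------------|---------|
| 1 | data (required) | The data stored in the Series, such as a list | data=None |
| 2 | index (optional) | Array-like or Index with the same length as data. Non-unique index values are allowed. If not specified, it defaults to RangeIndex (0, 1, 2, ..., n). If a dict and an index sequence are both given, the index overrides the keys found in the dict. | index=None |
| 3 | dtype (optional) | Data type of the Series; inferred from the data if not given | dtype=None |
| 4 | name (optional) | Name of the Series | name=None |
| 5 | copy (optional) | Whether to copy the input data | copy=False |
| 6 | fastpath (optional) | Internal fast path; not meant for everyday use | fastpath=False |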
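The parameters are easier to remember once you see them in action. The sketch below uses made-up labels (`x`, `y`, `z`, `zz`) and a made-up name (`demo`), and also shows what "the index overrides the keys found in the dict" means in practice.

```python
import pandas as pd

# index, dtype and name set explicitly
s = pd.Series(data=[1, 2, 3], index=["x", "y", "z"], dtype="float64", name="demo")
print(s.dtype)           # float64
print(s.name)            # demo
print(s.index.tolist())  # ['x', 'y', 'z']

# dict + index: the index decides which keys are kept and in what order;
# a label that is not a key in the dict ("zz") gets NaN
d = pd.Series({"a": 1, "b": 2, "c": 3}, index=["c", "a", "zz"])
print(d)
```

Because of the NaN, the second Series ends up with a float dtype even though the dict only held integers.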
2.2 Creating a Series
From a list or a NumPy array

# Create one Series from a list and another from a NumPy array
import numpy as np
import pandas as pd
lst = ["a","b","c"]
ndarry = np.arange(3)
print(lst,'\t\t',ndarry)
ds1 = pd.Series(lst)
ds2 = pd.Series(ndarry)
print(ds1,'\n',ds2)

['a', 'b', 'c'] 		 [0 1 2]
0    a
1    b
2    c
dtype: object 
 0    0
1    1
2    2
dtype: int32

From a tuple

# In pandas, a missing value is represented by np.nan
tup = (1,np.nan,1)
s = pd.Series(tup)
print(s)

0    1.0
1    NaN
2    1.0
dtype: float64
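Since the introduction mentions missing-data handling, here is a small sketch of the usual tools applied to the same tuple. Which one fits depends on your data, so treat this as an illustration rather than a recommendation.

```python
import numpy as np
import pandas as pd

s = pd.Series((1, np.nan, 1))

print(s.isna().tolist())    # [False, True, False]
print(s.dropna().tolist())  # [1.0, 1.0]
print(s.fillna(0).tolist()) # [1.0, 0.0, 1.0]

# Reductions such as sum() skip NaN by default
print(s.sum())              # 2.0
```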

From a dictionary

dic = {"a":[1,2],"b":2,"c":3}
pd.Series(dic)  # the dict keys become the index

a    [1, 2]
b         2
c         3
dtype: object

From a set

# A set is unordered, so it cannot be used to create a Series
s = set(range(3))
pd.Series(s)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

 in ()
      1 # A set is unordered, so it cannot be used to create a Series
      2 s = set(range(3))
----> 3 pd.Series(s)


~\Anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    272                 pass
    273             elif isinstance(data, (set, frozenset)):
--> 274                 raise TypeError(f"'{type(data).__name__}' type is unordered")
    275             elif isinstance(data, ABCSparseArray):
    276                 # handle sparse passed here (and force conversion)


TypeError: 'set' type is unordered
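If the data really does arrive as a set, one simple workaround is to turn it into an ordered sequence first. The sketch below sorts it; the variable name `values` is just for illustration, and `list(values)` would also work if the order does not matter to you.

```python
import pandas as pd

values = {2, 0, 1}

# Passing the set directly raises TypeError
try:
    pd.Series(values)
except TypeError as exc:
    print("TypeError:", exc)

# Give the elements an explicit order first
s = pd.Series(sorted(values))
print(s.tolist())  # [0, 1, 2]
```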

From a scalar

# With a scalar, the value is repeated for every label in index
cc = pd.Series(5,index=["a","b"],name="aa")
cc

a    5
b    5
Name: aa, dtype: int64

2.3 Series index
Setting the index

"""Setting the index: method 1"""
tup = (1,np.nan,1)
s = pd.Series(tup,index=["a","b","c"],name="cc")
s

a    1.0
b    NaN
c    1.0
Name: cc, dtype: float64

"""Setting the index: method 2"""
# Build an Index object that carries its own name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s

abc
a      1
b    nan
c      1
Name: cc, dtype: object

"""Setting the index: method 3"""
tup = (1,np.nan,1)
s = pd.Series(tup)
s.index=["a",'2','3']
s

a    1.0
2    NaN
3    1.0
dtype: float64

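Renaming the index (the name of the index itself)

"""Setting the index: method 2"""
# Build an Index object that carries its own name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s.index.name = 'new'  # assign to the name attribute of the index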
s

new
a      1
b    nan
c      1
Name: cc, dtype: object

Viewing the index

"""Setting the index: method 2"""
# Build an Index object that carries its own name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print(s.index)
print("As a list:",s.index.tolist())

Index(['a', 'b', 'c'], dtype='object', name='abc')
As a list: ['a', 'b', 'c']

Changing index labels

"""Setting the index: method 2"""
# Build an Index object that carries its own name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s.index.tolist())
s.rename(index={'a':'aa'},inplace=True)
print("After:",s.index.tolist())

Before: ['a', 'b', 'c']
After: ['aa', 'b', 'c']

"""Setting the index: method 2"""
# Build an Index object that carries its own name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s.index.tolist())
s.index(["1",'2','3'])
print("After:",s.index.tolist())

Before: ['a', 'b', 'c']



---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

 in ()
      5 s = pd.Series(tup,index=index_name,name="cc",dtype="str")
      6 print("Before:",s.index.tolist())
----> 7 s.index(["1",'2','3'])
      8 print("After:",s.index.tolist())


TypeError: 'Index' object is not callable

The call fails because index is an attribute, not a method. To replace all the labels, assign a new list to it instead, for example s.index = ["1", "2", "3"].
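For comparison, here is a sketch of two ways that do replace every label at once. The values and labels are made up for illustration; `set_axis` is a standard Series method that returns a new object instead of changing the original.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Option 1: assign to the index attribute (changes s in place)
s.index = ["1", "2", "3"]
print(s.index.tolist())  # ['1', '2', '3']

# Option 2: set_axis returns a new Series and leaves s untouched
t = s.set_axis(["x", "y", "z"])
print(t.index.tolist())  # ['x', 'y', 'z']
print(s.index.tolist())  # ['1', '2', '3']
```

In both cases the new list must have the same length as the Series.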

Viewing the data

"""Setting the index: method 2"""
# Build an Index object that carries its own name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print(s.values)
print("As a list:",s.values.tolist())

['1' 'nan' '1']
As a list: ['1', 'nan', '1']

Viewing the Series name

"""Setting the index: method 2"""
# Build an Index object that carries its own name
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print(s.name)

cc

2.4 Adding, deleting, modifying, and querying Series
2.4.1 Adding

import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s1 = pd.Series(tup,index=index_name,name="cc",dtype="str")
s1

abc
a      1
b    nan
c      1
Name: cc, dtype: object

s1["d"] = 2  # assigning to a label that does not exist yet adds a new element
s1

abc
a      1
b    nan
c      1
d      2
Name: cc, dtype: object

dic = {"a":[1,2],"b":2,"c":3} 
s2 = pd.Series(dic)  # the dict keys become the index
s2

a    [1, 2]
b         2
c         3
dtype: object

pd.concat([s1, s2])  # join the two Series (Series.append was removed in pandas 2.0)

a         1
b       nan
c         1
d         2
a    [1, 2]
b         2
c         3
dtype: object
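Notice that the joined result keeps duplicate labels. Since `Series.append` was removed in pandas 2.0, `pd.concat` is the usual way to join Series, and the sketch below (with made-up values) shows what duplicates mean for lookups and how `ignore_index=True` avoids them.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([3, 4], index=["a", "c"])

# Labels are kept as they are, so "a" now appears twice
combined = pd.concat([s1, s2])
print(combined.index.tolist())  # ['a', 'b', 'a', 'c']
print(combined["a"].tolist())   # [1, 3]  (a lookup returns both rows)

# ignore_index=True throws the old labels away and renumbers from 0
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```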

2.4.2 Deleting

import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print(s)

abc
a      1
b    nan
c      1
Name: cc, dtype: object

# Method 1: the del statement
del s["b"]
print(s)

abc
a    1
c    1
Name: cc, dtype: object

print("Before:",s)
# Method 2: the drop method
a = s.drop("a")
print("After:",s)

Before: abc
a    1
c    1
Name: cc, dtype: object
After: abc
a    1
c    1
Name: cc, dtype: object

# drop returns a new Series and leaves s unchanged; the result is stored in a
print(a)

abc
c    1
Name: cc, dtype: object

# To remove the label from s itself, pass inplace=True; drop then returns None
print("Before:",s)
aa = s.drop("a",inplace=True)
print("After:",s)

Before: abc
a    1
c    1
Name: cc, dtype: object
After: abc
c    1
Name: cc, dtype: object

"""Dropping several labels at once with drop"""
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s)
aa = s.drop(["a","b"],inplace=True)
print("After:",s)

Before: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After: abc
c    1
Name: cc, dtype: object

2.4.3 Modifying

# Modify a value through its index label
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s)
s["a"] = 2
print("After:",s)

Before: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After: abc
a      2
b    nan
c      1
Name: cc, dtype: object

# Modify a value through its index label
import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s)
# .loc accesses elements by label
s.loc["a"] = 3
print("After:",s)

Before: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After: abc
a      3
b    nan
c      1
Name: cc, dtype: object

import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
print("Before:",s)
# .iloc is purely integer-location based indexing: it selects by position.
s.iloc[2] = 3
print("After:",s)

Before: abc
a      1
b    nan
c      1
Name: cc, dtype: object
After: abc
a      1
b    nan
c      3
Name: cc, dtype: object

2.4.4 Querying
Looking up a single value by index label

import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s["a"]

'1'

Looking up several values by index labels

import pandas as pd
index_name = pd.Index(["a","b","c"],name="abc")
tup = (1,np.nan,1)
s = pd.Series(tup,index=index_name,name="cc",dtype="str")
s[["a","b"]]

abc
a      1
b    nan
Name: cc, dtype: object

Filtering with a boolean index

import pandas as pd
index_name = pd.Index(["a","b","c","d"],name="num")
tup = (1,2,3,4)
s = pd.Series(tup,index=index_name,name="cc",dtype="float")
s[s>2]

num
c    3.0
d    4.0
Name: cc, dtype: float64

Querying data with positional slices and label slices

import pandas as pd
index_name = pd.Index(["a","b","c","d"],name="num")
tup = (1,2,3,4)
s = pd.Series(tup,index=index_name,name="cc",dtype="float")
s[:2]  # positional slice: the end position is excluded

num
a    1.0
b    2.0
Name: cc, dtype: float64

s["a":"c"]  # label slice: both end labels are included

num
a    1.0
b    2.0
c    3.0
Name: cc, dtype: float64

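s.iloc[[0,1]]  # select by position; plain s[[0,1]] on a string-labelled index is deprecated in recent pandas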

num
a    1.0
b    2.0
Name: cc, dtype: float64

.iloc: purely integer-location based indexing, for selection by position.

s.iloc[:2]

num
a    1.0
b    2.0
Name: cc, dtype: float64

.loc: access a group of values by label(s) or a boolean array

s.loc["c":]

num
c    3.0
d    4.0
Name: cc, dtype: float64

s.loc[["c","b"]]

num
c    3.0
b    2.0
Name: cc, dtype: float64
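One detail that is easy to trip over: a label slice with `.loc` includes its end label, while a positional slice with `.iloc` excludes its end position. A minimal sketch with the same kind of data as above (the variable names are only for illustration):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"])

# Label slice: "c" is included
by_label = s.loc["a":"c"]
print(by_label.index.tolist())  # ['a', 'b', 'c']

# Positional slice: position 2 is excluded
by_pos = s.iloc[0:2]
print(by_pos.index.tolist())    # ['a', 'b']

# .loc also accepts a boolean mask
mask = s.loc[s > 2]
print(mask.tolist())            # [3.0, 4.0]
```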

Viewing the first and last n rows

import pandas as pd
tup = (1,2,3,4,4,5,6,7,8,9)
s = pd.Series(tup)
print("First 5:",s.head()) # first 5 rows (the default)
print("Last 5:",s.tail()) # last 5 rows (the default)
print("First 2:",s.head(2)) # first 2 rows
print("Last 2:",s.tail(2))  # last 2 rows

First 5: 0    1
1    2
2    3
3    4
4    4
dtype: int64
Last 5: 5    5
6    6
7    7
8    8
9    9
dtype: int64
First 2: 0    1
1    2
dtype: int64
Last 2: 8    8
9    9
dtype: int64
9    9
dtype: int64

2.5 Series statistics
Arithmetic on a single Series

import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s1 = pd.Series(tup[:5])

s1 * 2  # multiply every element by 2

0     2
1     4
2     6
3     8
4    10
dtype: int64

s1 + 1  # add 1 to every element

0    2
1    3
2    4
3    5
4    6
dtype: int64

Arithmetic between two Series (same index)

# Addition and subtraction
import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s1 = pd.Series(tup[:5])
s2 = pd.Series(tup[5:])
print("s1:",s1)
print("s2:",s2)

s1: 0    1
1    2
2    3
3    4
4    5
dtype: int64
s2: 0    5
1    6
2    7
3    8
4    9
dtype: int64

s1 + s2  # element-wise addition, aligned on the index

0     6
1     8
2    10
3    12
4    14
dtype: int64

s2 - s1  # element-wise subtraction, aligned on the index

0    4
1    4
2    4
3    4
4    4
dtype: int64

Arithmetic between two Series (different indexes)

# Addition and subtraction
import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s1 = pd.Series(tup[:5],index=["a","b",1,2,3])
s2 = pd.Series(tup[5:])
print("s1:",s1)
print("s2:",s2)

s1: a    1
b    2
1    3
2    4
3    5
dtype: int64
s2: 0    5
1    6
2    7
3    8
4    9
dtype: int64

s1 + s2  # labels present in only one Series produce NaN

0     NaN
1     9.0
2    11.0
3    13.0
4     NaN
a     NaN
b     NaN
dtype: float64

s1 - s2  # labels present in only one Series produce NaN

0    NaN
1   -3.0
2   -3.0
3   -3.0
4    NaN
a    NaN
b    NaN
dtype: float64
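If the NaN results are not what you want, the method forms of the operators accept a `fill_value`. The sketch below uses made-up labels and treats a missing element as 0, which is only one possible choice.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# The + operator: "a" and "d" exist on one side only, so they become NaN
plain = s1 + s2
print(plain)

# add() with fill_value=0: a missing element is treated as 0 before adding
filled = s1.add(s2, fill_value=0)
print(filled.tolist())  # [1.0, 12.0, 23.0, 30.0]
```

The same `fill_value` argument is available on `sub`, `mul`, and `div`.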

Summary statistics

import pandas as pd
tup = (1,2,3,4,5,5,6,7,8,9)
s = pd.Series(tup)
s.describe()  # count, mean, std, min, quartiles, and max in one call

count    10.000000
mean      5.000000
std       2.581989
min       1.000000
25%       3.250000
50%       5.000000
75%       6.750000
max       9.000000
dtype: float64

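# Mean
s.mean()

5.0

# Sum
s.sum()

# Standard deviation
s.std()

2.581988897471611

# Maximum
s.max()

# Minimum
s.min()

# Quantiles
print("25% quantile:",s.quantile(0.25))
print("Median (50% quantile):",s.quantile(0.5))
print("75% quantile:",s.quantile(0.75))

25% quantile: 3.25
Median (50% quantile): 5.0
75% quantile: 6.75

# Cumulative sum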
s.cumsum()

0     1
1     3
2     6
3    10
4    15
5    20
6    26
7    33
8    41
9    50
dtype: int64
