Time Series Data Analysis -- Time Series -- Resampling Time Data


python

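Resampling

Resampling is the process of converting a time series from one frequency to another. Converting to a lower frequency requires an aggregation function.

Resampling in pandas with the resample method

Calling resample produces a Resampler object.

High frequency -> low frequency

resample(freq).sum(), resample(freq).mean(), ...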

#resample

import pandas as pd
import numpy as np

date_rng = pd.date_range('20170101', periods=100, freq='D')
ser_obj = pd.Series(range(len(date_rng)), index=date_rng)
print(ser_obj.head(10))
Output:
2017-01-01    0
2017-01-02    1
2017-01-03    2
2017-01-04    3
2017-01-05    4
2017-01-06    5
2017-01-07    6
2017-01-08    7
2017-01-09    8
2017-01-10    9
Freq: D, dtype: int64

# Monthly sum ('M' = month end; pandas 2.2 and later spell this alias 'ME')
resample_month_sum = ser_obj.resample('M').sum()
# Monthly mean
resample_month_mean = ser_obj.resample('M').mean()

print('Monthly sum:', resample_month_sum)
print('Monthly mean:', resample_month_mean)
Output:
Monthly sum: 2017-01-31     465
2017-02-28    1246
2017-03-31    2294
2017-04-30     945
Freq: M, dtype: int64
Monthly mean: 2017-01-31    15.0
2017-02-28    44.5
2017-03-31    74.0
2017-04-30    94.5
Freq: M, dtype: float64
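A Resampler object is not tied to a single aggregation. As a minimal sketch (the 10-day bin width and the name ten_day_stats are chosen here for illustration and are not from the original post), several statistics can be computed in one call with agg:

```python
import pandas as pd

# Same daily series as above: the values 0..99 on 100 consecutive days.
date_rng = pd.date_range('20170101', periods=100, freq='D')
ser_obj = pd.Series(range(len(date_rng)), index=date_rng)

# agg with a list of function names returns one column per aggregation.
ten_day_stats = ser_obj.resample('10D').agg(['sum', 'mean', 'max'])
print(ten_day_stats.head())
```

The result is a DataFrame with one row per 10-day bin, which is convenient when the sum and the mean are both needed for the same bins.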

Downsampling

Aggregate the data to a regular, lower frequency.

OHLC resampling: open, high, low, close

Downsampling with groupby

# Resample the daily series into 5-day bins
five_day_sum_sample = ser_obj.resample('5D').sum()
five_day_mean_sample = ser_obj.resample('5D').mean()
five_day_ohlc_sample = ser_obj.resample('5D').ohlc()

print('Downsampling, sum..')
print(five_day_sum_sample.head())
Downsampling, sum..
2017-01-01     10
2017-01-06     35
2017-01-11     60
2017-01-16     85
2017-01-21    110
dtype: int64

print('Downsampling, ohlc')
print(five_day_ohlc_sample.head())
Downsampling, ohlc
            open  high  low  close
2017-01-01     0     4    0      4
2017-01-06     5     9    5      9
2017-01-11    10    14   10     14
2017-01-16    15    19   15     19
2017-01-21    20    24   20     24
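To make the meaning of the four OHLC columns concrete, the sketch below rebuilds them from the first, maximum, minimum and last value of each bin. The name manual is hypothetical and only used for this comparison.

```python
import pandas as pd

date_rng = pd.date_range('20170101', periods=100, freq='D')
ser_obj = pd.Series(range(len(date_rng)), index=date_rng)

ohlc = ser_obj.resample('5D').ohlc()

# open = first value, high = max, low = min, close = last value of each bin.
manual = ser_obj.resample('5D').agg(['first', 'max', 'min', 'last'])
manual.columns = ['open', 'high', 'low', 'close']

# Compare the two tables element by element.
print((ohlc.values == manual.values).all())
```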

# Downsampling with groupby: group by the month number of each timestamp
print(ser_obj.groupby(lambda x: x.month).sum())
Output:
1     465
2    1246
3    2294
4     945
dtype: int32

print(ser_obj.groupby(lambda x: x.weekday).sum())
Output:
0    750
1    665
2    679
3    693
4    707
5    721
6    735
dtype: int32
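One caveat with grouping by month number: it matches monthly resampling only while the data stays within one year. The sketch below uses a hypothetical 400-day series of ones (not from the original post) so that January appears in both 2017 and 2018. Grouping by x.month merges the two Januaries, while grouping by calendar month (here via to_period('M')) keeps them apart.

```python
import pandas as pd

# 400 daily points from 2017-01-01, ending on 2018-02-04.
rng = pd.date_range('20170101', periods=400, freq='D')
ser = pd.Series(1, index=rng)

# Month number only: January 2017 and January 2018 fall into the same group.
by_month_number = ser.groupby(lambda x: x.month).sum()

# Year and month together: one group per calendar month.
by_calendar_month = ser.groupby(ser.index.to_period('M')).sum()

print(by_month_number.loc[1])   # days counted under "month 1"
print(len(by_month_number), len(by_calendar_month))
```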

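3. Upsampling

Converting from a low frequency to a high frequency creates new rows. They must be filled or interpolated; otherwise they are NaN.

Common filling and interpolation methods

ffill(limit): forward fill. Each missing value takes the last preceding value, and at most limit consecutive missing values are filled.

bfill(limit): backward fill. The same, using the next following value.

fillna('ffill') / fillna('bfill'): the same fills, selected by name.

interpolate: fills the gaps with an interpolation algorithm.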
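The effect of limit is easiest to see on a tiny, deterministic example. The sketch below assumes a two-point weekly series (the names weekly and daily are illustrative) and upsamples it to daily frequency, which yields 8 rows with 6 gaps between the two known values.

```python
import pandas as pd

# Two Mondays, one week apart: 2017-01-02 and 2017-01-09.
weekly = pd.Series([0.0, 7.0],
                   index=pd.date_range('20170102', periods=2, freq='W-MON'))
daily = weekly.resample('D')

no_fill = daily.asfreq()          # new rows are left as NaN
forward = daily.ffill(limit=2)    # fill at most 2 rows after each known value
backward = daily.bfill(limit=2)   # fill at most 2 rows before each known value

print(no_fill.isna().sum(), forward.isna().sum(), backward.isna().sum())
print(forward)
```

With limit=2, the rows further than two days from a known value stay NaN, so a gap is never silently filled across its whole length.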

# Weekly (Monday-anchored) DataFrame of random values; the numbers change on every run
df = pd.DataFrame(np.random.randn(5, 3),
                 index=pd.date_range('20170101', periods=5, freq='W-MON'),
                 columns=['S1', 'S2', 'S3'])
print(df)
Output:
                  S1        S2        S3
2017-01-02  0.087264 -0.047404 -0.754223
2017-01-09  1.148830  2.439266 -0.889873
2017-01-16  0.331767  0.918984  1.164783
2017-01-23 -0.582157  0.923737  1.938061
2017-01-30 -0.637087  0.143846 -1.500307

# Upsample to daily frequency without filling; the new rows are NaN
print(df.resample('D').asfreq().head(10))
Output:
                  S1        S2        S3
2017-01-02  0.003409 -0.939362  2.036451
2017-01-03       NaN       NaN       NaN
2017-01-04       NaN       NaN       NaN
2017-01-05       NaN       NaN       NaN
2017-01-06       NaN       NaN       NaN
2017-01-07       NaN       NaN       NaN
2017-01-08       NaN       NaN       NaN
2017-01-09  0.291274 -0.655332 -1.034041
2017-01-10       NaN       NaN       NaN
2017-01-11       NaN       NaN       NaN

# ffill: forward fill at most 2 rows after each known value
print(df.resample('D').ffill(limit=2).head(10))
Output:
                  S1        S2        S3
2017-01-02  0.003409 -0.939362  2.036451
2017-01-03  0.003409 -0.939362  2.036451
2017-01-04  0.003409 -0.939362  2.036451
2017-01-05       NaN       NaN       NaN
2017-01-06       NaN       NaN       NaN
2017-01-07       NaN       NaN       NaN
2017-01-08       NaN       NaN       NaN
2017-01-09  0.291274 -0.655332 -1.034041
2017-01-10  0.291274 -0.655332 -1.034041
2017-01-11  0.291274 -0.655332 -1.034041

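# Backward fill
print(df.resample('D').bfill())
# Forward fill selected by name (Resampler.fillna is deprecated in recent pandas; ffill() is the equivalent)
print(df.resample('D').fillna('ffill'))
# Linear interpolation between the known weekly values
print(df.resample('D').interpolate('linear'))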
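Because df above holds random numbers, the interpolated output is hard to check by eye. As a small sketch with assumed, hand-picked values (0.0 and 7.0 one week apart, not from the original post), linear interpolation should place the daily values evenly between the two known points:

```python
import pandas as pd

# Two known values exactly 7 days apart.
weekly = pd.Series([0.0, 7.0],
                   index=pd.date_range('20170102', periods=2, freq='W-MON'))

# Upsample to daily frequency and fill the gaps on a straight line.
daily_linear = weekly.resample('D').interpolate('linear')
print(daily_linear)
```

Unlike ffill and bfill, which repeat an existing value, interpolate produces new values between the observations, so it is only appropriate when the quantity can reasonably be assumed to change smoothly.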
