Python Data Analysis in Practice [Chapter 3] 2.13 - Time Series - Resampling [python]


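[Lesson 2.13] Time Series - Resampling
Resampling is the process of converting a time series from one frequency to another; along the way the data is either aggregated or filled in.
Downsampling: high-frequency data → low-frequency data, e.g. daily data converted to monthly data.
Upsampling: low-frequency data → high-frequency data, e.g. yearly data converted to monthly data.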
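The two directions can be sketched side by side. This is a minimal illustration, not part of the original lesson: the series names `daily` and `yearly`, the constant values, and the choice of the `"MS"` (month start) frequency are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Downsampling: 59 daily values (Jan 1 to Feb 28, 2017) are summed per month.
daily = pd.Series(np.ones(59),
                  index=pd.date_range("2017-01-01", periods=59, freq="D"))
monthly = daily.resample("MS").sum()   # "MS" = month-start frequency
print(monthly)

# Upsampling: two yearly values are spread over monthly slots.
# The new rows have no data of their own, so they are forward-filled here.
yearly = pd.Series([10, 20],
                   index=pd.to_datetime(["2017-01-01", "2018-01-01"]))
monthly_from_yearly = yearly.resample("MS").ffill()
print(monthly_from_yearly)
```

Downsampling always needs an aggregation (here `.sum()`), while upsampling needs a fill rule (here `.ffill()`), which is the distinction the following sections work through.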
1. Resampling: .resample()

import numpy as np
import pandas as pd

# Create a daily TimeSeries and resample it into 5-day bins

rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(12), index = rng)
print(ts)

ts_re = ts.resample('5D')
ts_re2 = ts.resample('5D').sum()
print(ts_re, type(ts_re))
print(ts_re2, type(ts_re2))
print('-----')
# ts.resample('5D'): returns a resampler object that groups the data into 5-day bins
# ts.resample('5D').sum(): returns a new Series holding the aggregated values
# freq: the target frequency string → ts.resample('5D')
# .sum(): the aggregation method

print(ts.resample('5D').mean(),'→ mean\n')
print(ts.resample('5D').max(),'→ max\n')
print(ts.resample('5D').min(),'→ min\n')
print(ts.resample('5D').median(),'→ median\n')
print(ts.resample('5D').first(),'→ first value\n')
print(ts.resample('5D').last(),'→ last value\n')
print(ts.resample('5D').ohlc(),'→ OHLC\n')
# OHLC: a common aggregation for financial data → open, high, low, close
-----------------------------------------------------------------------
2017-01-01     0
2017-01-02     1
2017-01-03     2
2017-01-04     3
2017-01-05     4
2017-01-06     5
2017-01-07     6
2017-01-08     7
2017-01-09     8
2017-01-10     9
2017-01-11    10
2017-01-12    11
Freq: D, dtype: int32
DatetimeIndexResampler [freq=<5 * Days>, axis=0, closed=left, label=left, convention=start, base=0] <class 'pandas.tseries.resample.DatetimeIndexResampler'>
2017-01-01    10
2017-01-06    35
2017-01-11    21
Freq: 5D, dtype: int32 <class 'pandas.core.series.Series'>
-----
2017-01-01     2.0
2017-01-06     7.0
2017-01-11    10.5
Freq: 5D, dtype: float64 → mean

2017-01-01     4
2017-01-06     9
2017-01-11    11
Freq: 5D, dtype: int32 → max

2017-01-01     0
2017-01-06     5
2017-01-11    10
Freq: 5D, dtype: int32 → min

2017-01-01     2.0
2017-01-06     7.0
2017-01-11    10.5
Freq: 5D, dtype: float64 → median

2017-01-01     0
2017-01-06     5
2017-01-11    10
Freq: 5D, dtype: int32 → first value

2017-01-01     4
2017-01-06     9
2017-01-11    11
Freq: 5D, dtype: int32 → last value

            open  high  low  close
2017-01-01     0     4    0      4
2017-01-06     5     9    5      9
2017-01-11    10    11   10     11 → OHLC
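When several statistics are needed for the same bins, calling the resampler once with `.agg()` may be more convenient than printing each method separately. The sketch below reuses the 12-day series from this section; the variable name `stats` and the particular list of functions are choices made for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2017-01-01", periods=12, freq="D")
ts = pd.Series(np.arange(12), index=rng)

# One resampler, several aggregations: the result is a DataFrame
# with one column per function and one row per 5-day bin.
stats = ts.resample("5D").agg(["sum", "mean", "max"])
print(stats)
```

Each column should match the corresponding single-method output shown above (for example the `sum` column holds 10, 35 and 21).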

2. Downsampling


rng = pd.date_range('20170101', periods = 12)
ts = pd.Series(np.arange(1,13), index = rng)
print(ts)

print(ts.resample('5D').sum(),'→ default\n')
print(ts.resample('5D', closed = 'left').sum(),'→ left closed\n')
print(ts.resample('5D', closed = 'right').sum(),'→ right closed\n')
print('-----')
# closed: which side of each bin interval is closed (inclusive); the default is left for most frequencies
# Example: values 1-12 resampled at 5D → [1,2,3,4,5],[6,7,8,9,10],[11,12]
# left → [1,2,3,4,5],[6,7,8,9,10],[11,12]
# right → [1],[2,3,4,5,6],[7,8,9,10,11],[12]

print(ts.resample('5D', label = 'left').sum(),'→ leftlabel\n')
print(ts.resample('5D', label = 'right').sum(),'→ rightlabel\n')
# label: which bin edge is used as the index of the aggregated value; the default is left for most frequencies
# (closed keeps its default value here)
-----------------------------------------------------------------------
2017-01-01     1
2017-01-02     2
2017-01-03     3
2017-01-04     4
2017-01-05     5
2017-01-06     6
2017-01-07     7
2017-01-08     8
2017-01-09     9
2017-01-10    10
2017-01-11    11
2017-01-12    12
Freq: D, dtype: int32
2017-01-01    15
2017-01-06    40
2017-01-11    23
Freq: 5D, dtype: int32 → default

2017-01-01    15
2017-01-06    40
2017-01-11    23
Freq: 5D, dtype: int32 → left closed

2016-12-27     1
2017-01-01    20
2017-01-06    45
2017-01-11    12
Freq: 5D, dtype: int32 → right closed

-----
2017-01-01    15
2017-01-06    40
2017-01-11    23
Freq: 5D, dtype: int32 → leftlabel

2017-01-06    15
2017-01-11    40
2017-01-16    23
Freq: 5D, dtype: int32 → rightlabel
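The `closed` and `label` parameters can also be combined. The sketch below is an illustration rather than part of the original lesson: it uses `closed='right'` together with `label='right'`, so each bin is named after its inclusive right edge, and then recomputes one bin by hand with a boolean mask. The names `right` and `by_hand` are assumptions for the example.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1, 13),
               index=pd.date_range("2017-01-01", periods=12, freq="D"))

# Right-closed bins labelled by their right edge:
# (2016-12-27, 2017-01-01], (2017-01-01, 2017-01-06], ...
right = ts.resample("5D", closed="right", label="right").sum()
print(right)

# Hand check of the second bin: Jan 1 is excluded, Jan 6 is included.
lo, hi = pd.Timestamp("2017-01-01"), pd.Timestamp("2017-01-06")
by_hand = ts[(ts.index > lo) & (ts.index <= hi)].sum()
print(by_hand)   # 2 + 3 + 4 + 5 + 6
```

With both parameters set to `'right'`, the label of a bin is always a timestamp that belongs to that bin, which can be easier to read than the default output of `closed='right'` alone.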

3. Upsampling and interpolation
rng = pd.date_range('2017/1/1 0:0:0', periods = 5, freq = 'H')
ts = pd.DataFrame(np.arange(15).reshape(5,3),
                  index = rng,
                  columns = ['a','b','c'])
print(ts)

print(ts.resample('15T').asfreq())
print(ts.resample('15T').ffill())
print(ts.resample('15T').bfill())
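# Upsampling to a higher frequency creates new rows that have no data of their own
# .asfreq(): no filling, the new rows are NaN
# .ffill(): forward fill, each new row takes the previous value
# .bfill(): backward fill, each new row takes the next value
# Note: newer pandas releases deprecate the 'H' and 'T' aliases in favour of 'h' and 'min'
-----------------------------------------------------------------------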
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 01:00:00   3   4   5
2017-01-01 02:00:00   6   7   8
2017-01-01 03:00:00   9  10  11
2017-01-01 04:00:00  12  13  14
                        a     b     c
2017-01-01 00:00:00   0.0   1.0   2.0
2017-01-01 00:15:00   NaN   NaN   NaN
2017-01-01 00:30:00   NaN   NaN   NaN
2017-01-01 00:45:00   NaN   NaN   NaN
2017-01-01 01:00:00   3.0   4.0   5.0
2017-01-01 01:15:00   NaN   NaN   NaN
2017-01-01 01:30:00   NaN   NaN   NaN
2017-01-01 01:45:00   NaN   NaN   NaN
2017-01-01 02:00:00   6.0   7.0   8.0
2017-01-01 02:15:00   NaN   NaN   NaN
2017-01-01 02:30:00   NaN   NaN   NaN
2017-01-01 02:45:00   NaN   NaN   NaN
2017-01-01 03:00:00   9.0  10.0  11.0
2017-01-01 03:15:00   NaN   NaN   NaN
2017-01-01 03:30:00   NaN   NaN   NaN
2017-01-01 03:45:00   NaN   NaN   NaN
2017-01-01 04:00:00  12.0  13.0  14.0
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 00:15:00   0   1   2
2017-01-01 00:30:00   0   1   2
2017-01-01 00:45:00   0   1   2
2017-01-01 01:00:00   3   4   5
2017-01-01 01:15:00   3   4   5
2017-01-01 01:30:00   3   4   5
2017-01-01 01:45:00   3   4   5
2017-01-01 02:00:00   6   7   8
2017-01-01 02:15:00   6   7   8
2017-01-01 02:30:00   6   7   8
2017-01-01 02:45:00   6   7   8
2017-01-01 03:00:00   9  10  11
2017-01-01 03:15:00   9  10  11
2017-01-01 03:30:00   9  10  11
2017-01-01 03:45:00   9  10  11
2017-01-01 04:00:00  12  13  14
                      a   b   c
2017-01-01 00:00:00   0   1   2
2017-01-01 00:15:00   3   4   5
2017-01-01 00:30:00   3   4   5
2017-01-01 00:45:00   3   4   5
2017-01-01 01:00:00   3   4   5
2017-01-01 01:15:00   6   7   8
2017-01-01 01:30:00   6   7   8
2017-01-01 01:45:00   6   7   8
2017-01-01 02:00:00   6   7   8
2017-01-01 02:15:00   9  10  11
2017-01-01 02:30:00   9  10  11
2017-01-01 02:45:00   9  10  11
2017-01-01 03:00:00   9  10  11
2017-01-01 03:15:00  12  13  14
2017-01-01 03:30:00  12  13  14
2017-01-01 03:45:00  12  13  14
2017-01-01 04:00:00  12  13  14
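The outputs above only copy neighbouring values. For actual interpolation, the resampler also offers `.interpolate()`, which by default fills the gaps linearly. A minimal sketch on the same data follows; the `'60min'` and `'15min'` frequency strings are used here instead of `'H'` and `'15T'` only so that the example runs unchanged on both older and newer pandas versions, and the name `up` is an assumption.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2017-01-01 00:00:00", periods=5, freq="60min")
df = pd.DataFrame(np.arange(15).reshape(5, 3), index=rng, columns=["a", "b", "c"])

# Upsample from hourly to 15-minute rows and fill the gaps linearly.
up = df.resample("15min").interpolate()
print(up.head(5))
```

Column `a` rises by 3 per hour, so the rows inserted between 00:00 and 01:00 should hold 0.75, 1.5 and 2.25.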

4. Period resampling - Period


prng = pd.period_range('2016','2017',freq = 'M')
ts = pd.Series(np.arange(len(prng)), index = prng)
print(ts)

print(ts.resample('3M').sum())  # downsampling
print(ts.resample('15D').ffill())  # upsampling
-----------------------------------------------------------------------
2016-01     0
2016-02     1
2016-03     2
2016-04     3
2016-05     4
2016-06     5
2016-07     6
2016-08     7
2016-09     8
2016-10     9
2016-11    10
2016-12    11
2017-01    12
Freq: M, dtype: int32
2016-01-31     0
2016-04-30     6
2016-07-31    15
2016-10-31    24
2017-01-31    33
Freq: 3M, dtype: int32
2016-01-01     0
2016-01-16     0
2016-01-31     0
2016-02-15     1
2016-03-01     2
2016-03-16     2
2016-03-31     2
2016-04-15     3
2016-04-30     3
2016-05-15     4
2016-05-30     4
2016-06-14     5
2016-06-29     5
2016-07-14     6
2016-07-29     6
2016-08-13     7
2016-08-28     7
2016-09-12     8
2016-09-27     8
2016-10-12     9
2016-10-27     9
2016-11-11    10
2016-11-26    10
2016-12-11    11
2016-12-26    11
Freq: 15D, dtype: int32
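Recent pandas releases appear to discourage resampling directly on a `PeriodIndex`, so an alternative is to convert the periods to timestamps first and resample the result. The sketch below is one such variant under that assumption, not the original author's method; `stamped`, `quarterly` and `half_month` are names made up for the example, and `'QS'` (quarter start) replaces `'3M'`, so the bins are calendar quarters and the sums differ from the `3M` output above.

```python
import numpy as np
import pandas as pd

prng = pd.period_range("2016-01", "2017-01", freq="M")
ts = pd.Series(np.arange(len(prng)), index=prng)

# Convert each monthly period to the timestamp of its first day.
stamped = ts.to_timestamp()

# Downsampling: sum per calendar quarter (labelled by quarter start).
quarterly = stamped.resample("QS").sum()
print(quarterly)

# Upsampling: 15-day steps, forward-filled from the latest month start.
half_month = stamped.resample("15D").ffill()
print(half_month.tail(3))
```

The 15-day result has the same 25 rows, from 2016-01-01 to 2016-12-26, as the `ffill()` output shown above.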