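Pandas: resampling with resample() and frequency conversion with asfreq()
resample()
resample() performs resampling.
Resampling is the process of converting a time series from one frequency to another. Converting higher-frequency data to a lower frequency is called downsampling; converting lower-frequency data to a higher frequency is called upsampling.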
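Downsampling
Things to consider:
Which side of each interval is closed (parameter: closed)
How each aggregated bin is labelled, with the start of the interval or the end (parameter: label)
The examples below assume that numpy has been imported as np and pandas as pd.
In [232]: ts_index = pd.date_range('2018-08-03', periods=12, freq='T')

In [233]: ts = pd.Series(np.arange(12), index=ts_index)

In [234]: ts
Out[234]:
2018-08-03 00:00:00     0
2018-08-03 00:01:00     1
2018-08-03 00:02:00     2
2018-08-03 00:03:00     3
2018-08-03 00:04:00     4
2018-08-03 00:05:00     5
2018-08-03 00:06:00     6
2018-08-03 00:07:00     7
2018-08-03 00:08:00     8
2018-08-03 00:09:00     9
2018-08-03 00:10:00    10
2018-08-03 00:11:00    11
Freq: T, dtype: int32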
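By default, minute-level bins use the left label (label='left') and are closed on the left (closed='left').
The first interval is therefore [2018-08-03 00:00:00, 2018-08-03 00:05:00), i.e. 00:00:00 through 00:04:59. It contains the values 0 to 4, so the sum is 10 and the label is 2018-08-03 00:00:00.
In [235]: ts.resample('5min').sum()
Out[235]:
2018-08-03 00:00:00    10
2018-08-03 00:05:00    35
2018-08-03 00:10:00    21
Freq: 5T, dtype: int32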
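The left-closed, left-labelled rule can be cross-checked by hand. The following is a minimal sketch, not part of the original article: it assumes that flooring each timestamp to a 5-minute boundary reproduces the default bins, and it uses the 'min' alias instead of 'T' because recent pandas releases deprecate 'T'.

```python
import numpy as np
import pandas as pd

# Same series as in the article; 'min' replaces the older 'T' alias.
ts = pd.Series(np.arange(12),
               index=pd.date_range('2018-08-03', periods=12, freq='min'))

# With closed='left' and label='left', a timestamp belongs to the bin
# that starts at the timestamp floored to a 5-minute boundary.
manual = ts.groupby(ts.index.floor('5min')).sum()
resampled = ts.resample('5min').sum()

print(manual.tolist())     # [10, 35, 21]
print(resampled.tolist())  # [10, 35, 21]
```

Both approaches yield the same three bins, which is a quick way to confirm which points fall into which interval.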
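The bins can be closed on the right instead (closed='right') while keeping the default left label (label='left').
The first interval is then (2018-08-02 23:55:00, 2018-08-03 00:00:00], i.e. 23:55:01 through 00:00:00. It contains only the value 0, so the sum is 0 and the label is 2018-08-02 23:55:00.
In [236]: ts.resample('5min', closed='right').sum()
Out[236]:
2018-08-02 23:55:00     0
2018-08-03 00:00:00    15
2018-08-03 00:05:00    40
2018-08-03 00:10:00    11
Freq: 5T, dtype: int32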
The bins can also be closed on the right (closed='right') and labelled on the right (label='right').
The first interval is still (2018-08-02 23:55:00, 2018-08-03 00:00:00], so the sum is 0, but the label is now the right edge, 2018-08-03 00:00:00.
In [237]: ts.resample('5min', closed='right', label='right').sum()
Out[237]:
2018-08-03 00:00:00     0
2018-08-03 00:05:00    15
2018-08-03 00:10:00    40
2018-08-03 00:15:00    11
Freq: 5T, dtype: int32
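To see how closed and label interact, it can help to run all four combinations side by side. This is an illustrative sketch under the same assumptions as the examples above (the results dictionary is a hypothetical helper, and 'min' is used in place of 'T'):

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(12),
               index=pd.date_range('2018-08-03', periods=12, freq='min'))

# closed decides which points fall into a bin (and so the sums);
# label only decides which bin edge is used as the index value.
results = {}
for closed in ('left', 'right'):
    for label in ('left', 'right'):
        results[(closed, label)] = ts.resample(
            '5min', closed=closed, label=label).sum()

for (closed, label), s in results.items():
    print(closed, label, s.index[0], s.tolist())
```

For a fixed closed value the sums are identical whichever label is chosen; only the index shifts by one bin width.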
Upsampling
Things to consider:
No aggregation is needed, but the new rows have to be filled.
In [244]: frame = pd.DataFrame(np.random.randn(2, 4),
     ...:                      index=pd.date_range('1/1/2000', periods=2,
     ...:                                          freq='W-WED'),  # freq='W-WED' means weekly, anchored on Wednesday
     ...:                      columns=['Colorado', 'Texas', 'NewYork', 'Ohio'])

In [245]: frame
Out[245]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-12 -0.711291 -1.070133  1.469272  0.809806
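When this data is resampled to a higher frequency, each group contains at most one value, and the gaps between the original dates become missing values. Since no aggregation function is needed,
we use the asfreq method to convert to the higher frequency:
In [246]: df_daily = frame.resample('D').asfreq()

In [247]: df_daily
Out[247]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-06       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -0.711291 -1.070133  1.469272  0.809806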
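For a plain upsample like this one, resample('D').asfreq() appears to give the same frame as calling asfreq('D') directly. The sketch below checks that on a small frame; the fixed values from np.arange are a hypothetical stand-in for the random numbers in the article so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values instead of np.random.randn, for reproducibility.
frame = pd.DataFrame(np.arange(8.0).reshape(2, 4),
                     index=pd.date_range('2000-01-05', periods=2, freq='W-WED'),
                     columns=['Colorado', 'Texas', 'NewYork', 'Ohio'])

via_resample = frame.resample('D').asfreq()  # upsample through the Resampler
via_asfreq = frame.asfreq('D')               # direct frequency conversion

# Compare the two results and count the rows introduced by upsampling.
same = via_resample.equals(via_asfreq)
new_rows = int(via_resample['Colorado'].isna().sum())
print(same, len(via_resample), new_rows)
```

The two weekly observations become eight daily rows, six of which are new and therefore NaN.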
Filling with ffill(); the optional limit argument caps how many consecutive rows are filled
In [248]: frame.resample('D').ffill()
Out[248]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-06  1.201713  0.029819 -1.366082 -1.325252
2000-01-07  1.201713  0.029819 -1.366082 -1.325252
2000-01-08  1.201713  0.029819 -1.366082 -1.325252
2000-01-09  1.201713  0.029819 -1.366082 -1.325252
2000-01-10  1.201713  0.029819 -1.366082 -1.325252
2000-01-11  1.201713  0.029819 -1.366082 -1.325252
2000-01-12 -0.711291 -1.070133  1.469272  0.809806

In [249]: frame.resample('D').ffill(limit=2)
Out[249]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-06  1.201713  0.029819 -1.366082 -1.325252
2000-01-07  1.201713  0.029819 -1.366082 -1.325252
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -0.711291 -1.070133  1.469272  0.809806
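The effect of limit is easiest to see by counting the rows that remain empty. A small sketch, again with hypothetical fixed values in place of the random data:

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values instead of np.random.randn, for reproducibility.
frame = pd.DataFrame(np.arange(8.0).reshape(2, 4),
                     index=pd.date_range('2000-01-05', periods=2, freq='W-WED'),
                     columns=['Colorado', 'Texas', 'NewYork', 'Ohio'])

daily = frame.resample('D')

# Number of daily rows that are still NaN under each strategy.
missing = {
    'asfreq': int(daily.asfreq()['Ohio'].isna().sum()),
    'ffill': int(daily.ffill()['Ohio'].isna().sum()),
    'ffill(limit=2)': int(daily.ffill(limit=2)['Ohio'].isna().sum()),
}
print(missing)  # {'asfreq': 6, 'ffill': 0, 'ffill(limit=2)': 4}
```

With limit=2 only the two days after each observation are filled, so four of the six new rows stay empty.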
The new date index does not have to overlap with the old one:
In [250]: frame.resample('W-THU').ffill()
Out[250]:
            Colorado     Texas   NewYork      Ohio
2000-01-06  1.201713  0.029819 -1.366082 -1.325252
2000-01-13 -0.711291 -1.070133  1.469272  0.809806
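The Thursday labels come from the weekly bins themselves: for weekly frequencies pandas defaults to closed='right' and label='right', so each bin is labelled with the Thursday that ends it, and ffill carries the preceding Wednesday's values onto that date. A minimal sketch with a hypothetical one-column frame:

```python
import pandas as pd

# Hypothetical one-column frame observed on two Wednesdays.
frame = pd.DataFrame({'value': [1.0, 2.0]},
                     index=pd.date_range('2000-01-05', periods=2, freq='W-WED'))

weekly = frame.resample('W-THU').ffill()

# The new index consists of Thursdays, none of which exist in the old index.
print(list(weekly.index.day_name()))  # ['Thursday', 'Thursday']
print(weekly['value'].tolist())       # [1.0, 2.0]
```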
Grouped resampling
In [279]: times = pd.date_range('2018-08-3 00:00', freq='1min', periods=10)

In [280]: df2 = pd.DataFrame({'time': times.repeat(3),
     ...:                     'key': np.tile(['a', 'b', 'c'], 10),
     ...:                     'value': np.arange(30)})

In [281]: df2[:5]
Out[281]:
  key                time  value
0   a 2018-08-03 00:00:00      0
1   b 2018-08-03 00:00:00      1
2   c 2018-08-03 00:00:00      2
3   a 2018-08-03 00:01:00      3
4   b 2018-08-03 00:01:00      4

In [282]: df2.groupby(['key', pd.Grouper(key='time', freq='5min')]).sum()
Out[282]:
                         value
key time
a   2018-08-03 00:00:00     30
    2018-08-03 00:05:00    105
b   2018-08-03 00:00:00     35
    2018-08-03 00:05:00    110
c   2018-08-03 00:00:00     40
    2018-08-03 00:05:00    115
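pd.Grouper(key='time', freq='5min') acts like a resample applied within each key. The sketch below rebuilds the same data, cross-checks the result against a manual grouping on floored timestamps (an assumption that holds for the default left-closed 5-minute bins), and pivots the result into one column per key. The names bins, manual and wide are illustrative only.

```python
import numpy as np
import pandas as pd

times = pd.date_range('2018-08-03 00:00', freq='1min', periods=10)
df2 = pd.DataFrame({'time': times.repeat(3),
                    'key': np.tile(['a', 'b', 'c'], 10),
                    'value': np.arange(30)})

# Group by key and by 5-minute time bin in a single step.
grouped = df2.groupby(['key', pd.Grouper(key='time', freq='5min')])['value'].sum()

# Manual cross-check: floor every timestamp to its 5-minute bin start.
bins = df2['time'].dt.floor('5min')
manual = df2.groupby(['key', bins])['value'].sum()

# One column per key, one row per time bin.
wide = grouped.unstack('key')
print(grouped.tolist())  # [30, 105, 35, 110, 40, 115]
print(wide)
```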
asfreq()
asfreq() performs frequency conversion.
>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s': series})
>>> df
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:01:00  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:03:00  3.0
Converting the frequency to 30 seconds
>>> df.asfreq(freq='30S')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  NaN
2000-01-01 00:03:00  3.0
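Converting the frequency to 2 minutes. asfreq does not aggregate: it only keeps the values found at the new timestamps, which is where it differs from resample.
>>> df.asfreq(freq='2min')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:02:00  2.0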
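The difference from resample shows up as soon as an aggregation is applied. A minimal sketch on the same four-row frame (np.nan is written out explicitly, and 'min' replaces the older 'T' alias):

```python
import numpy as np
import pandas as pd

index = pd.date_range('2000-01-01', periods=4, freq='min')
df = pd.DataFrame({'s': [0.0, np.nan, 2.0, 3.0]}, index=index)

# asfreq selects the rows that sit exactly on the new 2-minute grid.
picked = df.asfreq('2min')

# resample groups every row into a 2-minute bin and aggregates it;
# sum() skips the NaN at 00:01:00.
summed = df.resample('2min').sum()

print(picked['s'].tolist())  # [0.0, 2.0]
print(summed['s'].tolist())  # [0.0, 5.0]
```

The value 3.0 at 00:03:00 is dropped by asfreq but contributes to the second bin under resample.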
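Backfilling with method='bfill'. Only the rows introduced by the conversion are filled, each from the next original row; missing values that were already in the data are not filled. Here 00:01:00 stays NaN, and 00:00:30 is NaN too because it takes its value from 00:01:00. (Chaining .bfill() after asfreq() would instead fill every NaN.)
>>> df.asfreq(freq='30S', method='bfill')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  2.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  3.0
2000-01-01 00:03:00  3.0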
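Passing method='bfill' to asfreq and chaining .bfill() after asfreq are not the same operation. The sketch below compares them on the same frame; it assumes the lowercase '30s' alias, which recent pandas releases prefer over '30S'.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2000-01-01', periods=4, freq='min')
df = pd.DataFrame({'s': [0.0, np.nan, 2.0, 3.0]}, index=index)

# method='bfill' fills only the rows created by the conversion, taking
# each value from the next original row (which may itself be NaN).
via_method = df.asfreq('30s', method='bfill')

# A chained .bfill() runs on the converted frame and fills every NaN,
# including the one that was present in the original data.
chained = df.asfreq('30s').bfill()

print(int(via_method['s'].isna().sum()))  # 2
print(chained['s'].tolist())  # [0.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0]
```

Which variant is appropriate depends on whether gaps in the original observations should be preserved.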