Pandas: resampling with resample() and frequency conversion with asfreq()

2023-07-31 14:58:03

resample()

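resample() performs resampling.

Resampling is the process of converting a time series from one frequency to another. Converting higher-frequency data to a lower frequency is called downsampling; converting lower-frequency data to a higher frequency is called upsampling.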
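As a minimal sketch of the two directions, the snippet below downsamples and upsamples the same one-minute series. The variable names and the data are illustrative assumptions, and the alias 'min' is used instead of 'T' because recent pandas releases deprecate 'T'.

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations (illustrative data).
idx = pd.date_range("2018-08-03", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=idx)

# Downsampling: 1-minute data into 5-minute bins, so an aggregation is required.
down = ts.resample("5min").sum()

# Upsampling: 1-minute data onto a 30-second grid, so the new slots must be filled.
up = ts.resample("30s").ffill()

print(len(ts), len(down), len(up))  # prints: 12 3 23
```

Downsampling shrinks the series and needs an aggregation function; upsampling grows it and needs a fill rule.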

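Downsampling

Considerations:

Which side of each interval is closed (parameter: closed)

How to label each aggregated bin, with the start or the end of the interval (parameter: label)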

The sessions below assume that NumPy and pandas have been imported as np and pd.

In [232]: ts_index = pd.date_range('2018-08-03', periods=12, freq='T')

In [233]: ts = pd.Series(np.arange(12), index=ts_index)

In [234]: ts
Out[234]:
2018-08-03 00:00:00     0
2018-08-03 00:01:00     1
2018-08-03 00:02:00     2
2018-08-03 00:03:00     3
2018-08-03 00:04:00     4
2018-08-03 00:05:00     5
2018-08-03 00:06:00     6
2018-08-03 00:07:00     7
2018-08-03 00:08:00     8
2018-08-03 00:09:00     9
2018-08-03 00:10:00    10
2018-08-03 00:11:00    11
Freq: T, dtype: int32

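By default, each bin is labeled with its left edge (label='left') and is closed on the left (closed='left'). This is the default for most frequencies, '5min' included.

The first interval is then [2018-08-03 00:00:00, 2018-08-03 00:05:00), that is, 00:00:00 through 00:04:59, so its sum is 0+1+2+3+4 = 10 and its label is 2018-08-03 00:00:00.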

In [235]: ts.resample('5min').sum()
Out[235]:
2018-08-03 00:00:00    10
2018-08-03 00:05:00    35
2018-08-03 00:10:00    21
Freq: 5T, dtype: int32

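The intervals can be closed on the right instead (closed='right'), while the label stays at the default left edge (label='left').

The first interval is then (2018-08-02 23:55:00, 2018-08-03 00:00:00], which contains only the 00:00:00 observation, so its sum is 0 and its label is 2018-08-02 23:55:00.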

In [236]: ts.resample('5min', closed='right').sum()
Out[236]:
2018-08-02 23:55:00     0
2018-08-03 00:00:00    15
2018-08-03 00:05:00    40
2018-08-03 00:10:00    11
Freq: 5T, dtype: int32

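The intervals can also be closed on the right (closed='right') and labeled with the right edge (label='right').

The first interval is still (2018-08-02 23:55:00, 2018-08-03 00:00:00], so its sum is 0, but its label is now 2018-08-03 00:00:00.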

In [237]: ts.resample('5min', closed='right', label='right').sum()
Out[237]:
2018-08-03 00:00:00     0
2018-08-03 00:05:00    15
2018-08-03 00:10:00    40
2018-08-03 00:15:00    11
Freq: 5T, dtype: int32
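To see how closed and label interact, the following sketch runs all four combinations on the same series. It is a self-contained illustration rather than part of the original session; the names results, closed and label are chosen here for convenience, and 'min' stands in for the older 'T' alias.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2018-08-03", periods=12, freq="min")
ts = pd.Series(np.arange(12), index=idx)

# closed decides which bin a boundary timestamp falls into (changes the sums);
# label only decides which edge of the bin is used as the index value.
results = {}
for closed in ("left", "right"):
    for label in ("left", "right"):
        results[(closed, label)] = ts.resample(
            "5min", closed=closed, label=label
        ).sum()
        print(f"closed={closed!r}, label={label!r}")
        print(results[(closed, label)], end="\n\n")
```

Changing label alone shifts the index by one bin width and leaves the sums untouched; changing closed moves the boundary observations into a different bin and therefore changes the sums.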

Upsampling

Considerations:

No aggregation is needed, but the newly created slots have to be filled.

In [244]: frame = pd.DataFrame(np.random.randn(2, 4),
     ...:                      index=pd.date_range('1/1/2000', periods=2,
     ...:                                          freq='W-WED'),  # 'W-WED': weekly, anchored on Wednesday
     ...:                      columns=['Colorado', 'Texas', 'NewYork', 'Ohio'])

In [245]: frame
Out[245]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-12 -0.711291 -1.070133  1.469272  0.809806

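When an aggregation function is applied to this data at a higher frequency, each group holds at most one value, and the gaps between the original observations come out as missing values. To convert to the higher frequency without any aggregation, use the asfreq method: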

In [246]: df_daily = frame.resample('D').asfreq()

In [247]: df_daily
Out[247]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-06       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -0.711291 -1.070133  1.469272  0.809806

Use ffill() to fill the gaps with the last known value (forward fill).

In [248]: frame.resample('D').ffill()
Out[248]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-06  1.201713  0.029819 -1.366082 -1.325252
2000-01-07  1.201713  0.029819 -1.366082 -1.325252
2000-01-08  1.201713  0.029819 -1.366082 -1.325252
2000-01-09  1.201713  0.029819 -1.366082 -1.325252
2000-01-10  1.201713  0.029819 -1.366082 -1.325252
2000-01-11  1.201713  0.029819 -1.366082 -1.325252
2000-01-12 -0.711291 -1.070133  1.469272  0.809806

The limit argument caps how many consecutive rows are filled after each observation:

In [249]: frame.resample('D').ffill(limit=2)
Out[249]:
            Colorado     Texas   NewYork      Ohio
2000-01-05  1.201713  0.029819 -1.366082 -1.325252
2000-01-06  1.201713  0.029819 -1.366082 -1.325252
2000-01-07  1.201713  0.029819 -1.366082 -1.325252
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12 -0.711291 -1.070133  1.469272  0.809806
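The three upsampling variants can be compared side by side in a reproducible sketch. Fixed values replace np.random.randn so the result is the same on every run; the two-column frame and the variable names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Two weekly observations on Wednesdays, with fixed (illustrative) values.
frame = pd.DataFrame(
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    index=pd.date_range("2000-01-05", periods=2, freq="W-WED"),
    columns=["Colorado", "Texas"],
)

daily = frame.resample("D").asfreq()          # new days are left as NaN
filled = frame.resample("D").ffill()          # every new day takes the previous value
limited = frame.resample("D").ffill(limit=2)  # at most two filled days per gap

# Count the rows that are still entirely missing in each variant.
for name, result in [("asfreq", daily), ("ffill", filled), ("ffill(limit=2)", limited)]:
    print(name, int(result.isna().all(axis=1).sum()))
```

With eight daily rows between the two Wednesdays, asfreq leaves six rows empty, ffill leaves none, and ffill(limit=2) leaves four.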

The new date index does not have to overlap with the old one:

In [250]: frame.resample('W-THU').ffill()
Out[250]:
            Colorado     Texas   NewYork      Ohio
2000-01-06  1.201713  0.029819 -1.366082 -1.325252
2000-01-13 -0.711291 -1.070133  1.469272  0.809806

Grouped resampling

In [279]: times = pd.date_range('2018-08-3 00:00', freq='1min', periods=10)

In [280]: df2 = pd.DataFrame({'time': times.repeat(3),
     ...:                     'key': np.tile(['a', 'b', 'c'], 10),
     ...:                     'value': np.arange(30)})

In [281]: df2[:5]
Out[281]:
  key                time  value
0   a 2018-08-03 00:00:00      0
1   b 2018-08-03 00:00:00      1
2   c 2018-08-03 00:00:00      2
3   a 2018-08-03 00:01:00      3
4   b 2018-08-03 00:01:00      4

In [282]: df2.groupby(['key', pd.Grouper(key='time', freq='5min')]).sum()
Out[282]:
                         value
key time
a   2018-08-03 00:00:00     30
    2018-08-03 00:05:00    105
b   2018-08-03 00:00:00     35
    2018-08-03 00:05:00    110
c   2018-08-03 00:00:00     40
    2018-08-03 00:05:00    115
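The same per-key 5-minute sums can likely be reached by a second route: make the time column the index, group by key, and resample within each group. The sketch below places the two routes next to each other; the names by_grouper and by_resample are illustrative, and selecting the value column explicitly is a choice made here to keep the result a plain Series.

```python
import numpy as np
import pandas as pd

times = pd.date_range("2018-08-03 00:00", freq="1min", periods=10)
df2 = pd.DataFrame({
    "time": times.repeat(3),
    "key": np.tile(["a", "b", "c"], 10),
    "value": np.arange(30),
})

# Route 1: group by key and by 5-minute time bins in a single groupby call.
by_grouper = df2.groupby(["key", pd.Grouper(key="time", freq="5min")])["value"].sum()

# Route 2: index by time, group by key, then resample inside each group.
by_resample = df2.set_index("time").groupby("key")["value"].resample("5min").sum()

print(by_grouper)
print(by_resample)
```

pd.Grouper keeps the time column as ordinary data, which is convenient when the frame has no DatetimeIndex; the resample route needs the timestamps in the index first.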

asfreq()

asfreq() performs frequency conversion.

>>> index = pd.date_range('1/1/2000', periods=4, freq='T')
>>> series = pd.Series([0.0, None, 2.0, 3.0], index=index)
>>> df = pd.DataFrame({'s': series})
>>> df
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:01:00  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:03:00  3.0

Convert the frequency to 30 seconds:

>>> df.asfreq(freq='30S')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  NaN
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  NaN
2000-01-01 00:03:00  3.0

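Convert the frequency to 2 minutes. No aggregation takes place: asfreq simply picks the values that sit at the new timestamps, which is where it differs from resample.

>>> df.asfreq(freq='2min')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:02:00  2.0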
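The contrast with resample is easiest to see on the same data. In this sketch (variable names are illustrative, and 'min' replaces the older 'T' alias), asfreq selects values while resample(...).sum() aggregates every observation in each 2-minute bin.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=4, freq="min")
s = pd.Series([0.0, None, 2.0, 3.0], index=index)

# asfreq: keep only the observations at 00:00 and 00:02, discard the rest.
picked = s.asfreq("2min")

# resample + sum: add up everything in [00:00, 00:02) and [00:02, 00:04).
# sum() skips NaN, so the first bin is 0.0 and the second is 2.0 + 3.0.
summed = s.resample("2min").sum()

print(picked.tolist())  # [0.0, 2.0]
print(summed.tolist())  # [0.0, 5.0]
```

The value 3.0 at 00:03 is dropped by asfreq but contributes to the second bin under resample.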

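Backfill during the conversion by passing method='bfill'. Only the slots created by the conversion are filled; NaN values already present in the original data (here 00:01:00) are kept, which is why 00:00:30 also stays NaN.

>>> df.asfreq(freq='30S', method='bfill')
                       s
2000-01-01 00:00:00  0.0
2000-01-01 00:00:30  NaN
2000-01-01 00:01:00  NaN
2000-01-01 00:01:30  2.0
2000-01-01 00:02:00  2.0
2000-01-01 00:02:30  3.0
2000-01-01 00:03:00  3.0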
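Passing method='bfill' to asfreq and chaining .bfill() after asfreq give different results, because the first fills only the newly created slots while the second fills every NaN. The sketch below compares them and also shows the fill_value parameter of asfreq; the variable names are illustrative, and the lowercase '30s' alias is used because it is accepted by both older and recent pandas releases as far as I know.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=4, freq="min")
df = pd.DataFrame({"s": [0.0, None, 2.0, 3.0]}, index=index)

# Fill only the new 30-second slots; the original NaN at 00:01:00 is kept,
# and 00:00:30 copies that NaN because it is the next original observation.
only_new = df.asfreq("30s", method="bfill")

# Convert first, then backfill every NaN, including the original one.
everything = df.asfreq("30s").bfill()

# Put a constant into the new slots; the original NaN is again left alone.
constant = df.asfreq("30s", fill_value=9.0)

print(only_new["s"].tolist())
print(everything["s"].tolist())
print(constant["s"].tolist())
```

Which variant fits depends on whether a NaN in the source data means "unknown" and should survive the conversion, or is just another gap to be filled.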
