Filling a Python DataFrame downward and upward: the fillna and ffill methods
First, create a new DataFrame:
In [8]: df = pd.DataFrame({'name': list('ABCDA'), 'house': [1, 1, 2, 3, 3], 'date': ['2010-01-01', '2010-06-09', '2011-12-03', '2011-04-05', '2012-03-23']})
In [9]: df
Out[9]:
         date  house name
0  2010-01-01      1    A
1  2010-06-09      1    B
2  2011-12-03      2    C
3  2011-04-05      3    D
4  2012-03-23      3    A
Convert the date column to a datetime type:
In [12]: df.date = pd.to_datetime(df.date)
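The data means the following. We have records for four people, A, B, C and D: A owned 1 house as of 2010-01-01, B owned 1 house as of 2010-06-09, C owned 2 houses as of 2011-12-03, D owned 3 houses as of 2011-04-05, and A's record was updated on 2012-03-23 to 3 houses.
The requirement: given a name and a date, return how many houses that person owns.
For example, on any day from 2010-01-01 up to 2012-03-23, A should show 1 house; from 2012-03-23 onward, A should show 3 houses.
We use pandas' fillna method with the ffill option, which carries the last known value forward.
First we build a DataFrame that covers every day from 2010-01-01 to 2017-12-01: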
In [14]: time_range = pd.DataFrame(pd.date_range('2010-01-01', '2017-12-01', freq='D'), columns=['date']).set_index("date")
In [15]: time_range
Out[15]:
Empty DataFrame
Columns: []
Index: [2010-01-01 00:00:00, 2010-01-02 00:00:00, 2010-01-03 00:00:00, 2010-01-04 00:00:00, 2010-01-05 00:00:00, ...]
[2892 rows x 0 columns]
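Next, reshape the original df with pivot_table (covered two posts ago on this blog) so that each person becomes a column, then merge the result with time_range.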
In [16]: df = pd.pivot_table(df, columns='name', index='date')
In [17]: df
Out[17]:
           house
name           A    B    C    D
date
2010-01-01   1.0  NaN  NaN  NaN
2010-06-09   NaN  1.0  NaN  NaN
2011-04-05   NaN  NaN  NaN  3.0
2011-12-03   NaN  NaN  2.0  NaN
2012-03-23   3.0  NaN  NaN  NaN
In [18]: df = df.merge(time_range, how="right", left_index=True, right_index=True)
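The right merge above only serves to expand the pivoted table onto the full daily index. A minimal sketch of the same step using reindex is shown here as an alternative, not as the approach this post takes. It assumes that passing values='house' to pivot_table (to keep single-level columns) is acceptable, and the names wide, daily_index and daily are illustrative. Depending on the pandas version, merging a frame with two column levels against one with a single level may be warned about or rejected, which reindex avoids.

```python
import pandas as pd

# Same sample records as in the post, with dates already parsed.
df = pd.DataFrame({
    'name': list('ABCDA'),
    'house': [1, 1, 2, 3, 3],
    'date': pd.to_datetime(['2010-01-01', '2010-06-09', '2011-12-03',
                            '2011-04-05', '2012-03-23']),
})

# One column per person; values='house' keeps the columns single-level.
wide = df.pivot_table(values='house', index='date', columns='name')

# Expand to one row per calendar day; days without a record become NaN.
daily_index = pd.date_range('2010-01-01', '2017-12-01', freq='D', name='date')
daily = wide.reindex(daily_index)

print(daily.shape)
```

Only the five recorded (date, name) pairs hold values at this point; every other cell is NaN and is left for the fill step.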
Then fill downward (forward fill), so that each missing day takes the most recent known value above it:
In [20]: df = df.fillna(method='ffill')
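To make the direction of each fill concrete, here is a small sketch on a made-up five-day table (the values and the names wide, down and up are invented for illustration, not taken from the post's data). It uses the DataFrame.ffill() and DataFrame.bfill() methods, which do the same thing as fillna(method='ffill') and fillna(method='bfill'). Recent pandas releases deprecate the method keyword of fillna, so the method form may be the safer spelling.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2010-01-01', periods=5, freq='D', name='date')
wide = pd.DataFrame(
    {'A': [1.0, np.nan, np.nan, 3.0, np.nan],
     'B': [np.nan, np.nan, 2.0, np.nan, np.nan]},
    index=idx,
)

down = wide.ffill()  # each NaN takes the last valid value above it
up = wide.bfill()    # each NaN takes the next valid value below it

print(down)
print(up)
```

Note that a forward fill leaves the rows before a column's first record as NaN (column B on the first two days here), which matches the idea that nothing is known about a person before their first record.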
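Finally, turn the wide table back into a long one:
df = df.stack().reset_index()
The result is too long to paste here. To fill upward (backward fill) instead, use method='bfill'.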
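As a rough end-to-end sketch, the long table can answer the original question: given a name and a date, how many houses? The helper houses_on and the names wide, filled, long_df and lookup are hypothetical, introduced only for this example. The sketch also assumes reindex in place of the merge, and adds an explicit dropna() because whether stack drops NaN rows by default differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'name': list('ABCDA'),
    'house': [1, 1, 2, 3, 3],
    'date': pd.to_datetime(['2010-01-01', '2010-06-09', '2011-12-03',
                            '2011-04-05', '2012-03-23']),
})

# Pivot to one column per person, expand to daily rows, fill downward.
wide = df.pivot_table(values='house', index='date', columns='name')
days = pd.date_range('2010-01-01', '2017-12-01', freq='D', name='date')
filled = wide.reindex(days).ffill()

# Back to long format: one row per (date, name) with a known value.
long_df = filled.stack().dropna().rename('house').reset_index()

# Index by (name, date) for direct lookups.
lookup = long_df.set_index(['name', 'date'])['house'].sort_index()


def houses_on(name, date):
    """Return the house count for `name` on `date`, or None if unknown."""
    key = (name, pd.Timestamp(date))
    if key not in lookup.index:
        return None
    return int(lookup.loc[key])


print(houses_on('A', '2011-05-05'))
print(houses_on('A', '2015-01-01'))
print(houses_on('C', '2010-01-01'))
```

Dates before a person's first record, or outside the 2010-01-01 to 2017-12-01 range, return None in this sketch rather than a guessed value.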