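Implementation examples of pandas data concatenation
1. Preface
Sooner or later you are likely to need pandas data concatenation. For example, when duplicate data shows up and you need the intersection or union of two datasets, it is a good option. In the spirit of "you can never have too many skills", the Knowledge Seeker (知识追寻者) spent a little time studying it.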
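2. Data concatenation
Before moving on to data transformation, let's first cover some basics of data concatenation.
2.1 Joining with join()
The merge operation is not covered here. The Knowledge Seeker may write a dedicated article on it later, because it works much like table joins in SQL and cannot be explained in a few lines.
join() combines two DataFrames into one, aligning rows by index. Used as-is, it requires that the two DataFrames have no column names in common.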
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data1 = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index1 = ['user1', 'user2', 'user3']
frame1 = pd.DataFrame(data1, index=index1)

data2 = {
    'person': ['zszxz', 'craler', 'rose'],
    'number': [100, 2000, 3000],
    'activity': ['swing', 'riding', 'climbing']
}
index2 = ['user1', 'user2', 'user3']
frame2 = pd.DataFrame(data2, index=index2)

# Rows are matched on the shared index labels user1..user3
join = frame1.join(frame2)
print(join)
Output
         user  price    hobby  person  number  activity
user1   zszxz    100  reading   zszxz     100     swing
user2  craler    200  running  craler    2000    riding
user3    rose    300   hiking    rose    3000  climbing
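The no-shared-columns precondition can be seen directly in a small experiment. The sketch below uses hypothetical frames `left` and `right` (not from the original example) that both contain a `user` column; the suffix strings `_left` and `_right` are arbitrary choices.

```python
import pandas as pd

# Hypothetical sample frames that share the column name 'user'.
left = pd.DataFrame(
    {'user': ['zszxz', 'craler'], 'price': [100, 200]},
    index=['user1', 'user2'],
)
right = pd.DataFrame(
    {'user': ['zszxz', 'craler'], 'number': [100, 2000]},
    index=['user1', 'user2'],
)

# Without suffixes, join() raises instead of guessing which 'user' to keep.
try:
    left.join(right)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc
print(type(overlap_error).__name__)

# With suffixes, both overlapping columns survive under distinct names.
joined = left.join(right, lsuffix='_left', rsuffix='_right')
print(list(joined.columns))
```

Only the overlapping column names receive a suffix; `price` and `number` keep their original names.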
2.2 Concatenating with concat()
The concat() function can concatenate two Series into one. By default it concatenates row-wise (axis=0).
import pandas as pd
import numpy as np

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate row-wise (the default)
print(pd.concat([ser1, ser2]))
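The row-wise result keeps each input's own index labels, which is easy to overlook. A minimal sketch, reusing the same two Series, compares the default with `ignore_index=True`; the variable names `stacked` and `renumbered` are only illustrative.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])

# Default: the original labels 0..2 are kept from both inputs, so they repeat.
stacked = pd.concat([ser1, ser2])
print(list(stacked.index))

# ignore_index=True discards the old labels and numbers the rows afresh.
renumbered = pd.concat([ser1, ser2], ignore_index=True)
print(list(renumbered.index))
```

Repeated labels are legal in pandas, but label-based lookups such as `stacked.loc[0]` then return more than one row, so renumbering is often the safer choice.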
To concatenate column-wise instead, pass axis=1.
import pandas as pd
import numpy as np

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate column-wise
print(pd.concat([ser1, ser2], axis=1))
Output
     0    1
0  111  333
1  222  444
2  NaN  NaN
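Going one step further, passing the keys parameter names the resulting columns, so the output looks just like an ordinary DataFrame with labelled columns.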
import pandas as pd
import numpy as np

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate column-wise and label each column
data = pd.concat([ser1, ser2], axis=1, keys=['zszxz', 'rzxx'])
print(data)
Output
  zszxz rzxx
0   111  333
1   222  444
2   NaN  NaN
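The keys parameter also works with the default row-wise concatenation, where it produces a two-level index instead of column names. The following is a small sketch of that variant using the same Series; `labelled` is an illustrative name.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])

# Row-wise concat with keys: each key becomes the outer index level.
labelled = pd.concat([ser1, ser2], keys=['zszxz', 'rzxx'])
print(labelled.index.nlevels)

# The outer level can then be used to pull one input back out.
print(labelled.loc['rzxx'].tolist()[:2])
```

This makes it possible to tell which source each row came from after concatenation.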
Note: concat works on DataFrames in much the same way as on Series.
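As a quick illustration of the DataFrame case, the sketch below concatenates small hypothetical frames (`df1`, `df2`, `df3` are made up for this example) first row-wise and then column-wise.

```python
import pandas as pd

# Hypothetical sample frames for illustration.
df1 = pd.DataFrame({'user': ['zszxz', 'craler'], 'price': [100, 200]})
df2 = pd.DataFrame({'user': ['rose'], 'price': [300]})
df3 = pd.DataFrame({'hobby': ['reading', 'running']})

# Row-wise: frames with the same columns are placed one under the other.
rows = pd.concat([df1, df2], ignore_index=True)
print(rows.shape)

# Column-wise: frames are placed side by side, aligned on the index.
cols = pd.concat([df1, df3], axis=1)
print(list(cols.columns))
```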
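2.3 Combining with combine_first()
When two objects have overlapping index labels, combine_first() can be used to combine them. Values from the calling object take priority, and the other object only supplies entries that the caller is missing or holds as null.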
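import pandas as pd
import numpy as np

ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# ser1 is the base; ser2 only fills what ser1 lacks
data = ser1.combine_first(ser2)
print(data)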
Output
1    111
2    222
3    NaN
4    555
dtype: object
Swapping the two Series shows that ser2 is now used as the base.
import pandas as pd
import numpy as np

ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# ser2 is the base this time
data = ser2.combine_first(ser1)
print(data)
Output
1    333
2    444
3    NaN
4    555
dtype: object
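In both runs above, label 3 is null in each Series, so the fill-in behaviour is not really visible. The sketch below uses hypothetical Series `primary` and `backup` (not from the original) in which the caller has a gap that the other object can fill.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'primary' is missing label 2 and has no label 4.
primary = pd.Series(['111', np.nan, '333'], index=[1, 2, 3])
backup = pd.Series(['aaa', 'bbb', np.nan, 'ddd'], index=[1, 2, 3, 4])

# Non-null values in 'primary' win; 'backup' fills label 2 and adds label 4.
combined = primary.combine_first(backup)
print(combined.tolist())
```

Labels 1 and 3 keep the values from `primary`, while labels 2 and 4 come from `backup`.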
2.4 Axis pivoting
Sample data
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index=index)
print(frame)
Output
         user  price    hobby
user1   zszxz    100  reading
user2  craler    200  running
user3    rose    300   hiking
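stack() pivots the columns into rows: the column labels become the inner level of the row index, and the result is a Series.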
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index=index)
# Move the column labels into the inner level of the row index
print(frame.stack())
Output
user1  user       zszxz
       price        100
       hobby    reading
user2  user      craler
       price        200
       hobby    running
user3  user        rose
       price        300
       hobby     hiking
dtype: object
Use unstack() to turn the stacked data back into its original tabular layout.
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index=index)
sta = frame.stack()
# Pivot the inner index level back into columns
print(sta.unstack())
Output
         user price    hobby
user1   zszxz   100  reading
user2  craler   200  running
user3    rose   300   hiking
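One detail worth checking after a stack()/unstack() round trip is the column dtype: stacking a frame that mixes strings and integers produces an object Series, and unstack() does not convert the numbers back. The sketch below uses a trimmed-down two-column version of the sample frame; the variable names `restored` and `dtype_after_round_trip` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {'user': ['zszxz', 'craler', 'rose'], 'price': [100, 200, 300]},
    index=['user1', 'user2', 'user3'],
)

# Round trip: columns -> inner index level -> columns again.
restored = frame.stack().unstack()

# 'price' started as an integer column but comes back as object.
dtype_after_round_trip = restored['price'].dtype
print(frame['price'].dtype, dtype_after_round_trip)

# Cast the column back when numeric behaviour is needed again.
restored['price'] = restored['price'].astype('int64')
print(restored['price'].sum())
```

The values themselves are unchanged by the round trip; only the dtype needs restoring.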