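Getting row and column indexes and values from a pandas DataFrame
A pandas DataFrame is two-dimensional, so it has both a column index and a row index.
The previous article only covered the column index: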
    import pandas as pd

    df = pd.DataFrame({'A': [0, 1, 2], 'B': [3, 4, 5]})
    print(df)
    # Result:
    #    A  B
    # 0  0  3
    # 1  1  4
    # 2  2  5
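The row index 0, 1, 2 was generated automatically.
To specify the row index and column index yourself, use the index and columns parameters.
The data below is the ridership of 5 stations over 10 days: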
    ridership_df = pd.DataFrame(
        data=[[   0,    0,    2,    5,    0],
              [1478, 3877, 3674, 2328, 2539],
              [1613, 4088, 3991, 6461, 2691],
              [1560, 3392, 3826, 4787, 2613],
              [1608, 4802, 3932, 4477, 2705],
              [1576, 3933, 3909, 4979, 2685],
              [  95,  229,  255,  496,  201],
              [   2,    0,    1,   27,    0],
              [1438, 3785, 3589, 4174, 2215],
              [1342, 4043, 4009, 4665, 3033]],
        index=['05-01-11', '05-02-11', '05-03-11', '05-04-11', '05-05-11',
               '05-06-11', '05-07-11', '05-08-11', '05-09-11', '05-10-11'],
        columns=['R003', 'R004', 'R005', 'R006', 'R007']
    )
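The data parameter takes two-dimensional data (a nested list here; a NumPy 2D array works the same way), the index parameter gives the row labels, and the columns parameter gives the column labels.
The resulting data is displayed as a table: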
              R003  R004  R005  R006  R007
    05-01-11     0     0     2     5     0
    05-02-11  1478  3877  3674  2328  2539
    05-03-11  1613  4088  3991  6461  2691
    05-04-11  1560  3392  3826  4787  2613
    05-05-11  1608  4802  3932  4477  2705
    05-06-11  1576  3933  3909  4979  2685
    05-07-11    95   229   255   496   201
    05-08-11     2     0     1    27     0
    05-09-11  1438  3785  3589  4174  2215
    05-10-11  1342  4043  4009  4665  3033
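The row and column labels can also be read back from a DataFrame through its index and columns attributes. A minimal sketch, using a small hypothetical frame (two days, two stations) rather than the full ridership data:

```python
import pandas as pd

# Small hypothetical frame for illustration only (not the full ridership data).
small_df = pd.DataFrame(
    data=[[0, 0], [1478, 3877]],
    index=['05-01-11', '05-02-11'],
    columns=['R003', 'R004'],
)

row_labels = small_df.index.tolist()    # row index labels as a plain list
col_labels = small_df.columns.tolist()  # column index labels as a plain list

print(row_labels)
print(col_labels)
print(small_df.shape)                   # (number of rows, number of columns)
```

Both attributes are pandas Index objects; tolist() is used here only to get ordinary Python lists.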
Here is how to get values out of a DataFrame:
1. Get a column: use ['key'] directly
    print(ridership_df['R003'])
    # Result:
    # 05-01-11       0
    # 05-02-11    1478
    # 05-03-11    1613
    # 05-04-11    1560
    # 05-05-11    1608
    # 05-06-11    1576
    # 05-07-11      95
    # 05-08-11       2
    # 05-09-11    1438
    # 05-10-11    1342
    # Name: R003, dtype: int64
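Passing a single key returns a Series, while passing a list of keys returns a DataFrame containing just those columns. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

# Small hypothetical frame for illustration only.
small_df = pd.DataFrame(
    data=[[0, 0, 2], [1478, 3877, 3674]],
    index=['05-01-11', '05-02-11'],
    columns=['R003', 'R004', 'R005'],
)

one_column = small_df['R003']             # single key -> Series
two_columns = small_df[['R003', 'R005']]  # list of keys -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns.columns.tolist())
```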
2. Get a row: use .loc['key'] (by label) or .iloc[position] (by integer position)
    print(ridership_df.loc['05-01-11'])
    # or
    print(ridership_df.iloc[0])
    # Result:
    # R003    0
    # R004    0
    # R005    2
    # R006    5
    # R007    0
    # Name: 05-01-11, dtype: int64
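The difference between the two accessors shows up most clearly when slicing: .loc works with labels and includes the end label, while .iloc works with integer positions and excludes the end position. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

# Small hypothetical frame for illustration only.
small_df = pd.DataFrame(
    data=[[0, 0], [1478, 3877], [1613, 4088]],
    index=['05-01-11', '05-02-11', '05-03-11'],
    columns=['R003', 'R004'],
)

by_label = small_df.loc['05-02-11']  # row whose label is '05-02-11'
by_position = small_df.iloc[1]       # row at position 1 (the second row)
same_row = by_label.equals(by_position)

# Label slices include the end label; position slices exclude the end position.
label_slice = small_df.loc['05-01-11':'05-02-11']
position_slice = small_df.iloc[0:2]
same_slice = label_slice.equals(position_slice)

print(same_row, same_slice)
```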
3. Get the single value at a given row and column:
    print(ridership_df.loc['05-05-11', 'R003'])
    # or
    print(ridership_df.iloc[4, 0])
    # Result:
    # 1608
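For single-value lookups pandas also provides the .at (label-based) and .iat (position-based) accessors, which only accept scalar keys. A minimal sketch on a small hypothetical frame:

```python
import pandas as pd

# Small hypothetical frame for illustration only.
small_df = pd.DataFrame(
    data=[[0, 0], [1478, 3877]],
    index=['05-01-11', '05-02-11'],
    columns=['R003', 'R004'],
)

value_by_label = small_df.at['05-02-11', 'R004']  # row label, column label
value_by_position = small_df.iat[1, 1]            # row position, column position

print(value_by_label, value_by_position)
```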
4. Get the underlying NumPy 2D array:
    print(ridership_df.values)
    # Result:
    # [[   0    0    2    5    0]
    #  [1478 3877 3674 2328 2539]
    #  [1613 4088 3991 6461 2691]
    #  [1560 3392 3826 4787 2613]
    #  [1608 4802 3932 4477 2705]
    #  [1576 3933 3909 4979 2685]
    #  [  95  229  255  496  201]
    #  [   2    0    1   27    0]
    #  [1438 3785 3589 4174 2215]
    #  [1342 4043 4009 4665 3033]]
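*Note: if the columns do not all share the same data type, the values are converted to a common type when the array is produced.
A complete example:
From ridership_df, find the station with the most riders on the first day, then return that station's mean daily ridership together with the mean daily ridership across all stations for comparison: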
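The type conversion can be observed by checking the dtype of the resulting array. A minimal sketch, using a hypothetical frame whose column names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical frame with an integer, a float, and a string column.
mixed_df = pd.DataFrame({
    'count': [1, 2],
    'ratio': [0.5, 1.5],
    'name': ['a', 'b'],
})

numeric = mixed_df[['count', 'ratio']].values  # int + float columns -> one float array
everything = mixed_df.values                   # numbers + strings -> generic object array

print(numeric.dtype)
print(everything.dtype)
```

The integers in the first array are stored as floats, and the second array falls back to the object dtype because no numeric type can hold the strings.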
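    def mean_riders_for_max_station(ridership):
        # Label of the station with the most riders on the first day.
        # idxmax returns the column label; in current pandas versions
        # argmax returns an integer position instead.
        max_station = ridership.iloc[0].idxmax()
        # Mean daily ridership for that station
        mean_for_max = ridership[max_station].mean()
        # Mean daily ridership across all stations
        overall_mean = ridership.values.mean()
        return (overall_mean, mean_for_max)

    print(mean_riders_for_max_station(ridership_df))
    # Result:
    # (2342.6, 3239.9)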
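Finding the busiest station depends on getting a column label rather than a position. A minimal sketch of the distinction, assuming pandas 1.0 or later, where Series.argmax returns an integer position and Series.idxmax returns the label; the Series below reproduces the first-day row of the data:

```python
import pandas as pd

# First-day ridership, indexed by station label.
first_day = pd.Series(
    [0, 0, 2, 5, 0],
    index=['R003', 'R004', 'R005', 'R006', 'R007'],
)

max_label = first_day.idxmax()          # label of the largest value
max_position = int(first_day.argmax())  # integer position of the largest value

print(max_label, max_position)
```

The label can be used directly as a column key, whereas the position has to go through .iloc or be mapped back to a label first.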