Python - 如何按月对 Pandas DataFrame 进行分组?
我们将使用groupby对PandasDataFrame进行分组。使用grouper功能选择要使用的列。对于下面显示的汽车销售记录示例,我们将按月分组并计算每月注册价格的总和。
首先,假设以下是我们的三列PandasDataFrame-
dataFrame = pd.DataFrame( { "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"], "Date_of_Purchase": [ pd.Timestamp("2021-06-10"), pd.Timestamp("2021-07-11"), pd.Timestamp("2021-06-25"), pd.Timestamp("2021-06-29"), pd.Timestamp("2021-03-20"), pd.Timestamp("2021-01-22"), pd.Timestamp("2021-01-06"), pd.Timestamp("2021-01-04"), pd.Timestamp("2021-05-09") ], "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350] } )
使用Grouper选择groupby()函数内的Date_of_Purchase列。频率freq设置为“M”以按月分组-
print("\nGroup Dataframe by month...\n",dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', axis=0, freq='M')).sum())
示例
以下是代码-
import pandas as pd #dataframewithoneofthecolumnsasDate_of_Purchase dataFrame = pd.DataFrame( { "Car": ["Audi", "Lexus", "Tesla", "Mercedes", "BMW", "Toyota", "Nissan", "Bentley", "Mustang"], "Date_of_Purchase": [ pd.Timestamp("2021-06-10"), pd.Timestamp("2021-07-11"), pd.Timestamp("2021-06-25"), pd.Timestamp("2021-06-29"), pd.Timestamp("2021-03-20"), pd.Timestamp("2021-01-22"), pd.Timestamp("2021-01-06"), pd.Timestamp("2021-01-04"), pd.Timestamp("2021-05-09") ], "Reg_Price": [1000, 1400, 1100, 900, 1700, 1800, 1300, 1150, 1350] } ) print"DataFrame...\n",dataFrame #GroupertoselectDate_of_Purchasecolumnwithingroupbyfunction print"\nGroup Dataframe by month...\n",dataFrame.groupby(pd.Grouper(key='Date_of_Purchase', axis=0, freq='M')).sum()输出结果
这将产生以下输出。计算每个月的注册价格-
DataFrame... Car Date_of_Purchase Reg_Price 0 Audi 2021-06-10 1000 1 Lexus 2021-07-11 1400 2 Tesla 2021-06-25 1100 3 Mercedes 2021-06-29 900 4 BMW 2021-03-20 1700 5 Toyota 2021-01-22 1800 6 Nissan 2021-01-06 1300 7 Bentley 2021-01-04 1150 8 Mustang 2021-05-09 1350 Group Dataframe by month... Reg_Price Date_of_Purchase 2021-01-31 4250.0 2021-02-28 NaN 2021-03-31 1700.0 2021-04-30 NaN 2021-05-31 1350.0 2021-06-30 3000.0 2021-07-31 1400.0