pandas: Grouping numbers

2024-04-14 02:36:03

Example

For the following DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'Age': np.random.randint(20, 70, 100),
                   'Sex': np.random.choice(['Male', 'Female'], 100),
                   'number_of_foo': np.random.randint(1, 20, 100)})
df.head()
# Output:
#    Age     Sex  number_of_foo
# 0   64  Female             14
# 1   67  Female             14
# 2   20  Female             12
# 3   23    Male             17
# 4   23  Female             15

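Group Age into three categories (or bins). The bins can be given as

an integer n giving the number of bins, in which case the data is divided into n equal-width intervals

a sequence of numbers giving the endpoints of the left-open, right-closed intervals into which the data is divided; for example, bins=[19, 40, 65, np.inf] creates three age groups: (19, 40], (40, 65] and (65, np.inf].

pandas automatically assigns the string versions of the intervals as labels. You can also define your own labels by passing a list of strings as the labels parameter.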

pd.cut(df['Age'], bins=4)
# this creates four age groups: (19.951, 32.25] < (32.25, 44.5] < (44.5, 56.75] < (56.75, 69]
# Name: Age, dtype: category
# Categories (4, object): [(19.951, 32.25] < (32.25, 44.5] < (44.5, 56.75] < (56.75, 69]]
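
To see where the edges of equal-width bins come from, the sketch below asks pd.cut to return them through its retbins parameter. The small ages Series is made-up illustrative data, not the DataFrame above; it merely shares the same minimum (20) and maximum (69).

```python
import pandas as pd

# Illustrative data (an assumption): same min and max as the Age column above.
ages = pd.Series([20, 25, 33, 47, 58, 69])

# retbins=True returns the computed bin edges alongside the binned values.
binned, edges = pd.cut(ages, bins=4, retbins=True)

# The range 20..69 is split into four intervals of width (69 - 20) / 4 = 12.25.
# The lowest edge is nudged down by 0.1% of the range so that the minimum
# value falls inside the first right-closed interval.
print(edges)
```

This is why the first interval above starts at 19.951 rather than at 20.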

pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
# this creates three age groups: (19, 40], (40, 65] and (65, inf]
# Name: Age, dtype: category
# Categories (3, object): [(19, 40] < (40, 65] < (65, inf]]
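
The labels parameter mentioned earlier can be sketched as follows. The label names 'young', 'middle' and 'senior' and the sample ages are hypothetical, chosen only to illustrate the call; the list must contain one label per bin.

```python
import numpy as np
import pandas as pd

# Illustrative ages (an assumption), including values that sit exactly on a bin edge.
ages = pd.Series([23, 40, 41, 65, 66])

# One label per interval: (19, 40], (40, 65], (65, inf]
labelled = pd.cut(ages,
                  bins=[19, 40, 65, np.inf],
                  labels=['young', 'middle', 'senior'])  # hypothetical label names

print(labelled.tolist())
# ['young', 'young', 'middle', 'middle', 'senior']
```

Because the intervals are closed on the right, an age of exactly 40 lands in the first group and 65 in the second.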

Use it with groupby to get the mean number of foo:

age_groups = pd.cut(df['Age'], bins=[19, 40, 65, np.inf])
df.groupby(age_groups)['number_of_foo'].mean()
# Output: 
# Age
# (19, 40]     9.880000
# (40, 65]     9.452381
# (65, inf]    9.250000
# Name: number_of_foo, dtype: float64
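
As a small hand-checkable variant of the groupby above, the sketch below uses a toy DataFrame (made-up values, not the random data from this example) and reports the group size next to the mean. Passing observed=False explicitly is an assumption about the preferred behaviour: it keeps every bin in the result and, on recent pandas versions, avoids the warning about the changing default for categorical groupers.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption) small enough to verify by hand.
toy = pd.DataFrame({'Age': [25, 38, 45, 60, 70],
                    'number_of_foo': [4, 6, 10, 20, 7]})

toy_groups = pd.cut(toy['Age'], bins=[19, 40, 65, np.inf])

# Mean and number of rows per age group.
summary = (toy.groupby(toy_groups, observed=False)['number_of_foo']
              .agg(['mean', 'count']))
print(summary)
# (19, 40] -> mean 5.0, count 2; (40, 65] -> mean 15.0, count 2; (65, inf] -> mean 7.0, count 1
```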

Cross-tabulate the age groups and sex:

pd.crosstab(age_groups, df['Sex'])
# Output: 
# Sex        Female  Male
# Age
# (19, 40]       22    28
# (40, 65]       18    24
# (65, inf]       3     5
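
To make the counting in the cross-tabulation easy to check by hand, here is a minimal sketch on a toy DataFrame. The seven rows are made-up illustrative data, not taken from the example above.

```python
import numpy as np
import pandas as pd

# Toy data (an assumption): seven people with an age and a sex.
toy = pd.DataFrame({
    'Age': [22, 35, 41, 50, 64, 68, 30],
    'Sex': ['Female', 'Male', 'Female', 'Male', 'Male', 'Female', 'Female'],
})

toy_groups = pd.cut(toy['Age'], bins=[19, 40, 65, np.inf])

# Each cell counts the rows that fall into that (age group, sex) pair.
table = pd.crosstab(toy_groups, toy['Sex'])
print(table)
# (19, 40]: 2 Female, 1 Male; (40, 65]: 1 Female, 2 Male; (65, inf]: 1 Female, 0 Male
```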
