How to subset the columns of an R data frame that have fewer than four categories?
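A categorical column usually has at least two categories. There is no fixed upper limit on the number of categories, although it cannot exceed the number of rows (cases). If a data frame contains some categorical columns with four or more categories and others with fewer than four, we may want to keep only the columns with fewer than four categories. This can be useful when we deliberately want to subset the data in a selective (non-random) way, or when some predefined characteristics of the data call for such a change. Such columns can be selected with the help of the sapply function: count the distinct values in each column, compare the counts with 4, and use the resulting logical vector to index the columns, as shown in the examples below.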
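The same idea can be wrapped in a small reusable function. The sketch below is only an illustration: the function name `keep_few_categories`, its `max_categories` argument, and the sample data are hypothetical and not part of base R. The core is the same `sapply(..., length(unique(col)))` test used in the examples; `drop = FALSE` is added so that the result stays a data frame even if only one column qualifies.

```r
# Hypothetical helper (not part of base R): keep the columns whose number
# of distinct values is strictly less than max_categories.
keep_few_categories <- function(df, max_categories = 4) {
  # Number of distinct values in each column (NA counts as a value)
  n_categories <- sapply(df, function(col) length(unique(col)))
  # drop = FALSE keeps a data frame even when a single column is selected
  df[, n_categories < max_categories, drop = FALSE]
}

# Small made-up data frame: temp has 3 values, gender 2, code 4
df <- data.frame(
  temp   = c("Hot", "Cold", "Warm", "Hot", "Cold"),
  gender = c("Male", "Female", "Male", "Male", "Female"),
  code   = c("a", "b", "c", "d", "a")
)

res <- keep_few_categories(df)
print(names(res))  # columns with fewer than 4 distinct values
```

Passing a different `max_categories` changes the threshold; the comparison is strict, so `max_categories = 4` keeps columns with at most 3 distinct values, matching the examples.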
Example 1
Consider the following data frame:
> x1 <- sample(c("Hot","Cold","Warm"), 20, replace=TRUE)
> x2 <- sample(c("Male","Female"), 20, replace=TRUE)
> x3 <- sample(letters[1:4], 20, replace=TRUE)
> df1 <- data.frame(x1, x2, x3)
> df1
Output
     x1     x2 x3
1  Warm   Male  b
2  Cold Female  c
3  Cold   Male  a
4   Hot   Male  d
5   Hot   Male  d
6   Hot Female  a
7   Hot   Male  a
8  Cold Female  d
9  Warm   Male  d
10 Warm Female  d
11 Cold   Male  a
12 Cold Female  c
13  Hot   Male  b
14 Warm   Male  c
15 Cold   Male  b
16 Warm   Male  a
17  Hot   Male  b
18 Cold   Male  b
19  Hot Female  c
20 Warm Female  d
Subsetting the columns of df1 that have fewer than 4 categories:
> df1[, sapply(df1, function(col) length(unique(col))) < 4]
Output
     x1     x2
1  Warm   Male
2  Cold Female
3  Cold   Male
4   Hot   Male
5   Hot   Male
6   Hot Female
7   Hot   Male
8  Cold Female
9  Warm   Male
10 Warm Female
11 Cold   Male
12 Cold Female
13  Hot   Male
14 Warm   Male
15 Cold   Male
16 Warm   Male
17  Hot   Male
18 Cold   Male
19  Hot Female
20 Warm Female
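To see why x3 is dropped, it helps to split the one-liner into its steps. Because df1 was generated with `sample()` and no seed, it cannot be reproduced exactly, so this sketch uses a small made-up data frame with the same column names; the intermediate names `counts`, `mask`, and `selected` are introduced only for illustration.

```r
# Made-up data: x1 has 3 distinct values, x2 has 2, x3 has 4
df <- data.frame(
  x1 = c("Warm", "Cold", "Hot", "Hot"),
  x2 = c("Male", "Female", "Male", "Male"),
  x3 = c("b", "c", "a", "d")
)

# Step 1: sapply() visits each column and returns a named integer vector
# holding the number of distinct values per column
counts <- sapply(df, function(col) length(unique(col)))
print(counts)

# Step 2: compare with 4 to get a logical vector, one entry per column
mask <- counts < 4
print(mask)

# Step 3: use the logical vector as the column index; x3 (4 values) is dropped
selected <- df[, mask]
print(selected)
```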
Example 2
> y1 <- sample(c("Male","Female"), 20, replace=TRUE)
> y2 <- sample(letters[1:5], 20, replace=TRUE)
> y3 <- sample(c("Asian","American","Chinese"), 20, replace=TRUE)
> df2 <- data.frame(y1, y2, y3)
> df2
Output
       y1 y2       y3
1    Male  b  Chinese
2  Female  b American
3  Female  d    Asian
4  Female  e American
5  Female  e    Asian
6  Female  c  Chinese
7  Female  a  Chinese
8  Female  a  Chinese
9    Male  d American
10 Female  d  Chinese
11 Female  d  Chinese
12 Female  c American
13 Female  b American
14   Male  d  Chinese
15   Male  a American
16   Male  e    Asian
17   Male  b    Asian
18 Female  d  Chinese
19 Female  d  Chinese
20 Female  c    Asian
Subsetting the columns of df2 that have fewer than 4 categories:
> df2[, sapply(df2, function(col) length(unique(col))) < 4]
Output
       y1       y3
1    Male  Chinese
2  Female American
3  Female    Asian
4  Female American
5  Female    Asian
6  Female  Chinese
7  Female  Chinese
8  Female  Chinese
9    Male American
10 Female  Chinese
11 Female  Chinese
12 Female American
13 Female American
14   Male  Chinese
15   Male American
16   Male    Asian
17   Male    Asian
18 Female  Chinese
19 Female  Chinese
20 Female    Asian
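In both examples two columns qualify, so the result is a data frame. If only one column qualified, `df[, mask]` would simplify the result to a plain vector, because `[.data.frame` uses `drop = TRUE` by default when the row index is missing. The sketch below, with made-up data and a hypothetical predicate name `has_few`, shows two base R ways to always get a data frame back: `drop = FALSE`, or `Filter()`, which applies a predicate to each column of a data frame.

```r
# Made-up data in which only y1 has fewer than 4 distinct values
df <- data.frame(
  y1 = c("Male", "Female", "Male", "Female", "Male"),
  y2 = c("a", "b", "c", "d", "e")
)

# Hypothetical predicate: TRUE if a column has fewer than 4 distinct values
has_few <- function(col) length(unique(col)) < 4

# With a single matching column, [, mask] simplifies the result to a vector
v <- df[, sapply(df, has_few)]
print(is.data.frame(v))

# Option 1: keep the data frame structure with drop = FALSE
res1 <- df[, sapply(df, has_few), drop = FALSE]

# Option 2: Filter() keeps the columns for which the predicate is TRUE
# and returns them as a data frame
res2 <- Filter(has_few, df)

print(res1)
print(res2)
```

Either option is safer than the bare `[, mask]` form when the number of qualifying columns is not known in advance.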