如何删除在 R 数据框中具有三个或更少重复组合的分类列的行?
在数据分析中,我们有时会根据我们的想法决定数据的大小或样本大小,这可能会导致删除部分数据。一种这样的事情可能是删除三个或更少的分类列的重复组合,它可以在dplyr包的过滤器功能的帮助下通过group_by函数进行分组来完成。
示例1
考虑以下数据框-
set.seed(121)
x1<−sample(LETTERS[1:6],20,replace=TRUE)
x2<−sample(c("Male","Female"),20,replace=TRUE)
x3<−rpois(20,5)
df1<−data.frame(x1,x2,x3)
df1输出结果x1 x2 x3 1 D Female 5 2 D Female 2 3 D Male 7 4 D Female 8 5 A Male 6 6 C Female 7 7 A Female 3 8 C Female 1 9 C Female 7 10 E Male 2 11 D Female 3 12 E Female 6 13 F Female 3 14 D Female 4 15 A Male 4 16 E Male 4 17 B Female 8 18 B Female 7 19 C Female 5 20 A Female 9
加载dplyr包并删除具有三个或更少重复组合的分类列-
示例
library(dplyr) df1%>%group_by(x1,x2)%>%filter(n()>=4) # A tibble: 9 x 3 # Groups: x1, x2 [2]输出结果
x1 x2 x31 D Female 5 2 D Female 2 3 D Female 8 4 C Female 7 5 C Female 1 6 C Female 7 7 D Female 3 8 D Female 4 9 C Female 5
例2
y1<−sample(c("S1","S2","S3","S4","S5","S6"),20,replace=TRUE)
y2<−sample(c("Winter","Summer"),20,replace=TRUE)
y3<−rnorm(20,3)
df2<−data.frame(y1,y2,y3)
df2输出结果y1 y2 y3 1 S1 Winter 2.683082 2 S4 Summer 1.141916 3 S6 Winter 3.371681 4 S2 Winter 3.191187 5 S3 Summer 2.195504 6 S5 Summer 2.631736 7 S3 Winter 3.303605 8 S6 Summer 3.074344 9 S5 Summer 2.663724 10 S5 Winter 2.281991 11 S6 Summer 4.174418 12 S4 Winter 6.081246 13 S4 Summer 3.202913 14 S2 Winter 5.557243 15 S2 Winter 3.747462 16 S2 Winter 2.621571 17 S2 Summer 3.909743 18 S5 Winter 2.325663 19 S5 Summer 3.749852 20 S5 Winter 2.331191
示例
df2%>%group_by(y1,y2)%>%filter(n()>=4) # A tibble: 4 x 3 # Groups: y1, y2 [1]输出结果
y1 y2 y31 S2 Winter 3.19 2 S2 Winter 5.56 3 S2 Winter 3.75 4 S2 Winter 2.62
热门推荐
10 对患者生日祝福语简短
11 结婚祝福语简短装备
12 周岁祝福语学生文案简短
13 订婚领证祝福语简短精辟
14 导师获奖祝福语大全简短
15 新婚购房祝福语简短精辟
16 牛年祝福语简短的爱人
17 送芒果的祝福语简短
18 送给学长毕业祝福语简短