How to create a random sample based on a group column of a data.table in R?
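Random sampling helps reduce bias in an analysis. If the data is grouped, we may want to draw the random sample group by group. For example, if a data.table has a group column and each group contains ten values, we might want a sample in which two values are chosen at random from each group. This can be done by calling the sample function inside .SD, with the grouping column passed to by.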
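A minimal, reproducible sketch of the idea, assuming a toy table of 5 groups with 4 rows each. The call to set.seed() is an addition not present in the examples below. It is there only so that the same rows are drawn on every run. The names dt and smp are illustrative.

```r
library(data.table)

# Toy table (assumed for illustration): 5 groups x 4 rows each
set.seed(1)  # fixes the random draw so the result is reproducible
Group <- rep(c("A", "B", "C", "D", "E"), times = 4)
Percentage <- sample(1:100, 20)
dt <- data.table(Group, Percentage)

# For every Group, pick min(2, .N) row positions at random and
# return those rows of .SD (the group's non-grouping columns)
smp <- dt[, .SD[sample(.N, min(2, .N))], by = Group]

# Count the sampled rows per group
print(smp[, .N, by = Group])
```

Because sample() draws without replacement by default, no row is picked twice within a group.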
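Example
Consider the below data.table (Percentage is generated with sample(), so your values will differ):
library(data.table)
Group <- rep(c("A", "B", "C", "D", "E"), times = 4)
Percentage <- sample(1:100, 20)
dt1 <- data.table(Group, Percentage)
dt1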
Output
    Group Percentage
 1:     A         97
 2:     B         68
 3:     C         19
 4:     D         32
 5:     E         98
 6:     A         48
 7:     B         94
 8:     C         54
 9:     D          7
10:     E         76
11:     A         10
12:     B         31
13:     C         59
14:     D         84
15:     E         41
16:     A         99
17:     B          1
18:     C         72
19:     D         42
20:     E         17
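Creating a random sample of size 2 from each group:
Example
# Draw 2 rows at random per Group; min(2, .N) guards against groups with fewer than 2 rows
dt1[, .SD[sample(.N, min(2, .N))], by = Group]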
Output
    Group Percentage
 1:     A         48
 2:     A         99
 3:     B         94
 4:     B         31
 5:     C         54
 6:     C         59
 7:     D         42
 8:     D         84
 9:     E         98
10:     E         76
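Why the size is written as min(2, .N) rather than just 2 can be seen with a small hypothetical table in which one group has only a single row. Without the guard, sample(1, 2) cannot draw 2 values from 1:1 without replacement and R raises an error. With the guard, that group simply returns all of its rows. The table and column names (g, v) are invented for this sketch.

```r
library(data.table)

# Hypothetical table: group "X" has 3 rows, group "Y" has only 1 row
dt <- data.table(g = c("X", "X", "X", "Y"), v = 1:4)

# Without the guard, the "Y" group fails: sample(1, 2) asks for
# 2 draws from a population of 1 without replacement
res_err <- tryCatch(dt[, .SD[sample(.N, 2)], by = g],
                    error = function(e) conditionMessage(e))

# With min(2, .N), "Y" just returns its single row
res_ok <- dt[, .SD[sample(.N, min(2, .N))], by = g]
print(res_ok[, .N, by = g])
```

If every group must contribute exactly k rows even when it is smaller than k, one option is sample(.N, k, replace = TRUE). That duplicates rows, however, so it only suits resampling-style uses.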
Let's look at another example:
Example
Class <- rep(c("First", "Second", "Third", "Fourth"), times = 10)
Experience <- sample(1:5, 40, replace = TRUE)
dt2 <- data.table(Class, Experience)
head(dt2, 10)
Output
     Class Experience
 1:  First          4
 2: Second          2
 3:  Third          4
 4: Fourth          2
 5:  First          4
 6: Second          5
 7:  Third          3
 8: Fourth          5
 9:  First          3
10: Second          5
Example
tail(dt2, 10)
Output
     Class Experience
 1:  Third          4
 2: Fourth          2
 3:  First          5
 4: Second          2
 5:  Third          1
 6: Fourth          4
 7:  First          5
 8: Second          2
 9:  Third          4
10: Fourth          4
Example
# Draw up to 5 rows at random from each Class
dt2[, .SD[sample(.N, min(5, .N))], by = Class]
Output
     Class Experience
 1:  First          3
 2:  First          3
 3:  First          4
 4:  First          5
 5:  First          5
 6: Second          5
 7: Second          2
 8: Second          5
 9: Second          2
10: Second          1
11:  Third          3
12:  Third          1
13:  Third          4
14:  Third          3
15:  Third          4
16: Fourth          2
17: Fourth          5
18: Fourth          2
19: Fourth          4
20: Fourth          2
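A commonly used variant of the same grouped sampling collects the sampled row numbers with .I (the row indices of the whole table inside each group) and then subsets the table once. This is a sketch, not the article's method. It is often preferred on larger tables because it avoids building a .SD subset for every group, though no timing is claimed here. set.seed() and the names idx and smp2 are additions for illustration.

```r
library(data.table)

set.seed(42)  # reproducible draw (added for this sketch)
Class <- rep(c("First", "Second", "Third", "Fourth"), times = 10)
Experience <- sample(1:5, 40, replace = TRUE)
dt2 <- data.table(Class, Experience)

# Step 1: per Class, sample up to 5 row numbers of dt2; the
# unnamed j result comes back in a column called V1
idx <- dt2[, .I[sample(.N, min(5, .N))], by = Class]$V1

# Step 2: subset the original table by those row numbers
smp2 <- dt2[idx]
print(smp2[, .N, by = Class])
```

The result contains the same rows that the .SD version could return, grouped in order of first appearance of each Class.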