How to change the row index after sampling an R data frame?
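When we draw a random sample from an R data frame, each sampled row keeps the row name (row number) it had in the original data frame. Because the rows are selected at random, these numbers appear in a scrambled order. This can be confusing during analysis, especially when rows have to be referred to by number. We can therefore renumber the rows of the sample from 1 up to the number of rows in the sample.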
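A minimal sketch of this behaviour, using a small made-up data frame `df` (the names `df`, `idx` and `s` are illustrative, not from the examples below). It shows that the row names of the sampled subset are exactly the positions drawn by `sample()`, in the order they were drawn.

```r
# Illustrative toy data frame (not from the article); its row names are
# the defaults "1".."10"
df <- data.frame(a = 1:10, b = letters[1:10])

set.seed(111)
idx <- sample(nrow(df), 4)  # 4 distinct row positions, drawn without replacement
s <- df[idx, ]              # subsetting keeps each row's original row name

print(idx)
print(rownames(s))          # the same numbers as idx, as character strings
```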
Example
Consider the following data frame:
> set.seed(111)
> x1 <- rnorm(20, 1.5)
> x2 <- rnorm(20, 2.5)
> x3 <- rnorm(20, 3)
> df1 <- data.frame(x1, x2, x3)
> df1
Output
             x1         x2       x3
1   1.735220712  2.8616625 1.824274
2   1.169264128  2.8469644 1.878784
3   1.188376176  2.6897365 1.638096
4  -0.802345658  2.3404232 3.481125
5   1.329123955  2.8265492 3.741972
6   1.640278225  3.0982542 3.027825
7   0.002573344  0.6584657 3.331380
8   0.489811581  5.2180556 3.644114
9   0.551524395  2.6912444 5.485662
10  1.006037783  1.1987039 4.959982
11  1.326325872 -0.6132173 3.191663
12  1.093401220  1.5586426 4.552544
13  3.345636264  3.9002588 3.914242
14  1.894054110  0.8795300 3.358625
15  2.297528501  0.2340040 3.175096
16 -0.066665360  3.6629936 2.152732
17  1.414148991  2.3838450 3.978232
18  1.140860519  2.8342560 4.805868
19  0.306391033  1.8791419 3.122915
20  1.864186737  1.1901551 2.870228
Creating a sample of size 5 from df1:
> df1_sample <- df1[sample(nrow(df1), 5), ]
> df1_sample
Output
         x1       x2       x3
18 1.140861 2.834256 4.805868
6  1.640278 3.098254 3.027825
13 3.345636 3.900259 3.914242
5  1.329124 2.826549 3.741972
15 2.297529 0.234004 3.175096
Renumbering the rows of the sample:
> rownames(df1_sample) <- 1:nrow(df1_sample)
> df1_sample
Output
        x1       x2       x3
1 1.140861 2.834256 4.805868
2 1.640278 3.098254 3.027825
3 3.345636 3.900259 3.914242
4 1.329124 2.826549 3.741972
5 2.297529 0.234004 3.175096
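Another common way to do the same renumbering, offered as an alternative rather than the article's method, is to assign `NULL` to the row names. R then falls back to its automatic row names 1..n. This also avoids a small pitfall of `1:nrow(x)`: for a zero-row sample, `1:0` evaluates to `c(1, 0)`, which does not match the number of rows, so that assignment fails. The data frame `df` below is made up for illustration.

```r
set.seed(111)
# Illustrative data frame (not the article's df1)
df <- data.frame(x1 = rnorm(20), x2 = rnorm(20))

df_sample <- df[sample(nrow(df), 5), ]
print(rownames(df_sample))  # original row numbers, in random order

# Drop the stored row names; R regenerates them as 1..nrow(df_sample)
rownames(df_sample) <- NULL
print(df_sample)

# The same assignment is also safe on an empty (zero-row) sample
empty_sample <- df[integer(0), ]
rownames(empty_sample) <- NULL
```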
Let's look at another example:
Example
> y1 <- runif(20, 2, 5)
> y2 <- runif(20, 3, 5)
> y3 <- runif(20, 5, 10)
> y4 <- runif(20, 5, 12)
> df2 <- data.frame(y1, y2, y3, y4)
> df2
Output
         y1       y2       y3        y4
1  2.881213 4.894022 7.797367  6.487594
2  3.052896 3.223898 7.527572  6.695535
3  2.237543 4.127740 9.864026  8.754048
4  4.475907 4.696651 5.403004  6.239423
5  2.792642 4.023536 7.786222  8.992823
6  2.791539 4.333093 9.480036  6.087904
7  2.271143 3.053019 5.539486  8.320935
8  3.382534 3.212921 7.246406 10.091843
9  4.074728 4.390884 6.544056 10.924127
10 4.546881 3.546689 6.164413 11.710035
11 2.738344 4.489939 9.140333  8.211822
12 3.952763 4.490791 5.564392  7.542578
13 4.040586 3.333465 9.420011 11.554599
14 2.313604 4.959709 8.628101 11.193405
15 2.335957 4.189517 9.601667  9.694433
16 2.646964 4.376438 5.614787 10.929413
17 2.390349 3.343716 9.755718 11.017555
18 3.999001 3.083366 8.348515  8.370818
19 3.463324 3.379700 5.425484  7.219430
20 3.059911 4.522844 7.905784 11.420429
> df2_sample <- df2[sample(nrow(df2), 7), ]
> df2_sample
Output
         y1       y2       y3        y4
20 3.059911 4.522844 7.905784 11.420429
3  2.237543 4.127740 9.864026  8.754048
10 4.546881 3.546689 6.164413 11.710035
12 3.952763 4.490791 5.564392  7.542578
15 2.335957 4.189517 9.601667  9.694433
18 3.999001 3.083366 8.348515  8.370818
5  2.792642 4.023536 7.786222  8.992823
> rownames(df2_sample) <- 1:nrow(df2_sample)
> df2_sample
Output
        y1       y2       y3        y4
1 3.059911 4.522844 7.905784 11.420429
2 2.237543 4.127740 9.864026  8.754048
3 4.546881 3.546689 6.164413 11.710035
4 3.952763 4.490791 5.564392  7.542578
5 2.335957 4.189517 9.601667  9.694433
6 3.999001 3.083366 8.348515  8.370818
7 2.792642 4.023536 7.786222  8.992823
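Both examples repeat the same two steps: sample, then renumber. These steps could be wrapped in a small helper. `sample_rows()` below is a hypothetical name, not an existing R function. It is a sketch that draws row positions with `sample.int()` and passes `drop = FALSE`, so that a one-column data frame is not simplified to a vector. The `set.seed()` call makes the sketch repeatable, but the values will not match the article's output above.

```r
# Hypothetical helper (not part of base R): sample `size` rows from a data
# frame without replacement and renumber the result 1..size
sample_rows <- function(df, size) {
  stopifnot(is.data.frame(df), size <= nrow(df))
  out <- df[sample.int(nrow(df), size), , drop = FALSE]
  rownames(out) <- NULL  # reset to automatic row names 1..size
  out
}

set.seed(111)
df2 <- data.frame(y1 = runif(20, 2, 5), y2 = runif(20, 3, 5),
                  y3 = runif(20, 5, 10), y4 = runif(20, 5, 12))
df2_sample <- sample_rows(df2, 7)
print(df2_sample)

# drop = FALSE keeps a one-column selection as a data frame
one_col <- sample_rows(df2["y1"], 3)
```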