How to replace outliers with the 5th and 95th percentile values in R?
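There are many ways to define an outlier, and the cutoff can be set manually by researchers and practitioners. One common choice is to treat values below the 5th percentile as lower outliers and values above the 95th percentile as upper outliers, and to replace each of them with the corresponding percentile value. To do this, we can use the squish function of the scales package, as shown in the examples below.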
Example 1
library(scales)
x1 <- 1:10
x1 <- squish(x1, quantile(x1, c(0.05, 0.95)))
x1
Output
[1] 1.45 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 9.55
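The values 1.45 and 9.55 in Example 1 come from the default interpolation used by quantile (type 7): for 1:10 the 5th percentile is 1 + 0.05 × 9 = 1.45 and the 95th percentile is 1 + 0.95 × 9 = 9.55. As a rough base-R sketch of the same clamping step, assuming only that squish replaces values outside the range with the nearest bound, pmax and pmin can be combined. The names bounds and x1_clamped are illustrative and not part of the original example.

```r
x1 <- 1:10

# quantile() defaults to type 7 (linear interpolation between order statistics);
# unname() drops the "5%" / "95%" labels so the result is a plain numeric vector
bounds <- unname(quantile(x1, c(0.05, 0.95)))

# pmax() raises values below the lower bound, pmin() caps values above the upper bound
x1_clamped <- pmin(pmax(x1, bounds[1]), bounds[2])
x1_clamped
```

This should give the same vector as the squish call in Example 1, without needing the scales package.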
Example 2
x2 <- c(-5, rnorm(78), 5)
x2
Output
 [1] -5.00000000 -0.39993096 -0.11249038  1.06589235  1.17195813  0.15677178
 [7] -0.08325310  0.57986817 -0.05529031  0.13352083  1.00608625 -0.86860404
[13]  0.53672576 -0.15262216 -0.81247587 -0.31263625 -1.51127713 -1.59689010
[19] -0.11242962 -1.08234352 -0.04935398 -0.65185804 -1.10369370  0.68732306
[25]  1.83448401  1.08689945 -1.20674408 -1.25753553  0.03354570  0.67981025
[31]  0.24871123 -1.49969111  1.19287825  1.04406030 -1.31756416  0.10204579
[37]  1.48272096  0.97661717  0.50006441 -1.36247153  0.99895292 -0.49534106
[43] -0.24105508  0.35006991 -2.16041158 -1.12644863  2.23190981 -0.51413222
[49]  0.03760280 -1.12237961 -1.54094088 -0.37365780  0.02138277  1.97702046
[55]  0.37190626 -0.59456892 -0.06652980 -1.04453387 -0.50884324  0.85025142
[61] -0.66718350 -0.69703588  0.44922344  0.64238500 -1.11403189  0.66251032
[67]  0.79601219 -0.74801795 -0.10957126 -0.90781918 -2.13721781  1.43186180
[73] -0.32571115 -0.97929747  1.10822193  0.94719910  0.58934102 -1.29942407
[79]  3.83469537  5.00000000
Example
x2 <- squish(x2, quantile(x2, c(0.05, 0.95)))
x2
Output
 [1] -1.54373835 -0.39993096 -0.11249038  1.06589235  1.17195813  0.15677178
 [7] -0.08325310  0.57986817 -0.05529031  0.13352083  1.00608625 -0.86860404
[13]  0.53672576 -0.15262216 -0.81247587 -0.31263625 -1.51127713 -1.54373835
[19] -0.11242962 -1.08234352 -0.04935398 -0.65185804 -1.10369370  0.68732306
[25]  1.83448401  1.08689945 -1.20674408 -1.25753553  0.03354570  0.67981025
[31]  0.24871123 -1.49969111  1.19287825  1.04406030 -1.31756416  0.10204579
[37]  1.48272096  0.97661717  0.50006441 -1.36247153  0.99895292 -0.49534106
[43] -0.24105508  0.35006991 -1.54373835 -1.12644863  1.84161083 -0.51413222
[49]  0.03760280 -1.12237961 -1.54094088 -0.37365780  0.02138277  1.84161083
[55]  0.37190626 -0.59456892 -0.06652980 -1.04453387 -0.50884324  0.85025142
[61] -0.66718350 -0.69703588  0.44922344  0.64238500 -1.11403189  0.66251032
[67]  0.79601219 -0.74801795 -0.10957126 -0.90781918 -1.54373835  1.43186180
[73] -0.32571115 -0.97929747  1.10822193  0.94719910  0.58934102 -1.29942407
[79]  1.84161083  1.84161083
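Because rnorm draws new random numbers on every run, re-running Example 2 will not reproduce the numbers listed above. The following is a minimal sketch, assuming an arbitrary seed (123 is a placeholder, not the seed behind the output above) and a base-R stand-in for squish, that fixes the draw and reports which positions were replaced. The names bounds, x2_clamped and changed are illustrative.

```r
set.seed(123)  # arbitrary seed, chosen only to make the draw repeatable

x2 <- c(-5, rnorm(78), 5)
bounds <- unname(quantile(x2, c(0.05, 0.95)))

# Base-R stand-in for squish(): clamp values to [bounds[1], bounds[2]]
x2_clamped <- pmin(pmax(x2, bounds[1]), bounds[2])

# Positions whose values were replaced by a percentile bound
changed <- which(x2_clamped != x2)
changed
length(changed)
```

With 80 distinct values, four observations fall below the 5th percentile and four above the 95th, which is consistent with the eight replaced entries visible in the output of Example 2.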
Example 3
x3 <- c(-50, rpois(198, 5), 50)
x3
Output
  [1] -50   5   4   8   6   2   1   6   3   5   7   7   8   5   8   8   5   8
 [19]   3   2   3   0   5   6   2   6   6   2   7   5   9   4   5   3   9   7
 [37]   4   3   6   5   2   4   9   5   7   1   2   4   2   3   5   5   6   1
 [55]   5   7   1   9   6   3   5   4   3   9   5   4   6   8   4   4   6   4
 [73]   5   2   4   5   5   7   8   6   3   5   8   5   8   5   2   5   2   8
 [91]   6   6   5   7   2   2   5   5   4   3   5   3   7   2   4   6   8   6
[109]   3   4   9   2   2   2   4   4   6   6   5   5   3   5   3   6   6   4
[127]   6   4   4   5   9   6   2   1   3   8   5   7   5   6   6   5   7   2
[145]   8   8   6   5   3   4   5  10   6   6   3   6   2   7   7   5   8   7
[163]   7   3   4   8   4   4   6   8   3   6   4  10   4   3   5   4   4   5
[181]   4   5   4   5   4   5   6   8   2   5  12  12   3   6   5   4   4   5
[199]   5  50
Example
x3 <- squish(x3, quantile(x3, c(0.05, 0.95)))
x3
Output
  [1] 2 5 4 8 6 2 2 6 3 5 7 7 8 5 8 8 5 8 3 2 3 2 5 6 2 6 6 2 7 5 9 4 5 3 9 7 4
 [38] 3 6 5 2 4 9 5 7 2 2 4 2 3 5 5 6 2 5 7 2 9 6 3 5 4 3 9 5 4 6 8 4 4 6 4 5 2
 [75] 4 5 5 7 8 6 3 5 8 5 8 5 2 5 2 8 6 6 5 7 2 2 5 5 4 3 5 3 7 2 4 6 8 6 3 4 9
[112] 2 2 2 4 4 6 6 5 5 3 5 3 6 6 4 6 4 4 5 9 6 2 2 3 8 5 7 5 6 6 5 7 2 8 8 6 5
[149] 3 4 5 9 6 6 3 6 2 7 7 5 8 7 7 3 4 8 4 4 6 8 3 6 4 9 4 3 5 4 4 5 4 5 4 5 4
[186] 5 6 8 2 5 9 9 3 6 5 4 4 5 5 9
Example 4
x4 <- c(-50, rexp(48, 3.1), 50)
x4
Output
 [1] -50.00000000   0.46067329   0.15298747   0.22637363   0.23424447
 [6]   0.15467335   0.37455989   0.07762013   0.33175821   0.09303333
[11]   0.03806199   0.20649621   0.22883480   0.49089164   0.82497712
[16]   0.04780089   0.05156566   0.35638257   0.37319578   0.71100713
[21]   0.08649528   0.31543159   0.02263685   0.00963146   0.44814049
[26]   0.34506738   0.29533295   0.13803055   0.05497129   0.03901786
[31]   0.01818446   0.78122217   0.04863415   0.33353520   0.39530353
[36]   0.05385106   0.19991695   0.16913554   0.01549729   0.15901185
[41]   0.65120205   0.36483214   0.18226180   0.20708671   0.01590697
[46]   1.01257680   0.42223292   0.17291614   0.15793390  50.00000000
Example
x4 <- squish(x4, quantile(x4, c(0.05, 0.95)))
x4
Output
 [1] 0.01568165 0.46067329 0.15298747 0.22637363 0.23424447 0.15467335
 [7] 0.37455989 0.07762013 0.33175821 0.09303333 0.03806199 0.20649621
[13] 0.22883480 0.49089164 0.80528739 0.04780089 0.05156566 0.35638257
[19] 0.37319578 0.71100713 0.08649528 0.31543159 0.02263685 0.01568165
[25] 0.44814049 0.34506738 0.29533295 0.13803055 0.05497129 0.03901786
[31] 0.01818446 0.78122217 0.04863415 0.33353520 0.39530353 0.05385106
[37] 0.19991695 0.16913554 0.01568165 0.15901185 0.65120205 0.36483214
[43] 0.18226180 0.20708671 0.01590697 0.80528739 0.42223292 0.17291614
[49] 0.15793390 0.80528739
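The same two steps recur in every example, so they can be wrapped in a small helper. The sketch below is one possible version, not the method used above: replace_outliers_pct is a hypothetical name, the base-R clamp stands in for squish, and passing na.rm = TRUE to quantile is an added assumption so that missing values do not stop the bounds from being computed.

```r
# Hypothetical helper: replace values outside the given percentiles
# with the percentile values themselves.
replace_outliers_pct <- function(x, probs = c(0.05, 0.95)) {
  stopifnot(is.numeric(x), length(probs) == 2, probs[1] < probs[2])
  # na.rm = TRUE lets the bounds be computed when x contains NA
  bounds <- unname(quantile(x, probs, na.rm = TRUE))
  # pmax() raises low values to the lower bound, pmin() caps high values;
  # NA entries stay NA
  pmin(pmax(x, bounds[1]), bounds[2])
}

x <- c(-50, 1:8, NA, 50)
out <- replace_outliers_pct(x)
out
```

Here the two extreme values are replaced by -27.05 and 31.1, and the NA is left in place. With only ten non-missing values the interpolated percentiles are still pulled toward the extremes, so the replacement tends to be more meaningful on larger samples such as those in Examples 2 to 4.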