How to create a relative frequency table using dplyr in R?
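Relative frequency is the proportion of the total that one category accounts for. For example, if we have 5 bananas, 6 guavas and 10 pomegranates, the relative frequency of bananas is 5 divided by the total count of 5 + 6 + 10 = 21, that is 5/21 ≈ 0.238. Because it expresses each count as a proportion of the total, relative frequency is also called proportional frequency.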
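The fruit example can be checked directly in base R. This is a minimal sketch: the named vector `counts` and the variable `rel_freq` are illustrative names, not part of the original tutorial.

```r
# Counts from the fruit example in the text
counts <- c(banana = 5, guava = 6, pomegranate = 10)

# Relative frequency = count / total count (the total here is 21)
rel_freq <- counts / sum(counts)

# Rounded: banana 0.238, guava 0.286, pomegranate 0.476
print(round(rel_freq, 3))
```

The relative frequencies always add up to 1, which is a quick sanity check for any frequency table.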
Example 1
Consider the following data frame:
set.seed(21)
x <- sample(LETTERS[1:4], 20, replace = TRUE)
Ratings <- sample(1:50, 20)
df1 <- data.frame(x, Ratings)
df1
Output
   x Ratings
1  C      44
2  A      29
3  C      14
4  A      10
5  B      46
6  C       1
7  D      47
8  A       8
9  C      23
10 C       7
11 D      50
12 B      31
13 B      34
14 B       3
15 D      48
16 B      33
17 C      45
18 B       9
19 B      40
20 C      21
Load the dplyr package:
library(dplyr)
Find the relative frequency table of the values in x:
df1 %>% group_by(x) %>% summarise(n = n()) %>% mutate(freq = n / sum(n))
Output
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 4 x 3
  x         n  freq
  <chr> <int> <dbl>
1 A         3  0.15
2 B         7  0.35
3 C         7  0.35
4 D         3  0.15
Warning message:
`...` is not empty.

We detected these problematic arguments:
* `needs_dots`

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
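Note: you can ignore this warning message. The relative frequencies have been computed correctly, and the warning is unrelated to the calculation. Likewise, the `summarise()` line is only an informational message, not an error.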
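The same table can be produced in two slightly different ways. The sketch below assumes dplyr 1.0.0 or later (where `summarise()` accepts the `.groups` argument mentioned in the message above) and uses a small hypothetical data frame `df` for illustration. Passing `.groups = "drop"` silences the ungrouping message, and `count(x)` is dplyr's shorthand for `group_by(x) %>% summarise(n = n())`.

```r
library(dplyr)

# Small hypothetical data frame, used only for illustration
df <- data.frame(x = c("A", "B", "B", "C", "C", "C"), stringsAsFactors = FALSE)

# Same pipeline as in the example, with .groups = "drop" to suppress the message
freq_summarise <- df %>%
  group_by(x) %>%
  summarise(n = n(), .groups = "drop") %>%
  mutate(freq = n / sum(n))

# count(x) counts the rows in each group and stores the result in column n
freq_count <- df %>%
  count(x) %>%
  mutate(freq = n / sum(n))

print(freq_summarise)
print(freq_count)
```

Both versions divide each group count by the total number of rows, so the `freq` columns are identical; `count()` is simply the more compact spelling.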
Example 2
y <- sample(c("Male", "Female"), 20, replace = TRUE)
Salary <- sample(20000:50000, 20)
df2 <- data.frame(y, Salary)
df2
Output
        y Salary
1  Female  40907
2  Female  47697
3    Male  49419
4  Female  23818
5    Male  21585
6    Male  22276
7  Female  21856
8    Male  22092
9    Male  27892
10 Female  47655
11   Male  34933
12 Female  48027
13 Female  48179
14   Male  21460
15   Male  24233
16 Female  43762
17 Female  22369
18 Female  47206
19   Male  34972
20 Female  30222
Find the relative frequency of each gender in y:
df2 %>% group_by(y) %>% summarise(n = n()) %>% mutate(freq = n / sum(n))
Output
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 3
  y          n  freq
  <chr>  <int> <dbl>
1 Female    11  0.55
2 Male       9  0.45
Warning message:
`...` is not empty.

We detected these problematic arguments:
* `needs_dots`

These dots only exist to allow future extensions and should be empty.
Did you misspecify an argument?
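To cross-check a dplyr result without any packages, base R's `table()` counts each value and `prop.table()` divides those counts by their total. This sketch uses a hypothetical vector rebuilt from the counts in the output above (11 Female, 9 Male), not the original random sample.

```r
# Hypothetical vector with the same counts as the output above
y <- c(rep("Female", 11), rep("Male", 9))

# table() tallies each value; prop.table() turns the tallies into proportions
rel <- prop.table(table(y))

# Female 0.55, Male 0.45
print(rel)
```

The proportions match the `freq` column of the dplyr table, which confirms that the pipeline computes relative frequencies and not raw counts.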