How to test for a significant relationship between two categorical columns of an R data frame?
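To test whether two categorical columns of an R data frame are significantly associated, we first build a contingency table from those columns and then apply the chi-squared test of independence with chisq.test. For example, if we have a data frame named df that contains two categorical columns, say C1 and C2, the test can be run with the command chisq.test(table(df$C1, df$C2)).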
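The two steps described above can be sketched as follows. This is a minimal illustration, not part of the original examples: the data frame contents are made up, and the names tab, res, expected and stat are hypothetical. The manual calculation is included only to show what the statistic reported by chisq.test is built from.

```r
# Hypothetical data frame; only the column names C1 and C2 come from the text.
df <- data.frame(
  C1 = rep(c("A", "B"), each = 6),
  C2 = c("x", "x", "x", "y", "y", "z",
         "x", "y", "z", "z", "z", "z")
)

# Step 1: contingency table of observed counts.
tab <- table(df$C1, df$C2)

# Step 2: chi-squared test of independence.
# suppressWarnings() hides the small-sample warning for this toy data.
res <- suppressWarnings(chisq.test(tab))

# The same statistic computed by hand:
# expected count = row total * column total / grand total.
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
stat <- sum((tab - expected)^2 / expected)

res$statistic  # test statistic from chisq.test
stat           # manual value, should agree with the line above
res$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
res$p.value    # p-value
```

Note that for 2 x 2 tables chisq.test applies a continuity correction by default (correct = TRUE), so the manual formula above matches only for larger tables or when correct = FALSE is passed.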
Example
# Two random categorical columns; values differ between runs unless set.seed() is called first
x1 <- sample(LETTERS[1:4], 20, replace = TRUE)
y1 <- sample(letters[1:4], 20, replace = TRUE)
df1 <- data.frame(x1, y1)
df1
Output
   x1 y1
1   D  a
2   B  d
3   D  d
4   B  d
5   A  a
6   A  b
7   B  c
8   D  d
9   C  d
10  D  c
11  C  a
12  D  c
13  D  a
14  A  a
15  B  d
16  A  c
17  C  d
18  A  d
19  C  b
20  D  a
Example
# Contingency table: rows are the levels of x1, columns are the levels of y1
table(df1$x1, df1$y1)
Output
    a b c d
  A 2 1 1 1
  B 0 0 1 3
  C 1 1 0 2
  D 3 0 2 2
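Testing for a significant relationship between the x1 and y1 columns of df1:
Example
chisq.test(table(df1$x1, df1$y1))
Output
	Pearson's Chi-squared test

data:  table(df1$x1, df1$y1)
X-squared = 7.4464, df = 9, p-value = 0.5907

Warning message:
In chisq.test(table(df1$x1, df1$y1)) :
  Chi-squared approximation may be incorrect
The p-value of 0.5907 is well above the conventional 0.05 level, so there is no evidence of a relationship between x1 and y1. The warning appears because some expected cell counts are small for a sample of only 20 rows.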
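The warning in the output above is raised by chisq.test when any expected cell count is below 5. The sketch below rebuilds the df1 contingency table from the printed counts (so it runs without the random sample), inspects the expected counts, and shows two commonly used alternatives for sparse tables. The names tab1 and res1 are hypothetical, and the choice of B = 10000 replicates and the seed value are assumptions.

```r
# Contingency table for df1, typed in from the counts printed above.
tab1 <- matrix(
  c(2, 1, 1, 1,
    0, 0, 1, 3,
    1, 1, 0, 2,
    3, 0, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(x1 = LETTERS[1:4], y1 = letters[1:4])
)

# Run the test quietly and look at the expected counts under independence.
res1 <- suppressWarnings(chisq.test(tab1))
res1$expected
sum(res1$expected < 5)  # number of cells with an expected count below 5

# Alternative 1: Monte Carlo p-value instead of the chi-squared approximation.
set.seed(1)  # assumed seed, only to make the simulation repeatable
chisq.test(tab1, simulate.p.value = TRUE, B = 10000)

# Alternative 2: Fisher's exact test, which does not rely on large counts.
fisher.test(tab1)
```

Either alternative avoids the large-sample approximation; which one is preferable depends on the table size and the analysis at hand.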
Example
# A second data frame with a two-level and a three-level categorical column
x2 <- sample(c("hot", "cold"), 20, replace = TRUE)
y2 <- sample(c("summer", "winter", "spring"), 20, replace = TRUE)
df2 <- data.frame(x2, y2)
df2
Output
     x2     y2
1  cold winter
2   hot winter
3   hot winter
4   hot spring
5  cold summer
6  cold summer
7  cold spring
8   hot winter
9  cold summer
10  hot spring
11  hot winter
12 cold winter
13 cold winter
14  hot summer
15  hot winter
16  hot summer
17  hot summer
18 cold summer
19 cold spring
20  hot summer
Example
# Contingency table: rows are the levels of x2, columns are the levels of y2
table(df2$x2, df2$y2)
Output
       spring summer winter
  cold      2      4      3
  hot       2      4      5
Testing for a significant relationship between the x2 and y2 columns of df2:
Example
chisq.test(table(df2$x2, df2$y2))
Output
	Pearson's Chi-squared test

data:  table(df2$x2, df2$y2)
X-squared = 0.30303, df = 2, p-value = 0.8594

Warning message:
In chisq.test(table(df2$x2, df2$y2)) :
  Chi-squared approximation may be incorrect
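To turn this result into a decision, the p-value can be compared with a chosen significance level. The sketch below rebuilds the df2 contingency table from the printed counts; the names tab2, res2 and alpha are hypothetical, and the 0.05 threshold is an assumed convention rather than something the test itself prescribes.

```r
# Contingency table for df2, typed in from the counts printed above.
tab2 <- matrix(
  c(2, 4, 3,
    2, 4, 5),
  nrow = 2, byrow = TRUE,
  dimnames = list(x2 = c("cold", "hot"),
                  y2 = c("spring", "summer", "winter"))
)

# Run the test; the small-sample warning is suppressed here on purpose.
res2 <- suppressWarnings(chisq.test(tab2))

alpha <- 0.05  # assumed significance level

if (res2$p.value < alpha) {
  message("Reject independence: x2 and y2 appear to be related.")
} else {
  message("No evidence of a relationship between x2 and y2.")
}
```

With a p-value of about 0.86, the second branch is taken, which is expected since both columns were drawn independently at random.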