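How to convert all words of a string or categorical variable in an R data frame to uppercase?
In most cases, the data we receive is not in the format we want, so we need to change it as required. When the levels of a categorical variable are represented by words rather than numbers, we can convert those levels to lowercase or uppercase. Sometimes this is done only to make the information look more user-friendly. Typically the values arrive in lowercase, and we can convert them to uppercase by applying the toupper function to every column with sapply.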
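Before working with a whole data frame, it may help to see what toupper does to a single vector. The following is a minimal sketch; the names countries, upper_countries and f are illustrative and not part of the original example. It also shows one possible way to uppercase a factor while keeping it a factor, by renaming its levels, since calling toupper directly on a factor returns a plain character vector.

```r
# A character vector: toupper() works element by element.
countries <- c("india", "china", "saudi arabia")
upper_countries <- toupper(countries)
print(upper_countries)

# A factor: toupper(f) would return a character vector,
# so rename the levels instead to keep the factor type.
f <- factor(c("usa", "jordan", "usa"))
levels(f) <- toupper(levels(f))
print(f)
```

Renaming the levels leaves the underlying integer codes untouched, so the order and number of observations stay the same.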
Example
Consider the following data frame:
> x1<-letters[1:20]
> x2<-20:1
> x3<-rep(c("india","china","usa","saudi arabia","jordan"),times=4)
> df<-data.frame(x1,x2,x3)
> df
   x1 x2           x3
1   a 20        india
2   b 19        china
3   c 18          usa
4   d 17 saudi arabia
5   e 16       jordan
6   f 15        india
7   g 14        china
8   h 13          usa
9   i 12 saudi arabia
10  j 11       jordan
11  k 10        india
12  l  9        china
13  m  8          usa
14  n  7 saudi arabia
15  o  6       jordan
16  p  5        india
17  q  4        china
18  r  3          usa
19  s  2 saudi arabia
20  t  1       jordan
> df_new<-as.data.frame(sapply(df, toupper))
> df_new
   x1 x2           x3
1   A 20        INDIA
2   B 19        CHINA
3   C 18          USA
4   D 17 SAUDI ARABIA
5   E 16       JORDAN
6   F 15        INDIA
7   G 14        CHINA
8   H 13          USA
9   I 12 SAUDI ARABIA
10  J 11       JORDAN
11  K 10        INDIA
12  L  9        CHINA
13  M  8          USA
14  N  7 SAUDI ARABIA
15  O  6       JORDAN
16  P  5        INDIA
17  Q  4        CHINA
18  R  3          USA
19  S  2 SAUDI ARABIA
20  T  1       JORDAN
Let's look at another example, in which the values of the second variable begin with a capital letter:
> y1<-letters[26:7]
> y2<-rep(c("Statistics","Biology","Psychology","Marketing","Physics"),each=4)
> y3<-rep(c(2,4,6,8),times=5)
> df_y<-data.frame(y1,y2,y3)
> df_y
   y1         y2 y3
1   z Statistics  2
2   y Statistics  4
3   x Statistics  6
4   w Statistics  8
5   v    Biology  2
6   u    Biology  4
7   t    Biology  6
8   s    Biology  8
9   r Psychology  2
10  q Psychology  4
11  p Psychology  6
12  o Psychology  8
13  n  Marketing  2
14  m  Marketing  4
15  l  Marketing  6
16  k  Marketing  8
17  j    Physics  2
18  i    Physics  4
19  h    Physics  6
20  g    Physics  8
> df_y_new<-as.data.frame(sapply(df_y, toupper))
> df_y_new
   y1         y2 y3
1   Z STATISTICS  2
2   Y STATISTICS  4
3   X STATISTICS  6
4   W STATISTICS  8
5   V    BIOLOGY  2
6   U    BIOLOGY  4
7   T    BIOLOGY  6
8   S    BIOLOGY  8
9   R PSYCHOLOGY  2
10  Q PSYCHOLOGY  4
11  P PSYCHOLOGY  6
12  O PSYCHOLOGY  8
13  N  MARKETING  2
14  M  MARKETING  4
15  L  MARKETING  6
16  K  MARKETING  8
17  J    PHYSICS  2
18  I    PHYSICS  4
19  H    PHYSICS  6
20  G    PHYSICS  8