How to split a long string into a vector of equal-sized substrings in R?
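If a vector was mistakenly recorded as a single string, or the file containing the data does not delimit the strings properly, we need to split the string into its correct form before any further analysis. This can happen, for example, when the levels of a factor variable whose names all have the same length are run together without separators. In that case, we can use the substring function to split the string into a vector of equal-sized substrings.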
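As a rough illustration of the idea, the sketch below passes a vector of start positions and a vector of end positions to substring, which returns one substring per pair. The string `codes` and the variable names are made up for this sketch.

```r
# Hypothetical input: three two-letter codes run together
codes <- "abcdef"

# Start positions 1, 3, 5 and end positions 2, 4, 6
starts <- seq(1, nchar(codes), by = 2)
ends <- seq(2, nchar(codes), by = 2)

# substring() is vectorised over its first and last arguments
parts <- substring(codes, starts, ends)
print(parts)
# [1] "ab" "cd" "ef"
```

The chunk size appears twice: as the step of both sequences and as the first end position.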
Example
Consider the following examples to see how the substring function splits a string into a vector of substrings:
Factor <- "aabbccddabacadbabcbdcacbcddadbdc"
substring(Factor, seq(1, nchar(Factor), 2), seq(2, nchar(Factor), 2))
Output
[1] "aa" "bb" "cc" "dd" "ab" "ac" "ad" "ba" "bc" "bd" "ca" "cb" "cd" "da" "db"
[16] "dc"

x1 <- "abcdefghijklmopqrstuvwxyz"

substring(x1, seq(1, nchar(x1), 2), seq(2, nchar(x1), 2))
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mo" "pq" "rs" "tu" "vw" "xy" ""

substring(x1, seq(1, nchar(x1), 2), seq(3, nchar(x1), 2))
[1] "abc" "cde" "efg" "ghi" "ijk" "klm" "mop" "pqr" "rst" "tuv" "vwx" "xyz"
[13] ""

substring(x1, seq(1, nchar(x1), 3), seq(3, nchar(x1), 3))
[1] "abc" "def" "ghi" "jkl" "mop" "qrs" "tuv" "wxy" ""

substring(x1, seq(1, nchar(x1), 4), seq(3, nchar(x1), 4))
[1] "abc" "efg" "ijk" "mop" "rst" "vwx" ""

substring(x1, seq(1, nchar(x1), 4), seq(4, nchar(x1), 4))
[1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" ""

substring(x1, seq(1, nchar(x1), 4), seq(5, nchar(x1), 4))
[1] "abcde" "efghi" "ijklm" "mopqr" "rstuv" "vwxyz" ""

substring(x1, seq(1, nchar(x1), 5), seq(5, nchar(x1), 5))
[1] "abcde" "fghij" "klmop" "qrstu" "vwxyz"

substring(x1, seq(1, nchar(x1), 10), seq(5, nchar(x1), 10))
[1] "abcde" "klmop" "vwxyz"

substring(x1, seq(1, nchar(x1), 10), seq(10, nchar(x1), 10))
[1] "abcdefghij" "klmopqrstu" ""

substring(x1, seq(1, nchar(x1), 10), seq(2, nchar(x1), 10))
[1] "ab" "kl" "vw"

substring(x1, seq(1, nchar(x1), 10), seq(3, nchar(x1), 10))
[1] "abc" "klm" "vwx"

substring(x1, seq(1, nchar(x1), 10), seq(5, nchar(x1), 10))
[1] "abcde" "klmop" "vwxyz"

substring(x1, seq(1, nchar(x1), 2), seq(2, nchar(x1) + 2 - 1, 2))
[1] "ab" "cd" "ef" "gh" "ij" "kl" "mo" "pq" "rs" "tu" "vw" "xy" "z"

substring(x1, seq(1, nchar(x1), 4), seq(4, nchar(x1) + 4 - 1, 4))
[1] "abcd" "efgh" "ijkl" "mopq" "rstu" "vwxy" "z"

substring(x1, seq(1, nchar(x1), 3), seq(4, nchar(x1) + 4 - 1, 3))
[1] "abcd" "defg" "ghij" "jklm" "mopq" "qrst" "tuvw" "wxyz" "z"

substring(x1, seq(1, nchar(x1), 5), seq(4, nchar(x1) + 4 - 1, 5))
[1] "abcd" "fghi" "klmo" "qrst" "vwxy"

substring(x1, seq(1, nchar(x1), 2), seq(4, nchar(x1) + 4 - 1, 2))
[1] "abcd" "cdef" "efgh" "ghij" "ijkl" "klmo" "mopq" "pqrs" "rstu" "tuvw"
[11] "vwxy" "xyz" "z"

substring(x1, seq(1, nchar(x1), 2), seq(5, nchar(x1) + 5 - 1, 2))
[1] "abcde" "cdefg" "efghi" "ghijk" "ijklm" "klmop" "mopqr" "pqrst" "rstuv"
[10] "tuvwx" "vwxyz" "xyz" "z"