【Question title】:How to split a character vector based on a numeric vector for positions
【Posted】:2017-05-30 12:42:22
【Question】:

I want to split a character string into substrings at the positions given by a second, numeric vector of split points.

vec <- "LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
split.points <- c(25, 32, 55, 90, 124)

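I want to cut the string above at the positions given in split.points, producing six substrings.

It sounds simple, but the split functions I know of only work with a regular expression (pattern) or with substrings of a fixed length.

Any help would be appreciated.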

    Tags: r split strsplit


    【Solution 1】:

    We can use substring:

    substring(
        vec,
        c(1, split.points + 1),
        c(split.points, nchar(vec))
    )
    # [1] "LAYRVCMTNEGHPWVSLVVQKTRLQ"                    "ISQDPSL"                                     
    # [3] "NYEYLPTMGLKSFIQASLALLFG"                      "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"         
    # [5] "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP"           "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
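
The start and end vectors used above can be wrapped in a small helper. This is only a minimal sketch: the name `split_at` is hypothetical (it is not part of base R), and it assumes the split points are sorted in increasing order and all smaller than the length of the string.

```r
# Hypothetical helper (not part of base R): cut one string after each position in `points`.
# Assumes `points` is sorted, increasing, and every value is < nchar(x).
split_at <- function(x, points) {
  starts <- c(1, points + 1)     # each piece starts right after the previous cut
  ends   <- c(points, nchar(x))  # the last piece runs to the end of the string
  substring(x, starts, ends)
}

pieces <- split_at("ABCDEFGHIJ", c(3, 7))
# pieces is c("ABC", "DEFG", "HIJ")
```

Because substring is vectorized over its start and end arguments, no explicit loop is needed.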
    


      【Solution 2】:

      Another option is read.fwf:

      unlist(read.fwf(textConnection(vec), 
                      widths = c(25, diff(split.points)), 
                      as.is = TRUE), 
             use.names = FALSE)
      

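      This gives the first five pieces; the widths stop at position 124, so the remainder of the string is dropped: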

      [1] "LAYRVCMTNEGHPWVSLVVQKTRLQ"          
      [2] "ISQDPSL"                            
      [3] "NYEYLPTMGLKSFIQASLALLFG"            
      [4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"
      [5] "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP"
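
To also return the sixth piece, the widths need to run to the end of the string. A minimal sketch, assuming the widths are derived from the split points with `diff(c(0, cuts, nchar(x)))`; the short string `x` and the name `cuts` are stand-ins chosen for illustration, not part of the original answer.

```r
# Sketch: derive every field width, including the tail after the last split point.
x    <- "ABCDEFGHIJ"   # short stand-in for the sequence in the question
cuts <- c(3, 7)        # split after these positions

widths <- diff(c(0, cuts, nchar(x)))   # 3, 4, 3

con <- textConnection(x)
pieces <- unlist(read.fwf(con, widths = widths, as.is = TRUE),
                 use.names = FALSE)
close(con)             # read.fwf does not close a connection that was already open
# pieces is c("ABC", "DEFG", "HIJ")
```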
      

      I would not be surprised if your character vector came from a data file. In that case read.fwf is especially useful. An example:

      vec2 <- "LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM
      LAYRVCMTNEGHPWVSLVVQKTRLQISQDPSLNYEYLPTMGLKSFIQASLALLFGKHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHKDARIVYIISSQKELHGLVFQDMGFTVYEYSVWDPKKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
      
      read.fwf(textConnection(vec2), 
               widths = c(25, diff(split.points)), 
               as.is=TRUE)
      

      This gives:

                               V1      V2                      V3                                  V4                                 V5
      1 LAYRVCMTNEGHPWVSLVVQKTRLQ ISQDPSL NYEYLPTMGLKSFIQASLALLFG KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP
      2 LAYRVCMTNEGHPWVSLVVQKTRLQ ISQDPSL NYEYLPTMGLKSFIQASLALLFG KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP
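
When the records really do live in a file, read.fwf can be pointed at the path directly. The sketch below writes a made-up two-line file to a temporary location first; the file contents, `path`, and `cuts` are assumptions for illustration only.

```r
# Sketch: read fixed-width records from a file on disk.
# The file and its contents are made up for illustration.
path <- tempfile(fileext = ".txt")
writeLines(c("ABCDEFGHIJ", "KLMNOPQRST"), path)

cuts   <- c(3, 7)                 # split after these positions
widths <- diff(c(0, cuts, 10))    # 3, 4, 3 (each record is 10 characters wide)

df <- read.fwf(path, widths = widths, as.is = TRUE)
# df has one row per record and one column (V1, V2, V3) per field
unlink(path)                      # remove the temporary file
```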
      


        【Solution 3】:

        We can use separate from tidyr:

        library(tidyverse)
        # tibble() replaces the deprecated data_frame()
        tibble(vec = vec) %>%
              separate(vec, into = paste0('vec', 1:6), sep = split.points) %>% 
              unlist(., use.names = FALSE)
        #[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ"                    "ISQDPSL"                                      "NYEYLPTMGLKSFIQASLALLFG"                     
        #[4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"          "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP"
        #[6] "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
        

        A base R option is substr:

        unname(mapply(substr, vec, start = c(1, split.points+1), stop = c(split.points, nchar(vec))))
        #[1] "LAYRVCMTNEGHPWVSLVVQKTRLQ"                    "ISQDPSL"                                      "NYEYLPTMGLKSFIQASLALLFG"                     
        #[4] "KHSQAIVENRVGGVHTVGDSGAFQLGVQFLRAWHK"          "DARIVYIISSQKELHGLVFQDMGFTVYEYSVWDP"           "KKLCMDPDILLNVVEQIPHGCVLVMGNIIDCKLTPSGWAKLMSM"
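
Whichever approach is used, a quick sanity check is that the pieces paste back into the original string and have the expected lengths. A minimal sketch on a short stand-in string (`x` and `cuts` are illustrative names, not from the answers above):

```r
# Sketch: verify that splitting at `cuts` loses no characters.
x    <- "ABCDEFGHIJ"
cuts <- c(3, 7)

pieces <- substring(x, c(1, cuts + 1), c(cuts, nchar(x)))

rejoined   <- paste(pieces, collapse = "")                       # should equal x
lengths_ok <- all(nchar(pieces) == diff(c(0, cuts, nchar(x))))   # 3, 4, 3

stopifnot(identical(rejoined, x), lengths_ok)
```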
        
