【Question Title】: R strsplit: Split based on character except when a specific character follows
【Posted】: 2016-03-25 03:26:19
【Question】:

Suppose I have a character vector such as

split_these = c("File Location:C:\\Documents","File Location:Pete's Computer","File Location:") 

I want to split each element of this vector on ":" unless the colon is followed by "\". What I would like returned is something like:
#preferred solution
"File Location" "C:\\Documents"
"File Location" "Pete's Computer"
"File Location" ""

#less preferred but still great
"File Location" "C:\\Documents"
"File Location" "Pete's Computer"
"File Location" 

I have tried the following:

strsplit(split_these, ":")
[[1]]
[1] "File Location" "C"             "\\Documents"  

[[2]]
[1] "File Location"   "Pete's Computer"

[[3]]
[1] "File Location"

strsplit(split_these, ":[^\\]")
[[1]]
[1] "File Location" ":\\Documents" 

[[2]]
[1] "File Location"  "ete's Computer"

[[3]]
[1] "File Location:"

【Comments】:

    Tags: regex r string split


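    【Solution 1】:

    I suggest using PCRE with a negative lookahead assertion. Also note that the backslash has to be escaped twice, because it is a metacharacter both in R string literals and in regex syntax.

    strsplit(split_these, ':(?!\\\\)', perl=TRUE)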
    ## [[1]]
    ## [1] "File Location" "C:\\Documents"
    ##
    ## [[2]]
    ## [1] "File Location"   "Pete's Computer"
    ##
    ## [[3]]
    ## [1] "File Location"
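
The two levels of escaping can be checked directly. The sketch below is only an illustration (the variable names `pattern` and `res` are not from the answer): it prints the pattern as the regex engine receives it and reruns the split on the question's vector.

```r
split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")

# Four backslashes in the R source become two in the string itself,
# which PCRE then reads as one literal backslash.
pattern <- ":(?!\\\\)"
cat(pattern, "\n")   # shows the regex text that PCRE receives
nchar(pattern)       # 7 characters: : ( ? ! \ \ )

# The lookahead is zero-width, so the character after ":" is never
# consumed, unlike the ":[^\\]" attempt in the question.
res <- strsplit(split_these, pattern, perl = TRUE)
print(res)
```

Because the lookahead only inspects the next character, a colon at the very end of a string still counts as a delimiter, which is why the third element comes back as a single field.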
    

    If you want to flatten the list into a single character vector:

    do.call(c, strsplit(split_these, ':(?!\\\\)', perl=TRUE))
    ## [1] "File Location" "C:\\Documents" "File Location" "Pete's Computer" "File Location"
    

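    I came up with a trick to get the trailing empty-string field. Since strsplit() always omits a final empty field, we can simply append the delimiter to the end of every input string. If the original string had no trailing delimiter, the new empty field is omitted and the result is unchanged. If the original string did have a trailing delimiter, we get the empty field we want:

    do.call(c, strsplit(paste0(split_these, ':'), ':(?!\\\\)', perl=TRUE))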
    ## [1] "File Location" "C:\\Documents" "File Location" "Pete's Computer" "File Location" ""
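
If one row per input string is preferred over a flat vector, the same trick can be combined with rbind. This is a minimal sketch, assuming every string yields exactly two fields once the delimiter is appended (true for the question's data, not guaranteed in general); the names `parts` and `result` are illustrative only.

```r
split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")

# Append the delimiter so a trailing empty field survives the split.
parts <- strsplit(paste0(split_these, ":"), ":(?!\\\\)", perl = TRUE)

# Guard the assumption before binding: rbind would silently recycle
# values if the pieces had different lengths.
stopifnot(all(lengths(parts) == 2L))

# One row per input string, one column per field.
result <- do.call(rbind, parts)
print(result)
```

The third row then holds an empty string in its second column, matching the layout the question lists as preferred.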
    

    【Comments】:

      【Solution 2】:

      Applying read.dcf to each element of split_these gives a named character vector, which can then be reshaped into a data.frame:

      v <- drop(do.call("cbind", lapply(split_these, function(x) read.dcf(textConnection(x)))))
      

      giving:

      > v
          File Location     File Location     File Location 
        "C:\\Documents" "Pete's Computer"                "" 
      

      > stack(v)[2:1]
      

      giving:

                  ind          values
      1 File Location   C:\\Documents
      2 File Location Pete's Computer
      3 File Location
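
A variant of the same idea that also closes each text connection is sketched below. The helper `read_one` and the column names `field` and `value` are hypothetical, introduced only for illustration; the approach still relies on read.dcf treating everything before the first colon as the field name.

```r
split_these <- c("File Location:C:\\Documents",
                 "File Location:Pete's Computer",
                 "File Location:")

# Hypothetical helper: parse one "name:value" string as a DCF record
# and close the connection afterwards to avoid leaking it.
read_one <- function(x) {
  con <- textConnection(x)
  on.exit(close(con))
  read.dcf(con)
}

# Each call returns a 1 x 1 matrix; bind them and drop to a named vector.
v <- drop(do.call(cbind, lapply(split_these, read_one)))

# Build the two-column layout directly instead of going through stack().
df <- data.frame(field = names(v), value = unname(v),
                 stringsAsFactors = FALSE)
print(df)
```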
      

      【Comments】:
