【Question title】: Remove text before an array of subtexts
【Posted】: 2021-05-19 04:46:52
【Question】:

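I have a vector of strings to process. For each string, if it ends with one of a given set of substrings, I want to keep only that substring; otherwise the string should stay unchanged.

Here is an example: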

keep <- c("USA","UNITED STATES")
keep <- paste0(paste0(" ",keep,"$"),collapse="|")

data <- c("DETROIT","DETROIT USA","DETROIT UNITED STATES")
expected_result <- c("DETROIT","USA","UNITED STATES")

【Comments】:

    Tags: r regex substring str-replace gsub


    【Solution 1】:

    You can use

    data <- c("DETROIT","DETROIT USA","DETROIT UNITED STATES")
    keep <- c("USA","UNITED STATES")
    
    regex <- paste0(".*\\s*\\b(",paste0(keep,collapse="|"), ")\\b")
    sub(regex, "\\1", data)
    ## => [1] "DETROIT"       "USA"           "UNITED STATES"
    

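    See the R demo online.

    The regex is .*\s*\b(USA|UNITED STATES)\b, see its online demo.

    Details

    • .* - zero or more characters, as many as possible (greedy)
    • \s* - zero or more whitespace characters
    • \b(USA|UNITED STATES)\b - the whole word USA or UNITED STATES, captured into group 1 (\1 in the replacement pattern).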
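
As a quick check of the pattern described above, the following sketch prints the assembled regex and applies it to the sample data. The extra input "DETROIT USA TODAY" and the variant with a trailing $ are assumptions added for illustration and are not part of the original answer; they show that without $ a keyword in the middle of a string is matched as well.

```r
data <- c("DETROIT", "DETROIT USA", "DETROIT UNITED STATES")
keep <- c("USA", "UNITED STATES")

# Assemble the pattern from the answer and print it as the regex engine sees it
regex <- paste0(".*\\s*\\b(", paste0(keep, collapse = "|"), ")\\b")
cat(regex, "\n")

# Strings without a keyword do not match, so sub() returns them unchanged
result <- sub(regex, "\\1", data)
print(result)

# Hypothetical extra input: the keyword is not at the end of the string
extra <- "DETROIT USA TODAY"
unanchored <- sub(regex, "\\1", extra)              # gives "USA TODAY"
# Assumed variant: append $ so only a trailing keyword is matched
anchored <- sub(paste0(regex, "$"), "\\1", extra)   # leaves the input unchanged
print(c(unanchored, anchored))
```

Whether the anchored or the unanchored form is the right one depends on the data; the question's own pattern uses $, so the anchored form is probably closer to the intent. If the keywords could contain regex metacharacters, they would need to be escaped before being pasted into the pattern.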

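    【Comments】:

      【Solution 2】:

      You can use str_extract to extract the pattern if it is present. It returns NA when the pattern is missing, and you can replace those NA values with the original data.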

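      data <- c("DETROIT","DETROIT USA","DETROIT UNITED STATES")
      keep <- c("USA","UNITED STATES")
      # Build " USA$| UNITED STATES$": each keyword preceded by a space, at the end of the string
      keep <- paste0(paste0(" ",keep,"$"),collapse="|")
      
      result <- stringr::str_extract(data, keep)
      # No match gives NA; fall back to the original string
      result[is.na(result)] <- data[is.na(result)]
      trimws(result)
      #[1] "DETROIT"       "USA"           "UNITED STATES"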
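
The same extract-then-fall-back idea can also be sketched in base R, for cases where stringr is not available. This is an assumed alternative rather than part of the original answer; it relies on regexpr returning -1 for elements without a match, and on regmatches returning only the matched substrings.

```r
data <- c("DETROIT", "DETROIT USA", "DETROIT UNITED STATES")
keep <- c("USA", "UNITED STATES")
pattern <- paste0(paste0(" ", keep, "$"), collapse = "|")

# Position of the first match per element, or -1 when there is none
hits <- regexpr(pattern, data)
found <- as.vector(hits) != -1

# Start from the original strings and overwrite only the matching ones
result <- data
result[found] <- trimws(regmatches(data, hits))
print(result)
```

Starting from a copy of data avoids the intermediate NA values, at the cost of a slightly longer script.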
      

      【Comments】:
