【Question title】:How can I search multiple words in the same regex?
【Posted】:2018-08-04 15:33:47
【Question】:

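I have a list of specific words that I want to remove from a list of sentences. Do I have to loop over the list and apply the function to each regex, or can I somehow apply them all in a single call? I tried doing this with lapply, but I'm hoping to find a better way.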

 string <- 'This is a sample sentence from which to gather some cool 
 knowledge'

 words <- c('a','from','some')

lapply(words,function(x){
  string <- gsub(paste0('\\b',words,'\\b'),'',string)
})

The output I want is: This is sample sentence which to gather cool knowledge.

【Comments】:

    Tags: r regex lapply gsub


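    【Solution 1】:

    You can collapse the character vector of words to remove into a single pattern using the regex OR operator ("|"), sometimes also called the "pipe" symbol.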

    gsub(paste0('\\b',words,'\\b', collapse="|"), '', string)
    [1] "This is  sample sentence  which to gather  cool \n knowledge"
    

    Or, to also remove at most one whitespace character after each word:

    gsub(paste0('\\b',words,'\\b\\s{0,1}', collapse="|"), '', string)
    [1] "This is sample sentence which to gather cool \n knowledge"
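If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: `remove_words` is a hypothetical name that does not appear in the answer, and it assumes the words contain no regex metacharacters (such as `.` or `+`). It puts one pair of `\\b` boundaries around the whole alternation instead of around each word, and uses `perl = TRUE` for the non-capturing group.

```r
# Hypothetical helper (not part of the original answer): remove whole words.
# Assumes `words` holds plain words without regex metacharacters.
remove_words <- function(text, words) {
  # One alternation inside a single pair of word boundaries,
  # followed by at most one whitespace character.
  pattern <- paste0("\\b(?:", paste(words, collapse = "|"), ")\\b\\s?")
  gsub(pattern, "", text, perl = TRUE)
}

string <- "This is a sample sentence from which to gather some cool knowledge"
words <- c("a", "from", "some")
result <- remove_words(string, words)
print(result)
```

Because `gsub` is vectorized over its input, the same function should also work on a character vector of sentences.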
    

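    【Comments】:

    • +1, but I would even use gsub(paste0('\\b',words,'\\b\\s*', collapse="|"),'',string) to remove the whitespace, so that you end up with [1] "This is sample sentence which to gather cool \n knowledge"
    • Agreed that you probably want to remove the trailing whitespace, but I would remove at most one whitespace character, using "\\s{0,1}"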
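The two quantifiers discussed in these comments only differ when a removed word is followed by more than one whitespace character. A minimal sketch, using a made-up input string with irregular spacing (not from the question):

```r
words <- c("a", "from", "some")
# Made-up input: two spaces after "a", three spaces after "from"
x <- "take a  break from   work"

# \\s{0,1} removes at most one whitespace character after each word
at_most_one <- gsub(paste0("\\b", words, "\\b\\s{0,1}", collapse = "|"), "", x)
# \\s* removes the whole run of whitespace after each word
greedy <- gsub(paste0("\\b", words, "\\b\\s*", collapse = "|"), "", x)

print(at_most_one)
print(greedy)
```

Which one is preferable depends on whether the original spacing (including line breaks, which `\\s` also matches) should be preserved.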
    【Solution 2】:

    You need to use "|" to express OR in the regex:

    string2 <- gsub(paste(words,'|',collapse =""),'',string)
    
    > string2
    [1] "This is sample sentence which to gather cool knowledge"
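One caveat worth checking before reusing this: the pattern built here has no word boundaries and relies on the space that `paste` inserts after each word. The sketch below prints the generated pattern and compares it with a boundary-based pattern on a made-up sentence (the input `x` is an assumption for illustration, not from the question).

```r
words <- c("a", "from", "some")

# The pattern built in this answer: each word followed by a space,
# with an empty alternative at the end
loose_pattern <- paste(words, "|", collapse = "")
print(loose_pattern)

# Made-up input where a longer word ends in "a"
x <- "a banana from some farm"

# Without \\b, the "a " alternative can also match the end of "banana "
loose <- gsub(loose_pattern, "", x)
# With \\b on both sides, only whole words are removed
safe <- gsub(paste0("\\b", words, "\\b\\s?", collapse = "|"), "", x)

print(loose)
print(safe)
```

The loose pattern also cannot match a listed word at the very end of the string, since no space follows it there.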
    

    【Comments】:

      【Solution 3】:
      string<-'This is a sample sentence from which to gather some cool knowledge'
      words<-c('a', 'from', 'some')
      library(tm)
      string<-removeWords(string, words = words)
      string
      [1] "This is  sample sentence  which to gather  cool knowledge"
      

      With the tm library, you can use removeWords().

      Or you can loop over the words with gsub, like this:

      string<-'This is a sample sentence from which to gather some cool knowledge'
      words<-c('a', 'from', 'some')
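      for(i in seq_along(words)) {
        # wrap each word in \\b so only whole words match (a bare 'a' would also hit "sample")
        string<-gsub(pattern = paste0('\\b', words[i], '\\b'), replacement = '', x = string)
      }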
      string
      [1] "This is  sample sentence  which to gather  cool knowledge"
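Both outputs in this answer still contain doubled spaces where a word was removed. If that matters, one possible cleanup step in base R is to collapse runs of whitespace afterwards. A minimal sketch (the variable names are mine, not from the answer):

```r
# Output as shown above: doubled spaces remain where words were removed
x <- "This is  sample sentence  which to gather  cool knowledge"

# Collapse every run of whitespace to a single space, then trim both ends
cleaned <- trimws(gsub("\\s+", " ", x))
print(cleaned)
```

Note that this also turns line breaks and tabs into single spaces, which may or may not be what you want.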
      

      Hope that helps.

      【Comments】:
