【Question Title】: tm package removeWords function concatenates words in R
【Posted】: 2021-09-27 18:02:42
【Question】:

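I am cleaning sample data with removeWords from the tm package, but after removing the stop words, removeWords also glues the remaining words together. The result should be "environmental dead frog" and "environmental dead mouse". Can anyone point me in the right direction?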

library(tm)
dc<-c("environmental dead frog still","environmental dead mouse come")

manualremovelist<-c("the","does","doesn't","please","new","ok","one","cant",
                "doesnt","can","still","done","will","without","seen",
                "also","danfoss","case","doesn´t","due","need","occurs","made",
                "using","now","make","makes","needs","put","okay","sno","since","therefore",
                "found","milwaukee","probably","got","finally","isnt","per","two",
                "obvious","unable","must","nos","3nos","1no",".","phone","tel","attached",
                "given","find","have","see","be","give","do","come","use","make","get",
                "try","call","request")

dc<-removeWords(dc,manualremovelist)

dc
#[1] "environmentaldeadfrog"  "environmentaldeadmouse"
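Why the spaces disappear: tm's removeWords() appears to combine the whole word list into a single regular expression of the form `\b(word1|word2|...)\b` and delete every match. The list above contains ".", which in a regex matches any single character, so `\b.\b` also matches the one space sitting between two words. The base-R sketch below rebuilds such a pattern from a small excerpt of the list. It is an assumption about tm's internals, not its actual source, but it reproduces the output shown above:

```r
# Assumption: tm's removeWords() builds one pattern like \b(w1|w2|...)\b and
# replaces every match with "". This sketch rebuilds that pattern in base R.
stop_words <- c("still", "come", ".")   # "." comes from the question's manualremovelist
x <- c("environmental dead frog still", "environmental dead mouse come")

pattern <- sprintf("\\b(%s)\\b", paste(stop_words, collapse = "|"))

# In a regex "." is a wildcard, so "\b.\b" matches any single character that has
# a word boundary on both sides, which includes each space between two words.
with_dot <- gsub(pattern, "", x, perl = TRUE)
with_dot
# -> c("environmentaldeadfrog", "environmentaldeadmouse")
```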

【问题讨论】:

    Tags: r nlp tm


    【Answer 1】:

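    removeWords only works on whole words. You can split each string into its individual words, run removeWords on the words of each phrase/sentence, and then paste the remaining words back together: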

    library(tm)
    
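    dc <- sapply(strsplit(dc, '\\s+'), function(x) {
      x <- removeWords(x, manualremovelist)  # stop words become ""; "." also blanks one-character words
      paste(x[x != ""], collapse = ' ')      # drop the empty strings so no double spaces remain
    })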
    
    dc
    
    #[1] "environmental dead frog"  "environmental dead mouse"
    
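If you would rather keep each string whole instead of splitting it, another option is to take "." out of the word list (it is a regex wildcard, not a word) and collapse the whitespace left behind by removed words. The sketch below uses base R only so it runs without tm; `remove_stop_words` is a hypothetical helper, not part of tm, and it uses the same assumed `\b(...)\b` pattern as removeWords:

```r
stop_words <- c("still", "come", ".")   # small excerpt of manualremovelist
x <- c("environmental dead frog still", "environmental dead mouse come")

# Hypothetical helper: delete whole stop words without touching the spaces.
remove_stop_words <- function(text, words) {
  words <- setdiff(words, ".")                     # "." would match the spaces, so drop it
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  out <- gsub(pattern, "", text, perl = TRUE)      # remove whole-word matches only
  trimws(gsub("\\s+", " ", out))                   # collapse gaps, trim the ends
}

cleaned <- remove_stop_words(x, stop_words)
cleaned
# -> c("environmental dead frog", "environmental dead mouse")
```

With tm loaded, the same idea can be written as `trimws(stripWhitespace(removeWords(dc, setdiff(manualremovelist, "."))))`, and punctuation can be stripped separately with tm's removePunctuation().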

    【Discussion】:
