【Question title】: R: how to use str_replace_all() without regular expressions
【Posted】: 2021-04-30 10:43:48
【Question】:

I have some text data that contains the placeholders "[surname]", "[female name]", and "[male name]". For example,

c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") 

I want to remove them before analysis and expect to get

"I am . I am ten years old", "My father is ", "I went to school today"

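But when I run the code below, the returned strings are mangled. I suspect str_replace_all interprets the [ ] in the pattern as regular-expression syntax, but I am not entirely sure why.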

> str_replace_all(c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") , "[surname]", '')

[1] "I  [fl ]. I  t y old" "My fth i [l ][]"      "I wt to chool tody"  

Does anyone know how to fix this? Thanks in advance.

【Comments】:

  • Try: stringr::str_replace_all(c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") , stringr::fixed("[surname]"), '')
  • Or perhaps: library(stringi); stri_replace_all_fixed(x, c("[female name]", "[male name]", "[surname]"), '', vectorize_all = FALSE)
  • Or in base R: gsub("[surname]", "", c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today"), fixed = TRUE)
  • gsub("\\[.*?\\]", "", c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today"))
  • So what exactly is the goal? To replace only these three static (fixed) phrases?

Tags: r text-processing


【Solution 1】:

Use stringi::stri_replace_all_fixed:

library(stringi)
data <- c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") 
remove_us <- c("[female name]","[male name]","[surname]")
stri_replace_all_fixed(data, remove_us, "", vectorize_all=FALSE)

Result:

[1] "I am . I am ten years old" "My father is  "            "I went to school today"   
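As for why the original call mangled the text: in a regular expression, "[surname]" is a character class that matches any single one of the letters s, u, r, n, a, m, e, so each of those letters is deleted wherever it occurs. The following is a minimal base-R sketch of that behaviour (the variable names x, as_regex, as_literal, and escaped are only illustrative); it assumes nothing beyond base R.

```r
x <- c("I am [female name]. I am ten years old",
       "My father is [male name][surname]",
       "I went to school today")

# Treated as a regex, "[surname]" is a character class:
# every s, u, r, n, a, m, or e is removed individually.
as_regex <- gsub("[surname]", "", x)

# fixed = TRUE matches the literal nine-character text "[surname]".
as_literal <- gsub("[surname]", "", x, fixed = TRUE)

# Escaping the brackets gives the same literal match in regex mode.
escaped <- gsub("\\[surname\\]", "", x)

print(as_regex)
print(as_literal)
print(escaped)
```

Here as_regex reproduces the garbled output from the question, while as_literal and escaped remove only the "[surname]" placeholder.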


However, using gsub is simpler:

gsub('\\[[^][]*]', '', data)


--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  [^][]*                   any character except: ']', '[' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  ]                        ']'
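
One design difference may be worth checking on your own data: the regex above removes any bracketed span, not only the three placeholders. The sketch below contrasts the two approaches in base R. The helper remove_literals and the extra "See note [1]" element are hypothetical additions for illustration and are not part of the original question.

```r
data <- c("I am [female name]. I am ten years old",
          "My father is [male name][surname]",
          "I went to school today",
          "See note [1]")  # hypothetical extra element for illustration
remove_us <- c("[female name]", "[male name]", "[surname]")

# Hypothetical helper: strip each literal in turn using fixed matching.
remove_literals <- function(text, literals) {
  Reduce(function(acc, lit) gsub(lit, "", acc, fixed = TRUE),
         literals, text)
}

# Removes only the three listed placeholders.
literal_only <- remove_literals(data, remove_us)

# Removes every "[...]" span, including "[1]".
any_bracket <- gsub("\\[[^][]*]", "", data)

print(literal_only)
print(any_bracket)
```

If the text can contain other bracketed content that should survive, the fixed-string version is the safer choice; if every bracketed tag should go, the regex is shorter.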

【Comments】:
