【Posted】: 2021-04-30 10:43:48
【Question】:
I have some text data that contains the placeholders "[surname]", "[female name]", and "[male name]". For example,
c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today")
I want to remove these placeholders before analysis, and I expect to get
"I am . I am ten years old", "My father is ", "I went to school today"
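But when I run the code below, the returned text is mangled. I suspect str_replace_all interprets the [ ] in the pattern as regular-expression syntax, but I'm not entirely sure why.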
> str_replace_all(c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") , "[surname]", '')
[1] "I [fl ]. I t y old" "My fth i [l ][]" "I wt to chool tody"
Does anyone know how to fix this? Thanks in advance.
【Discussion】:
Try:
stringr::str_replace_all(c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today") , stringr::fixed("[surname]"), '')
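To illustrate why fixed() helps here: in a regular expression, "[surname]" is a character class that matches any single one of the letters s, u, r, n, a, m, e, which is consistent with the mangled output in the question. The following is a minimal sketch, assuming the stringr package is installed; the variable name x is only for illustration.

```r
library(stringr)

x <- c("I am [female name]. I am ten years old",
       "My father is [male name][surname]",
       "I went to school today")

# As a regex, "[surname]" is a character class: every s, u, r, n, a, m
# or e is deleted, wherever it occurs in the text.
mangled <- str_replace_all(x, "[surname]", "")

# Writing the same class with the letters in another order gives the
# identical result, which shows the brackets are not matched literally.
identical(mangled, str_replace_all(x, "[aemnrsu]", ""))

# fixed() treats the pattern as literal text ...
literal <- str_replace_all(x, fixed("[surname]"), "")

# ... and escaping the brackets in the regex has the same effect.
escaped <- str_replace_all(x, "\\[surname\\]", "")

print(literal)
```

Both the fixed() and the escaped version remove only the literal text "[surname]" and leave the other two placeholders untouched.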
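Maybe
library(stringi) and stri_replace_all_fixed(x, c("[female name]", "[male name]", "[surname]"), '', vectorize_all = FALSE)? (vectorize_all = FALSE is needed so that every pattern is applied to every string; with the default, the patterns are paired element-wise with the strings in x.)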
Or in base R:
gsub("[surname]", "", c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today"), fixed = TRUE)
Or, to remove anything enclosed in square brackets:
gsub("\\[.*?\\]", "", c("I am [female name]. I am ten years old", "My father is [male name][surname]", "I went to school today"))
So what is the goal here? To replace only these three static (fixed) phrases?
Tags: r text-processing
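The two readings of that question lead to slightly different code. Below is a rough sketch in base R, under the assumption that the input looks like the example in the question; remove_fixed, placeholders and y are hypothetical names introduced only for this illustration.

```r
x <- c("I am [female name]. I am ten years old",
       "My father is [male name][surname]",
       "I went to school today")

# Option 1: remove only a known list of placeholders, matched literally.
placeholders <- c("[female name]", "[male name]", "[surname]")

remove_fixed <- function(text, patterns) {
  # Apply each literal pattern in turn to every string.
  for (p in patterns) {
    text <- gsub(p, "", text, fixed = TRUE)
  }
  text
}

only_known <- remove_fixed(x, placeholders)

# Option 2: remove anything enclosed in square brackets.
# "\\[" and "\\]" are literal brackets; "[^]]*" is any run of
# characters other than "]".
any_bracket <- gsub("\\[[^]]*\\]", "", x)

# The two options differ as soon as other bracketed text appears.
y <- "See [note 1] about [surname]"
remove_fixed(y, placeholders)   # keeps "[note 1]"
gsub("\\[[^]]*\\]", "", y)      # removes "[note 1]" as well

print(only_known)
```

On the example data both options return the output the question asks for. Option 1 is the safer choice if the text may contain other bracketed content that should be kept.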