【Posted】: 2021-08-27 03:30:18
【Question】:
I'm working with hundreds of rows of messy data. A dummy sample looks like this:
foo_data <- c("Mary Smith is not here", "Wiremu Karen is not a nice person",
"Rawiri Herewini is my name", "Ajibade Smith is my man", NA)
I need to remove all personal names (English and non-English first and last names), so that my desired output would be:
[1] "is not here" " is not a nice person" " is my name"
[4] "is my man" NA
However, using the textclean package, I can only remove the English names, which leaves the non-English names in place:
library(textclean)
textclean::replace_names(foo_data)
[1] " is not here" "Wiremu is not a nice person" "Rawiri Herewini is my name"
[4] "Ajibade is my man" NA
Any help would be greatly appreciated.
【Comments】:
-
Flip it around: you want to extract the English words. stackoverflow.com/questions/26715380/…
-
Hi @Roland, I followed stackoverflow.com/questions/26715380/…, but the result is not what we want.
-
The point was not for you to copy that answer. The point is that you need a dictionary, and that answer mentions one.
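A minimal sketch of the dictionary idea from the comment above: instead of trying to list every possible name, keep only the tokens that appear in a word list. The `english_words` vector here is a hypothetical toy dictionary hand-picked for the sample data, and `keep_dictionary_words` is a hypothetical helper; neither comes from the linked answer, and a real run would need a full English word list in place of the toy one.

```r
# Toy dictionary (assumption): just enough words to cover the sample data.
# Replace with a full English word list for real use.
english_words <- c("is", "not", "here", "a", "nice", "person",
                   "my", "name", "man")

foo_data <- c("Mary Smith is not here", "Wiremu Karen is not a nice person",
              "Rawiri Herewini is my name", "Ajibade Smith is my man", NA)

# Hypothetical helper: keep only the tokens found in `dictionary`.
keep_dictionary_words <- function(x, dictionary) {
  vapply(x, function(s) {
    # Pass missing values through unchanged
    if (is.na(s)) return(NA_character_)
    # Split on whitespace and compare case-insensitively
    tokens <- strsplit(s, "\\s+")[[1]]
    paste(tokens[tolower(tokens) %in% dictionary], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

keep_dictionary_words(foo_data, english_words)
```

With a real word list, names that are also ordinary words (for example "smith") would survive the filter, and tokens with attached punctuation would need to be cleaned before matching, so this is likely only a starting point.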
Tags: r string replace text-mining data-cleaning