Replace misspelled values with agrep

【Question title】: Replace misspelled values with agrep
【Posted】: 2013-10-18 15:13:43
【Question】:

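I have a dataset of restaurants in which the variable "CONAME" holds the name of each establishment. Unfortunately there are a lot of misspellings, and I would like to correct them. I tried fuzzy matching with agrep, using the following code (which I would repeat for every major chain):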

rest2012$CONAME

I get the following error message: Error in $<-.data.frame(*tmp*, "CONAME", value = c(35L, 40L, 48L, : replacement has 3074 rows, data has 67424

Is there another way to replace the misspelled names, or am I just using the agrep function incorrectly?

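【Comments】:

Read the error message. You are trying to replace the entire column (67424 entries) with the vector of misspelling matches (3074 entries).
@user2868256, did my answer solve your problem?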

Tags: r replace subset misspelling agrep

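【Solution 1】:

When you use agrep with value = FALSE, the result is "a vector giving the indices of the elements that yielded a match", that is, the positions of the matches within the vector of names you passed to agrep. You then try to replace the entire name variable in your data frame (67424 rows) with this shorter vector of indices (3074 of them), which is not what you want. Here is a small example that may point you in the right direction. You may also want to read ?Extract. I leave the details of agrep itself (e.g. max.distance) to you.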

# create a data frame with some MC DONALD's-ish names, and some other names.
rest2012 <- data.frame(CONAME = c("MC DONALD'S", "MCC DONALD'S", "SPSS Café", "GLM RONALDO'S", "MCMCglmm"))
rest2012

# do some fuzzy matching with 'agrep'
# store the indices in an object named 'idx'
idx <- agrep(pattern = "MC DONALD'S", x = rest2012$CONAME, ignore.case = FALSE, value = FALSE, max.distance = 3)
idx

# just look at the rows in the data frame that matched
# indexing with a numeric vector 
rest2012[idx, ]

# replace the elements that match
# index the CONAME column only, so that any other columns in the data frame are left untouched
rest2012$CONAME[idx] <- "MC DONALD'S"
rest2012
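
Since the question mentions repeating this for every major chain, the same index-based replacement could be wrapped in a loop. The following is only a minimal sketch under stated assumptions: the helper fix_names(), the second chain name, the CITY column and the choice of max.distance = 2 are all made up for illustration and do not come from the original post. The threshold would need tuning on real data.

```r
# Hypothetical list of canonical chain names (illustrative only).
chains <- c("MC DONALD'S", "BURGER KING")

# Toy data with a second column to show that only CONAME is modified.
rest <- data.frame(
  CONAME = c("MC DONALD'S", "MCC DONALD'S", "BURGER KNG", "BURGR KING", "SPSS Cafe"),
  CITY   = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# fix_names() is an illustrative helper, not part of any package.
# For each canonical name, find approximate matches and overwrite them.
fix_names <- function(x, canonical, max.distance = 2) {
  x <- as.character(x)  # guard against factor input
  for (name in canonical) {
    # value = FALSE (the default) returns positions, used here as an index
    idx <- agrep(pattern = name, x = x, max.distance = max.distance)
    x[idx] <- name
  }
  x
}

# The replacement vector has the same length as the column, so no length error.
rest$CONAME <- fix_names(rest$CONAME, chains)
rest
```

Names are processed in order, so if two canonical names are themselves within max.distance of each other, a later one can overwrite an earlier fix. It is worth inspecting the matches (for example with value = TRUE) before committing the replacement.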

【Comments】: