[Posted]: 2019-11-01 05:33:00
[Question]:
I want to standardize a set of manually entered strings, so that:
index  fruit
1      Apple Pie
2      Apple Pie.
3      Apple. Pie
4      Apple Pie
5      Pear
should look like:
index  fruit
1      Apple Pie
2      Apple Pie
3      Apple Pie
4      Apple Pie
5      Pear
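For my use case, grouping the strings by their phonetic sound works well, but I can't work out how to replace the least common strings in each group with the most common one.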
library(tidyverse)
library(stringdist)
index <- seq(1,5,1)
fruit <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear")
df <- data.frame(index, fruit) %>%
  mutate(grouping = phonetic(fruit)) %>%
  add_count(fruit) %>%
  # Missing Code
  select(index, fruit)
[Discussion]:
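One possible way to fill in the missing step, offered as a sketch rather than a definitive answer: group by the phonetic code, then overwrite every `fruit` in a group with the spelling that has the highest count. It assumes that `phonetic()` (Soundex by default) puts all the "Apple Pie" variants into one group, as the question describes, and that taking the first spelling on a tie is acceptable. The `result` name and `stringsAsFactors = FALSE` are additions for illustration; the latter keeps `fruit` as character on R versions before 4.0.

```r
library(dplyr)
library(stringdist)

df <- data.frame(
  index = 1:5,
  fruit = c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear"),
  stringsAsFactors = FALSE  # keep fruit as character, not factor
)

result <- df %>%
  mutate(grouping = phonetic(fruit)) %>%   # phonetic code used as the group key
  add_count(fruit) %>%                     # n = how often each exact spelling occurs
  group_by(grouping) %>%
  mutate(fruit = fruit[which.max(n)]) %>%  # most frequent spelling; first one wins on ties
  ungroup() %>%
  select(index, fruit)

print(result)
```

`which.max(n)` returns the position of the first maximum, so when two spellings in a group are equally common, the one that appears earlier in the data is kept. If that is not the tie-breaking rule you want, sort within the group before this step.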