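[Posted]: 2020-12-27 14:52:26
[Question]:
I have two data frames in which column x may contain typos, while column y is always correct.
I don't understand why a stringdist join on multiple columns produces these pairs: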
library(dplyr)
library(fuzzyjoin)

a <- data.frame(x = c("season", "season", "season", "package", "package"),
                y = c("1", "2", "3", "1", "6"))
b <- data.frame(x = c("season", "seson", "seson", "package", "pakkage"),
                y = c("1", "2", "3", "2", "6"))

c <- a %>%
  stringdist_left_join(b, by = c("x", "y"), max_dist = c(1, 0))
      x.x y.x     x.y  y.y
1  season   1  season    1
2  season   1   seson    2
3  season   1   seson    3
4  season   2   seson    2
5  season   3  season    1
6  season   3   seson    2
7  season   3   seson    3
8 package   1 package    2
9 package   6    <NA> <NA>
I would like to get:
      x.x y.x     x.y  y.y
1  season   1  season    1
2  season   2   seson    2
3  season   3   seson    3
4 package   1    <NA> <NA>
5 package   6 pakkage    6
[Discussion]: