【Posted】: 2017-08-01 09:26:28
【Question】:
My data looks like this:
# dummy data
library(data.table)
ID <- 1:12
addrs <- c("3 xx road sg", "4 yy road sg", "5 apt 04-3 sg", "Bung 2 , kl road sg",
           "4 yy road sg", "3 xx road sg", "Bung 2 , kl road sg", "5 apt 04-3 sg",
           "3 xx road sg", "Bung 2 , sg kl road", "3xx Road sg", "4 yy sg")
data.1 <- data.table(ID, addrs)
The data looks like this:
    ID  addrs
1:   1  3 xx road sg
2:   2  4 yy road sg
3:   3  5 apt 04-3 sg
4:   4  Bung 2 , kl road sg
5:   5  4 yy road sg
6:   6  3 xx road sg
7:   7  Bung 2 , kl road sg
8:   8  5 apt 04-3 sg
9:   9  3 xx road sg
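I want to get the matching combinations, based on addrs. The desired output is shown here for "3 xx road sg" only. If the addrs of A and B match, the table should contain both the A-B match and the B-A match: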
ID.1  ID.2  Match.1       Match.2       Accuracy
1     6     3 xx road sg  3 xx road sg  100%
1     9     3 xx road sg  3 xx road sg  100%
6     9     3 xx road sg  3 xx road sg  100%
9     6     3 xx road sg  3 xx road sg  100%
9     1     3 xx road sg  3 xx road sg  100%
6     1     3 xx road sg  3 xx road sg  100%
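For the exact-match case, a self-join on addrs is one way to produce both the A-B and B-A rows. The following is a minimal sketch, assuming the data.table package is installed; the column names follow the desired output above, and the constant "100%" is only a placeholder because every pair found this way is an exact string match.

```r
library(data.table)

ID <- 1:12
addrs <- c("3 xx road sg", "4 yy road sg", "5 apt 04-3 sg", "Bung 2 , kl road sg",
           "4 yy road sg", "3 xx road sg", "Bung 2 , kl road sg", "5 apt 04-3 sg",
           "3 xx road sg", "Bung 2 , sg kl road", "3xx Road sg", "4 yy sg")
data.1 <- data.table(ID, addrs)

# Self-join on addrs: each row is paired with every row that has the same address.
pairs <- merge(data.1, data.1, by = "addrs",
               suffixes = c(".1", ".2"), allow.cartesian = TRUE)

# Drop rows where an ID is paired with itself; A-B and B-A both remain.
pairs <- pairs[ID.1 != ID.2,
               .(ID.1, ID.2, Match.1 = addrs, Match.2 = addrs, Accuracy = "100%")]
setorder(pairs, ID.1, ID.2)

print(pairs[Match.1 == "3 xx road sg"])
```

This only finds addresses that are identical character for character, so IDs 10, 11 and 12 are not paired with anything.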
Output for cases where the text may differ in spacing, word order, or characters:
ID.1  ID.2  Match.1              Match.2              Accuracy
1     11    3 xx road sg         3xx Road sg          100%
2     12    4 yy road sg         4 yy sg              70%
4     10    Bung 2 , kl road sg  Bung 2 , sg kl road  100%
How can I handle text matching when the data may be similar but written differently?
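One possible approach, sketched here under stated assumptions rather than offered as a definitive answer, is to normalize each address first (lower-case, strip punctuation, split digits from letters, sort the words) and then score pairs with an edit-distance similarity using base R's adist. The helper normalize_addr, the similarity formula, and the threshold value are all hypothetical choices made for illustration; the scores will not necessarily match the "Accuracy" figures in the question.

```r
addrs <- c("3 xx road sg", "4 yy road sg", "5 apt 04-3 sg", "Bung 2 , kl road sg",
           "4 yy road sg", "3 xx road sg", "Bung 2 , kl road sg", "5 apt 04-3 sg",
           "3 xx road sg", "Bung 2 , sg kl road", "3xx Road sg", "4 yy sg")

# Hypothetical helper: reduce an address to a canonical form so that case,
# spacing, punctuation and word order no longer matter.
normalize_addr <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", " ", x)            # punctuation and space runs -> one space
  x <- gsub("([0-9])([a-z])", "\\1 \\2", x)  # "3xx" -> "3 xx"
  tokens <- strsplit(trimws(x), " +")
  vapply(tokens, function(tk) paste(sort(tk), collapse = " "), character(1))
}

norm <- normalize_addr(addrs)

# Levenshtein distance between every pair of normalized addresses (utils::adist),
# scaled to a 0..1 similarity by the length of the longer string.
dist <- adist(norm)
len  <- nchar(norm)
sim  <- 1 - dist / outer(len, len, pmax)

threshold <- 0.9  # assumption: must be tuned for real data
idx <- which(sim >= threshold & row(sim) != col(sim), arr.ind = TRUE)

result <- data.frame(
  ID.1     = idx[, "row"],
  ID.2     = idx[, "col"],
  Match.1  = addrs[idx[, "row"]],
  Match.2  = addrs[idx[, "col"]],
  Accuracy = sprintf("%.0f%%", 100 * sim[idx])
)
print(result[order(result$ID.1, result$ID.2), ])
```

With this normalization, IDs 1 and 11 and IDs 4 and 10 become identical strings. A plain edit-distance score has a clear limitation on this sample, though: "4 yy road sg" vs "4 yy sg" (IDs 2 and 12) scores lower than "3 xx road sg" vs "4 yy road sg" (IDs 1 and 2), so lowering the threshold far enough to catch the former would also pair two different addresses. Weighting tokens such as house numbers more heavily, or using a dedicated string-distance package, may be needed for real data.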
【Discussion】:
Tags: text reshape lexicographic