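【Posted】: 2021-07-29 06:13:04
【Question】:
I've hit a wall with this. Rapidfuzz gives a different string similarity score when I run it on columns of a pandas DataFrame than when I run it directly on the two strings. Why do the 'Address Similarity 2' column and the value printed on the last line differ?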
from rapidfuzz import process, utils, fuzz
import pandas as pd
import numpy as np
address_a = 'high new technology development zones huainan city anhui province china anhui anhui any city'
address_b = 'industrial park of funan city'
test_anui_data = {
    'Processed Client Name': ['anhui jinhan clothing co ltd'],
    'Processed Aruvio Name': ['anhui jinhan clothing co ltd'],
    'Processed Client Address': [address_a],
    'Processed Aruvio Address': [address_b],
    'Name Similarity': [89.2857142857142],
    'Address Similarity': [np.nan],
}
# Create DataFrame
test_anui = pd.DataFrame(test_anui_data)
test_anui
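# Keep rows with no address score yet; .copy() avoids SettingWithCopyWarning on the assignment below
test_anui = test_anui[(test_anui['Address Similarity'].isnull()) & (test_anui['Address Similarity'] != '')].copy()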
test_anui['Address Similarity 2'] = fuzz.token_sort_ratio(str(test_anui['Processed Client Address']), str(test_anui['Processed Aruvio Address']))
print('the address similarity is different? ', fuzz.token_sort_ratio(address_a, address_b))
【Discussion】:
-
Question: where did you get
'Name Similarity': [89.2857142857142], 'Address Similarity': [np.nan]?
-
I created them as an example.
-
Do you also get different results? How is that possible??