【Question title】: Python Record Linkage Toolkit - Qgrams Error
【Posted】: 2021-07-13 22:46:19
【Question】:

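I have been following the example record linkage code from the Python Record Linkage Toolkit package, and it works fine with the "jarowinkler" string matching method. However, when I run it with method = "qgram" or "cosine", it raises a numpy error. Any ideas about what might be causing it?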

File "C:\ProgramData\Anaconda3\lib\site-packages\recordlinkage\compare.py", line 153, in _compute_vectorized c = c.where((c

AttributeError: 'numpy.ndarray' object has no attribute 'where'

Reference code:

import recordlinkage
from recordlinkage.datasets import load_febrl1

##### Functions Correctly

dfA = load_febrl1()

# Indexation step
indexer = recordlinkage.Index()
indexer.block(left_on='given_name')
candidate_links = indexer.index(dfA)

compare_cl = recordlinkage.Compare()

compare_cl.string('surname', 'surname', method='jaro', threshold=0.1, label='surname')

features = compare_cl.compute(candidate_links, dfA)
matches = features[features.sum(axis=1) > 0]
print(len(matches))

##### Fails with:
#     AttributeError: 'numpy.ndarray' object has no attribute 'where'

dfA = load_febrl1()

# Indexation step
indexer = recordlinkage.Index()
indexer.block(left_on='given_name')
candidate_links = indexer.index(dfA)

compare_cl = recordlinkage.Compare()

compare_cl.string('surname', 'surname', method='qgram', threshold=0.1, label='surname')

features = compare_cl.compute(candidate_links, dfA)
matches = features[features.sum(axis=1) > 0]
print(len(matches))

【Comments on the question】:

    Tags: python numpy attributeerror record-linkage


    【Solution 1】:

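    For the qgram and/or cosine methods, remove the threshold argument. The traceback suggests the failure occurs in the thresholding step, which calls .where on the comparison result; for these methods that result appears to be a plain numpy.ndarray, which has no such method.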
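
    A minimal sketch of that workaround, assuming compare_cl.compute(...) returns a pandas DataFrame of raw similarity scores once threshold is left out of compare_cl.string(...). The cut-off is then applied by hand. The helper name binarize and the sample scores are hypothetical stand-ins; only the 0.1 cut-off comes from the question.

```python
import numpy as np
import pandas as pd


def binarize(features, threshold):
    """Map scores >= threshold to 1.0 and lower scores to 0.0; NaN stays NaN."""
    values = features.to_numpy(dtype=float)
    binary = np.where(np.isnan(values), np.nan, (values >= threshold).astype(float))
    return pd.DataFrame(binary, index=features.index, columns=features.columns)


# Hypothetical stand-in for the result of
#   compare_cl.string('surname', 'surname', method='qgram', label='surname')
#   features = compare_cl.compute(candidate_links, dfA)
# i.e. the comparison from the question with the threshold argument removed.
features = pd.DataFrame({"surname": [0.05, 0.10, 0.80, np.nan]})

binary = binarize(features, threshold=0.1)

# Same selection as in the question: keep pairs with at least one match.
matches = binary[binary.sum(axis=1) > 0]
print(len(matches))
```

    The helper uses numpy.where (a module-level function) rather than a .where method, so it behaves the same whether the scores start out as a DataFrame or a bare array.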

    【Comments】:
