【Question title】: Match list of strings with a block of text
【Posted】: 2020-09-15 08:51:57
【Question description】:

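Beginner here:

I have a block of text:

For example: 'hey this is a block of text, for an example, wow looks cool blah blah blah angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.'

And a list of phrases: ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

My end goal is to find which phrases in the list match the block of text, either exactly or fuzzily.

What I tried: difflib.get_close_matches

Required output: 'angiotensin enzyme serum', 'angiotensin enzyme a1'

The order of the output does not matter.

For other blocks of text, other strings from the list will match. The block is not constant.

Is there a way to do this?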

【Question comments】:

    Tags: python-3.x fuzzy-search


    【Solution 1】:

    Use fuzzywuzzy (from PyPI):

    from fuzzywuzzy import fuzz
    
    text = 'hey this is a block of text, for an example, wow looks cool blah blah blah angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.'
    
    words = ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']
    
    # partial_ratio scores each phrase (0-100) against its best-matching part of the text.
    # Keep only the phrases that score above the threshold.
    matches = [w for w in words if fuzz.partial_ratio(text, w) > 70]
    

    Obviously you will need to tune the threshold to suit your data, but in this example the scores are well separated:

    >>> print(matches)
    ['angiotensin enzyme serum', 'angiotensin enzyme a1']
    
    >>> for w in words:
    ...     print(w, fuzz.partial_ratio(text, w))
    ... 
    angiotensin enzyme serum 83
    some diff enzyme 56
    angiotensin enzyme a1 90
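
Since the question started from difflib, here is a minimal standard-library sketch of the same idea, in case installing fuzzywuzzy is not an option. It is not how fuzzywuzzy is implemented: it simply slides a phrase-length window over the text and keeps the best `difflib.SequenceMatcher` ratio. The helper name `best_window_ratio` and the threshold of 70 are assumptions chosen for illustration, and the scores it produces may differ from `fuzz.partial_ratio` on other inputs.

```python
from difflib import SequenceMatcher


def best_window_ratio(text, phrase):
    """Best similarity (0-100) between phrase and any phrase-length slice of text.

    Hypothetical helper for illustration; it is not part of difflib or fuzzywuzzy.
    """
    text, phrase = text.lower(), phrase.lower()
    width = len(phrase)
    if width == 0:
        return 0
    # If the text is shorter than the phrase, the loop runs once on the whole text.
    last_start = max(len(text) - width, 0)
    best = 0.0
    for start in range(last_start + 1):
        window = text[start:start + width]
        # autojunk=False disables difflib's "popular element" heuristic.
        score = SequenceMatcher(None, phrase, window, autojunk=False).ratio()
        best = max(best, score)
    return round(best * 100)


text = ('hey this is a block of text, for an example, wow looks cool blah blah blah '
        'angiotensin enzyme looks cool okay.But what about angiotensin enzym well I dont know.')
words = ['angiotensin enzyme serum', 'some diff enzyme', 'angiotensin enzyme a1']

# Same filtering step as above, with an assumed threshold of 70.
matches = [w for w in words if best_window_ratio(text, w) > 70]
print(matches)
```

The window scan costs one `SequenceMatcher` comparison per character position, so it is fine for short blocks like this one but will be slow on long documents; as with fuzzywuzzy, the threshold needs tuning for your own data.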
    

    【Comments】:
