Rule-based matcher of entities with spacy (answers)

【Question title】: Rule-based matcher of entities with spacy
【Posted】: 2017-04-13 16:36:14
【Question】:

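I want to use the Python library spaCy to match tokens in a text (adding labels as a semantic reference). Then I want to use the matches to extract relations between the tokens. My first attempt uses spaCy's matcher.add and matcher.add_pattern. matcher.add works fine and I can find the tokens. My code so far: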

import spacy
from spacy.attrs import ORTH
from spacy.matcher import Matcher

# spaCy 1.x API: later versions changed Matcher.add and replaced Span.merge
# with Doc.retokenize, so this code needs spaCy 1.x to run as written
nlp = spacy.load('en')

def merge_phrases(matcher, doc, i, matches):
    # Wait until the last match, then merge all matched spans at once
    if i != len(matches)-1:
        return None
    spans = [(ent_id, label, doc[start : end]) for ent_id, label, start, end in matches]
    for ent_id, label, span in spans:
        # Merge each span into one token, passing (tag, lemma, entity label)
        span.merge('NNP' if label else span.root.tag_, span.text, nlp.vocab.strings[label])

matcher = Matcher(nlp.vocab)

# Per entity: a unique key, a label, extra attributes, and a list of token patterns
matcher.add(entity_key='1', label='FINANCE', attrs={}, specs=[[{ORTH: 'financial'}, {ORTH: 'instrument'}]], on_match=merge_phrases)
matcher.add(entity_key='2', label='BUYER', attrs={}, specs=[[{ORTH: 'acquirer'}]], on_match=merge_phrases)
matcher.add(entity_key='3', label='CODE', attrs={}, specs=[[{ORTH: 'Code'}]], on_match=merge_phrases)

This works fine and gives very good results:

doc = nlp(u'Code used to identify the acquirer of the financial instrument.')

# Output
['Code|CODE', 'used|', 'to|', 'identify|', 'the|', 'acquirer|BUYER', 'of|', 'the|', 'financial instrument|FINANCE', '.|']
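The code above relies on the spaCy 1.x API, which later versions no longer support. As a rough sketch, assuming spaCy v3 and a blank English pipeline (so no model download is needed), the same tagging can be done with the built-in entity_ruler and merge_entities components instead of an on_match callback:

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model (assumes spaCy v3)
nlp = spacy.blank("en")

# entity_ruler assigns the custom labels from token patterns
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "FINANCE", "pattern": [{"LOWER": "financial"}, {"LOWER": "instrument"}]},
    {"label": "BUYER", "pattern": [{"LOWER": "acquirer"}]},
    {"label": "CODE", "pattern": [{"ORTH": "Code"}]},
])
# merge_entities retokenizes each entity into a single token,
# playing the role of the Span.merge call in merge_phrases
nlp.add_pipe("merge_entities")

doc = nlp("Code used to identify the acquirer of the financial instrument.")
tagged = [f"{token.text}|{token.ent_type_}" for token in doc]
print(tagged)
```

Each entity ends up as one token with ent_type_ set, so printing text|ent_type_ per token yields the same list as the output shown in the question.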

My question is: how can I use matcher.add_pattern to match relations between tokens, for example

matcher.add_pattern("IS_OF", [{BUYER}, {'of'}, {FINANCE}])

to get the output:

doc = nlp(u'Code used to identify the acquirer of the financial instrument.')

# Output
[acquirer of financial instrument]

I have tried different ways to make this work, but evidently without success; I think I misunderstand how matcher.add_pattern works.
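The IS_OF call above is pseudocode. One possible approach, sketched here assuming spaCy v3 rather than the 1.x add_pattern method, is to label the entities first and then write a Matcher pattern that tests each token's ENT_TYPE attribute. The entity_ruler patterns are assumptions that mirror the question's matcher.add calls:

```python
import spacy
from spacy.matcher import Matcher

# Assumes spaCy v3; labels come from an entity_ruler instead of the 1.x callback
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "FINANCE", "pattern": [{"LOWER": "financial"}, {"LOWER": "instrument"}]},
    {"label": "BUYER", "pattern": [{"LOWER": "acquirer"}]},
])
nlp.add_pipe("merge_entities")  # "financial instrument" becomes one token

matcher = Matcher(nlp.vocab)
# A BUYER token, then "of", then an optional "the", then a FINANCE token
matcher.add("IS_OF", [[
    {"ENT_TYPE": "BUYER"},
    {"LOWER": "of"},
    {"LOWER": "the", "OP": "?"},
    {"ENT_TYPE": "FINANCE"},
]])

doc = nlp("Code used to identify the acquirer of the financial instrument.")
relations = [doc[start:end].text for match_id, start, end in matcher(doc)]
print(relations)
```

Because merge_entities turns "financial instrument" into a single token, one ENT_TYPE token in the pattern is enough. The matched span keeps the article, so it reads "acquirer of the financial instrument".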

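Could you point me in the right direction for doing this in spaCy?
Is it possible to add regular expressions here to find patterns, and if so, how?
How can I add several tokens with the same label, or somehow create a list of tokens for the same label, e.g. "finance"?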
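Regarding the last two questions: as a sketch assuming spaCy v2.1 or later (written for v3), token patterns accept an IN list and a REGEX operator, and several patterns may share one label. The word list and the regular expression below are hypothetical examples:

```python
import spacy

# Assumes spaCy v3 (the IN and REGEX pattern operators exist since v2.1)
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # Several patterns can share one label
    {"label": "FINANCE", "pattern": [{"LOWER": "financial"}, {"LOWER": "instrument"}]},
    # IN: one token from a list of alternatives (hypothetical word list)
    {"label": "FINANCE", "pattern": [{"LOWER": {"IN": ["bond", "share", "derivative"]}}]},
    # REGEX: lowercase token text must match (finance, finances, financing)
    {"label": "FINANCE", "pattern": [{"LOWER": {"REGEX": r"^financ(e|es|ing)$"}}]},
])

doc = nlp("The acquirer of the financial instrument sought financing for a bond.")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

Keeping one label across many small patterns, rather than one large pattern, makes it easy to extend the word list later without touching the other rules.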

I would appreciate any help.

【Question discussion】:

Tags: python information-extraction spacy

【Answer 1】:

Your matcher identifies the tokens, but to find the relations between them you need dependency parsing (see the visual example from spacy).

You can then traverse the tree to find the relations between tokens: https://spacy.io/docs/usage/dependency-parse#navigating

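Each token's dep (integer ID) and dep_ (string label) attributes give its dependency relation to its head; the token's children are the tokens that depend on it.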
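To illustrate the tree navigation described above, here is a sketch assuming spaCy v3. So that it runs without a trained model, the dependency parse is written by hand (an assumption modeled on common English parser labels such as prep and pobj); with a trained pipeline, nlp(text) would supply the parse, and its labels may differ. The helper find_is_of is hypothetical: it looks for a BUYER entity whose child is the preposition "of" with a FINANCE entity as its object.

```python
import spacy
from spacy.tokens import Doc, Span

# Blank pipeline used only for its vocabulary (assumes spaCy v3)
nlp = spacy.blank("en")

words = ["Code", "used", "to", "identify", "the", "acquirer", "of",
         "the", "financial", "instrument", "."]
spaces = [True] * 9 + [False, False]
# Hand-written parse (assumption, not model output); heads are absolute token indices
heads = [0, 0, 3, 1, 5, 3, 5, 9, 9, 6, 0]
deps = ["ROOT", "acl", "aux", "xcomp", "det", "dobj", "prep",
        "det", "amod", "pobj", "punct"]
doc = Doc(nlp.vocab, words=words, spaces=spaces, heads=heads, deps=deps)
# Entity labels as the question's matcher would assign them
doc.ents = [Span(doc, 0, 1, label="CODE"),
            Span(doc, 5, 6, label="BUYER"),
            Span(doc, 8, 10, label="FINANCE")]


def find_is_of(doc):
    """Hypothetical helper: BUYER entity -> prep 'of' -> pobj inside a FINANCE entity."""
    relations = []
    for buyer in (ent for ent in doc.ents if ent.label_ == "BUYER"):
        # Tokens that depend directly on the buyer's root token
        for prep in buyer.root.children:
            if prep.dep_ != "prep" or prep.lower_ != "of":
                continue
            for obj in prep.children:
                if obj.dep_ == "pobj" and obj.ent_type_ == "FINANCE":
                    # Span from the buyer to the last token of the object's subtree
                    relations.append(doc[buyer.start:obj.right_edge.i + 1])
    return relations


for span in find_is_of(doc):
    print(span.text)
```

The prep/pobj checks encode one specific construction; other phrasings of the same relation would need their own rules.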

【Discussion】:

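Thanks for your answer, it helps a lot. I wonder whether it would be more convenient to train a named-entity model to find new relevant entities in my source, and then find the relations between those entities. There is some documentation on doing this with NLTK, but how would you approach it with spaCy, I mean the relation-extraction part?
Could you provide an example of dependency parsing? Is it compatible with the spaCy matcher, or am I misunderstanding something here?
@El_Patrón The link in the answer has examples. And yes, it is compatible with the spaCy matcher, because the dependency-parse results are attributes of the spaCy tokens themselves, exposed as dep and dep_.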