【Question title】: Spacy dependencymatcher pattern not returning matches
【Posted】: 2021-05-22 15:49:16
【Question】:

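I am trying to use the spaCy DependencyMatcher to create a pattern, add it to the matcher, and get results from it.

I created a pattern for the sentence: "From monday to friday"

Full pattern: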

pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {'DEP': 'ROOT', 'POS': 'ADP', 'TAG': 'IN'}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {'DEP': 'prep', 'POS': 'ADP', 'TAG': 'IN'},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS": {'DEP': 'pobj', 'POS': 'PROPN', 'TAG': 'NNP'},
    },
]

The simpler pattern is:

pattern = [
    {
        "RIGHT_ID": "node0",
        "RIGHT_ATTRS": {"POS": "ADP"}
    },
    {
        "LEFT_ID": "node0",
        "REL_OP": ">",
        "RIGHT_ID": "node1",
        "RIGHT_ATTRS": {"POS": "PROPN"},
    },
    {
        "LEFT_ID": "node1",
        "REL_OP": "$--",
        "RIGHT_ID": "node2",
        "RIGHT_ATTRS": {"POS": "ADP"},
    },
    {
        "LEFT_ID": "node2",
        "REL_OP": ">",
        "RIGHT_ID": "node3",
        "RIGHT_ATTRS": {'POS': 'PROPN'},
    },
]

My question is: why does neither the full pattern nor the simpler one produce any matches?

import spacy
from spacy.matcher import DependencyMatcher


nlp = spacy.load("en_core_web_sm")
matcher = DependencyMatcher(nlp.vocab)


text="From monday to friday"
doc = nlp(text)
matcher.add("pattern1", [pattern])

matches = matcher(doc)

# Each token_id corresponds to one pattern dict
match_id, token_ids = matches[0]

spaCy version:

spaCy v3.0.6

Model version:

en_core_web_sm >=3.0.0,

【Comments】:

    Tags: python nlp spacy matcher named-entity-recognition


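    【Solution 1】:

    The REL_OP for node2 is backwards. It should be $++, because "to" comes after "Monday" in the sentence and $-- only looks at siblings that come before.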
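
To make the direction of the sibling operators concrete, here is a minimal plain-Python sketch. It does not call spaCy; it only models the documented meaning of `A $++ B` (B is a sibling that follows A) and `A $-- B` (B is a sibling that precedes A). The head indices are written by hand and assume the parse implied by the question's own pattern ("From" is the root, "Monday" and "to" attach to "From", "Friday" attaches to "to"). The names `WORDS`, `HEADS`, `siblings`, `right_siblings` and `left_siblings` are illustrative only.

```python
# Hand-written parse of "From Monday to Friday" (an assumption, not model
# output): HEADS[i] is the index of token i's head. The root points at
# itself, which mirrors spaCy's convention.
WORDS = ["From", "Monday", "to", "Friday"]
HEADS = [0, 0, 0, 2]


def siblings(i):
    """Indices of tokens that share token i's head, excluding i and the root."""
    if HEADS[i] == i:  # the root has no siblings
        return []
    return [
        j
        for j, head in enumerate(HEADS)
        if head == HEADS[i] and j != i and HEADS[j] != j
    ]


def right_siblings(i):
    """Candidates for B in `A $++ B`: siblings that come after A."""
    return [j for j in siblings(i) if j > i]


def left_siblings(i):
    """Candidates for B in `A $-- B`: siblings that come before A."""
    return [j for j in siblings(i) if j < i]


monday = WORDS.index("Monday")
print("$++ candidates:", [WORDS[j] for j in right_siblings(monday)])
print("$-- candidates:", [WORDS[j] for j in left_siblings(monday)])
```

Under this assumed parse, "to" only shows up as a `$++` candidate for "Monday", so a pattern that anchors node2 with `$--` has nothing to bind to.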


    To give a full explanation, this code works for me.

    import spacy
    
    from spacy.matcher import DependencyMatcher
    
    nlp = spacy.load("en_core_web_sm")
    matcher = DependencyMatcher(nlp.vocab)
    
    text="From Monday to Friday"
    doc = nlp(text)
    
    pattern = [
        {
            "RIGHT_ID": "node0",
            "RIGHT_ATTRS": {'POS': 'ADP', 'TAG': 'IN'}
        },
        {
            "LEFT_ID": "node0",
            "REL_OP": ">",
            "RIGHT_ID": "node1",
            "RIGHT_ATTRS": {'POS': 'PROPN'},
        },
        {
            "LEFT_ID": "node1",
            "REL_OP": "$++",
            "RIGHT_ID": "node2",
            "RIGHT_ATTRS": {'POS': 'ADP'},
        },
        {
            "LEFT_ID": "node2",
            "REL_OP": ">",
            "RIGHT_ID": "node3",
            "RIGHT_ATTRS": {'POS': 'PROPN'},
        },
    ]
    
    matcher.add("pattern1", [pattern])
    
    matches = matcher(doc)
    print(matches)
    
    print("-----")
    # this part is just for reference
    for word in doc:
        print(word.pos_, word.tag_, word.dep_, word, sep="\t")
    

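    A few notes on this:

    • Your second pattern is better; for English you don't need to specify both the tag and the POS (the tag determines the POS)
    • In the v3 small model, "monday" and "friday" are not proper nouns unless they are capitalized (it looks like your displaCy output came from the public demo, which uses v2)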

    【Comments】:
