【Question Title】:Unexpected type of NER data when trying to train spacy ner pipe to add new named entity
【Posted】:2021-02-24 23:55:45
【Question】:

I am trying to add a new named entity to spaCy, but I cannot build a valid Example object for NER training, and I get a ValueError. Here is my code:

import random
from pathlib import Path

import spacy
from spacy.training import Example
from spacy.util import minibatch, compounding

nlp=spacy.load('en_core_web_lg')

ner=nlp.get_pipe("ner")
TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
           ('we stand with ABC',{'entities':[24,26,'CRORG']}),
           ('we supports ABC',{'entities':[15,17,'CRORG']})]
ner.add_label('CRORG')
# Disable pipeline components that don't need to change
pipe_exceptions = ["ner"]
unaffected_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

with nlp.disable_pipes(*unaffected_pipes):
    for iteration in range(30):
        random.shuffle(TRAIN_DATA)
        for raw_text,entity_offsets in TRAIN_DATA:
            doc=nlp.make_doc(raw_text)
            nlp.update([Example.from_dict(doc,entity_offsets)])

【Comments】:

    Tags: nlp spacy named-entity-recognition


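    【Answer 1】:

    The 'entities' value in TRAIN_DATA should be a list of tuples. It has to be two-dimensional (a list containing one tuple per entity), not a flat one-dimensional list.

    So instead of: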

    TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[0,2,'CRORG']}),
               ('we stand with ABC',{'entities':[24,26,'CRORG']}),
               ('we supports ABC',{'entities':[15,17,'CRORG']})]
    

    use:

    TRAIN_DATA=[('ABC is a worldwide organization',{'entities':[(0,2,'CRORG')]}),
               ('we stand with ABC',{'entities':[(24,26,'CRORG')]}),
               ('we supports ABC',{'entities':[(15,17,'CRORG')]})]
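
The character offsets matter as well. Each entity is given as (start_char, end_char, label) with an exclusive end, so `text[start:end]` should equal the entity string. The offsets in the snippets above do not satisfy this: 'we stand with ABC' is only 17 characters long, for example, so the span 24 to 26 falls outside the text. Below is a minimal sketch that derives the offsets instead of counting them by hand. `find_span` and `make_example` are hypothetical helpers (not part of spaCy), and the sketch assumes each entity string occurs once per sentence:

```python
def find_span(text, phrase, label):
    """Return (start_char, end_char, label) for the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    # end_char is exclusive, so text[start:end] == phrase
    return (start, start + len(phrase), label)


def make_example(text, phrase, label):
    """Build one (text, annotations) pair with 'entities' as a list of tuples."""
    return (text, {"entities": [find_span(text, phrase, label)]})


SENTENCES = [
    "ABC is a worldwide organization",
    "we stand with ABC",
    "we supports ABC",
]
TRAIN_DATA = [make_example(s, "ABC", "CRORG") for s in SENTENCES]

for text, annotations in TRAIN_DATA:
    for start, end, label in annotations["entities"]:
        # Sanity check: the slice must reproduce the entity text exactly
        assert text[start:end] == "ABC"
        print(text, (start, end, label))
```

The resulting TRAIN_DATA has the same two-dimensional shape as in the answer and can be passed to `Example.from_dict` in the same way as in the question's training loop.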
    
