【Question title】: How to remove an entity from a sentence with spaCy?
【Posted】: 2021-02-23 00:29:44
【Question description】:

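How can I remove an entity from a sentence with spaCy? I want to randomly remove a NORP, GPE, MONEY, ORDINAL, or PERCENT entity. For example:

Donald John Trump [PERSON] (born June 14, 1946 [DATE]) is the 45th [ORDINAL] and current president of the United States [GPE]. Before entering politics, he was a businessman and television personality.

How can I remove one of the entities from this sentence? In this example, the function chooses to remove the ORDINAL entity, 45th.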

>>> sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
>>> remove(sentence)
45th

【Comments】:

    Tags: python nlp nltk spacy


    【Solution 1】:

    Try spaCy's NER together with np.random.choice:

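    import numpy as np
    import spacy

    # Load the medium English pipeline, which includes the NER component
    nlp = spacy.load("en_core_web_md")
    
    sentence = 'Donald John Trump (born June 14, 1946) is the 45th and current president of the United States. Before entering politics, he was a businessman and television personality.'
    doc = nlp(sentence)
    
    # Keep only the entity types the question asks about
    ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]

    def remove(x):
        # Pick one entity text uniformly at random
        return str(np.random.choice(x))

    # example output; the pick is random, so it may also be 'the United States'
    remove(ents)
    '45th'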
    

    If you want to remove a random entity from the sentence text itself:

    def remove_from_sentence(sentence):
        doc = nlp(sentence)
        # Merge each multi-token entity into a single token
        with doc.retokenize() as retokenizer:
            for e in doc.ents:
                retokenizer.merge(doc[e.start:e.end])
        # Pair every token with its trailing whitespace so the text can be rebuilt
        tok_pairs = [(tok.text, tok.whitespace_) for tok in doc]
        ents = [e.text for e in doc.ents if e.label_ in ("NORP", "GPE", "MONEY", "ORDINAL", "PERCENT")]
        ent_to_remove = remove(ents)
        print(ent_to_remove)
        # Drop every token whose text equals the chosen entity
        tok_pairs_out = [pair for pair in tok_pairs if pair[0] != ent_to_remove]
        return "".join(text + ws for text, ws in tok_pairs_out)
    
    remove_from_sentence(sentence)
    
    the United States
    'Donald John Trump (born June 14, 1946) is the 45th and current president of . Before entering politics, he was a businessman and television personality.'
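
The token-based version above drops every token whose text matches the chosen entity, and it fails when no entity has one of the target labels. As a possible alternative, here is a minimal sketch that removes exactly one span by character offsets. The function name `remove_random_span` and the `(start_char, end_char, label)` tuple format are assumptions made for illustration, not part of the original answer, and the example spans are written by hand rather than produced by a model.

```python
import random

# Entity labels the question wants to target (spaCy label names).
TARGET_LABELS = {"NORP", "GPE", "MONEY", "ORDINAL", "PERCENT"}


def remove_random_span(text, spans, labels=TARGET_LABELS, rng=random):
    """Remove one randomly chosen span whose label is in `labels`.

    `spans` is an iterable of (start_char, end_char, label) tuples
    (a hypothetical format chosen for this sketch).
    Returns (removed_text, new_text); removed_text is None if nothing matched.
    """
    candidates = [(start, end) for start, end, label in spans if label in labels]
    if not candidates:
        # Nothing to remove: return the text unchanged instead of raising.
        return None, text
    start, end = rng.choice(candidates)
    tail = text[end:]
    # Also drop one following space, mirroring how a token's trailing
    # whitespace disappears together with the token in the answer above.
    if tail.startswith(" "):
        tail = tail[1:]
    return text[start:end], text[:start] + tail


# Hand-written spans for illustration only.
text = "Trump is the 45th and current president."
start = text.index("45th")
spans = [(0, 5, "PERSON"), (start, start + len("45th"), "ORDINAL")]
removed, result = remove_random_span(text, spans)
print(removed)
print(result)
```

With spaCy, the spans could be built from the parsed document as `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`, so only the selected occurrence is cut even if the same text appears elsewhere in the sentence.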
    

    Please ask if anything is unclear.

    【Comments】:
