Extracting information from plain text using NLP

【Question title】: Information extracting from plain text using NLP
【Posted】: 2021-12-09 03:50:05
【Question】:

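My friend and I are working on a hobby project and are trying to extract data from plain text. Nothing too complicated, just things like a name, a date of birth, or similar.

Suppose we have a text file like this:

"Hello, my name is John and I am 22 years old. I live in the US and I like playing video games."

We want to fill in a form like this:

Name: John
Age: 22
From: US

I have been looking into NLP since last week and I don't even know where to start. Any kind of help is appreciated.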

【Comments】:

Tags: nlp text-mining

【Answer 1】:

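It looks like NER (Named Entity Recognition) is what you are looking for.

Here is a link that explains what NER is.

For the practical part, I suggest you take a look at this, but you can find plenty of free guides online.

Basically, you will end up with code that looks more or less like this: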

import spacy  # spaCy is a Python library for NLP

# Load the small English pipeline (install it first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

sentence = "Apple is looking at buying U.K. startup for $1 billion"  # put your own sentence here
doc = nlp(sentence)  # run the pipeline; the recognized entities end up in doc.ents

# For every entity, print its text, start index, end index, and label (the entity type)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
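
To connect this with the form in the question, here is a rough sketch of how the entities could be turned into Name / Age / From fields. The helper names (`fill_form`, `extract_form`), the label-to-field mapping, and the age pattern are my own assumptions for illustration, not part of spaCy. `PERSON` and `GPE` are labels used by spaCy's English models, but whether the small model actually tags "John" or "US" in a given sentence is not guaranteed, so treat this as a starting point.

```python
import re

# Hypothetical mapping from spaCy entity labels to the form fields in the question.
LABEL_TO_FIELD = {"PERSON": "Name", "GPE": "From"}

# Simple pattern for phrases like "22 years old"; an assumption, not a general solution.
AGE_PATTERN = re.compile(r"\b(\d{1,3})\s+years?\s+old\b", re.IGNORECASE)


def fill_form(text, entities):
    """Build the Name/Age/From form from raw text and (entity_text, label) pairs."""
    form = {"Name": None, "Age": None, "From": None}
    for ent_text, label in entities:
        field = LABEL_TO_FIELD.get(label)
        # Keep only the first entity found for each field.
        if field is not None and form[field] is None:
            form[field] = ent_text
    # The age is taken from the raw text with a regex instead of relying on the model.
    match = AGE_PATTERN.search(text)
    if match:
        form["Age"] = int(match.group(1))
    return form


def extract_form(text, model="en_core_web_sm"):
    """Run spaCy NER on the text and fill the form (requires spaCy and the model)."""
    import spacy  # imported here so fill_form can be used without spaCy installed

    nlp = spacy.load(model)
    doc = nlp(text)
    return fill_form(text, [(ent.text, ent.label_) for ent in doc.ents])
```

You would call `extract_form("Hello, my name is John and I am 22 years old. I live in the US and I like playing video games.")` and get back a dictionary with the three fields. Any field the model does not recognize stays `None`, so for a real project you will probably want to add more rules or train a custom NER model.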

【Comments】: