【Posted】: 2018-10-12 02:27:08
【Problem description】:
1. Author tags:
\author{{\small Tanya Araujo$^{a,b}$ and Elsa Fontainha$^{a}$} \and {\small $^{a}$ISEG
(Lisbon School of Economics \& Management) Universidade de Lisboa, } \and
{\small Rua do Quelhas, 6 1200-781 Lisboa Portugal} \and {\small $^{b}$Research
Unit on Complexity and Economics (UECE)} \and {\small Rua Miguel Lupi, 20
1249-078 Lisboa Portugal}}
\author{{\bf R. Vilela Mendes} \and {\small Grupo de Fisica Matematica, Av.
Gama Pinto 2,} \and {\small \ 1699 Lisboa Codex, Portugal
(vilela@cii.fc.ul.pt)} \and {\bf Tanya Araujo and Francisco Lou\cc\a%
} \and {\small Departamento de Economia, ISEG,} \and {\small R. Miguel Lupi
20, 1200 Lisboa, Portugal} \and {\small (tanya@iseg.utl.pt,
flouc@iseg.utl.pt)}}
2. After removing special characters, other tags, emails, and digits:
Tanya Araujo and Elsa Fontainha ISEG Lisbon School of Economics Management Universidade de Lisboa, Rua do Quelhas, - Lisboa Portugal Research Unit on Complexity and Economics UECE Rua Miguel Lupi, - Lisboa Portugal
R. Vilela Mendes Grupo de Fisica Matematica, Av. Gama Pinto , Lisboa Codex, Portugal Tanya Araujo and Francisco Lou Departamento de Economia, ISEG, R. Miguel Lupi , Lisboa, Portugal ,
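The cleaning in step 2 might look roughly like the following. This is only a minimal sketch: the function name `clean_author_block` and the specific regular expressions are assumptions for illustration, not the code actually used to produce the text above.

```python
import re

def clean_author_block(latex: str) -> str:
    """Strip LaTeX markup, emails, and digits from an \\author{...} block (hypothetical helper)."""
    text = re.sub(r"\$[^$]*\$", " ", latex)    # inline math such as $^{a,b}$
    text = re.sub(r"\S+@\S+", " ", text)       # whitespace-delimited tokens containing an email
    text = re.sub(r"\\[a-zA-Z]+", " ", text)   # control words: \author, \small, \and, \bf
    text = re.sub(r"\\.", " ", text)           # control symbols such as \&
    text = re.sub(r"[{}()%]", " ", text)       # braces, parentheses, comment marks
    text = re.sub(r"\d+", " ", text)           # digits (street numbers, postal codes)
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

sample = r"""\author{{\small Tanya Araujo$^{a,b}$ and Elsa Fontainha$^{a}$} \and {\small $^{a}$ISEG
(Lisbon School of Economics \& Management) Universidade de Lisboa, } \and
{\small Rua do Quelhas, 6 1200-781 Lisboa Portugal} \and {\small $^{b}$Research
Unit on Complexity and Economics (UECE)} \and {\small Rua Miguel Lupi, 20
1249-078 Lisboa Portugal}}"""

cleaned = clean_author_block(sample)
print(cleaned)
```

Note that this flattening throws away the `\and` separators, which are the only markers of where a name ends and an affiliation begins.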
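3. Expected output: extract only the personal names and drop university names and any place names. I tried NER from NLTK, but it tags Universidade and Lisboa as PERSON, among other errors: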
(PERSON Tanya/NNP)
(PERSON Araujo/NNP)
and/CC
(PERSON Elsa/NNP Fontainha/NNP)
ISEG/NNP
(/(
(ORGANIZATION Lisbon/NNP School/NNP)
of/IN
(ORGANIZATION Economics/NNP)
&/CC
Management/NNP
)/)
(PERSON Universidade/NNP)
de/FW
(PERSON Lisboa/NNP)
,/,
(PERSON Rua/NNP)
do/VBP
(PERSON Quelhas/NNP)
,/,
-/:
(PERSON Lisboa/NNP Portugal/NNP Research/NNP Unit/NNP)
on/IN
(ORGANIZATION Complexity/NNP)
and/CC
(GPE Economics/NNP)
(/(
(ORGANIZATION UECE/NNP)
)/)
(PERSON Rua/NNP Miguel/NNP Lupi/NNP)
,/,
-/:
(PERSON Lisboa/NNP Portugal/NNP Alessandro/NNP Spelta/NNP)
corresponding/VBG
author/NN
:/:
and/CC
(PERSON Tanya/NNP Araujo/NNP))
Can NER from NLTK solve this, or should we try another library such as spaCy?
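Independently of which NER library is chosen, one option is to use the LaTeX structure before it is stripped. The sketch below is a heuristic, not a general solution: it assumes that names sit either in the first `\and`-separated chunk of an `\author` block or in a chunk marked with `\bf`, which holds for the two examples above but may not hold for other papers. The function name `extract_names` is hypothetical, and accent macros are simply dropped, so the garbled `Lou\cc\a%` comes out as `Lou`.

```python
import re
from typing import List

def extract_names(author_block: str) -> List[str]:
    """Heuristic: keep only the first chunk and any \\bf chunk, then split on 'and' and commas."""
    chunks = re.split(r"\\and\b", author_block)          # \and separates author-block entries
    names = []
    for i, chunk in enumerate(chunks):
        if i != 0 and r"\bf" not in chunk:
            continue                                      # assumed to be an affiliation or address
        text = re.sub(r"\$[^$]*\$", " ", chunk)          # inline math (affiliation superscripts)
        text = re.sub(r"\\[a-zA-Z]+", " ", text)         # control words, including accent macros
        text = re.sub(r"\\.", " ", text)                 # control symbols
        text = re.sub(r"[{}%]", " ", text)               # braces and comment marks
        text = re.sub(r"\s+", " ", text).strip()
        for part in re.split(r"\s+and\s+|,", text):      # "A and B" or "A, B"
            part = part.strip()
            if part:
                names.append(part)
    return names

sample = r"""\author{{\bf R. Vilela Mendes} \and {\small Grupo de Fisica Matematica, Av.
Gama Pinto 2,} \and {\small \ 1699 Lisboa Codex, Portugal
(vilela@cii.fc.ul.pt)} \and {\bf Tanya Araujo and Francisco Lou\cc\a%
} \and {\small Departamento de Economia, ISEG,} \and {\small R. Miguel Lupi
20, 1200 Lisboa, Portugal} \and {\small (tanya@iseg.utl.pt,
flouc@iseg.utl.pt)}}"""

names = extract_names(sample)
print(names)
```

The kept chunks could still be passed through an NER model as a second check; the point of the sketch is only that filtering on `\and` boundaries removes most of the address text that the tagger mislabels as PERSON.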
【Discussion】:
Tags: python latex nltk author named-entity-recognition