【Question title】: Finding the head of a noun phrase in NLTK and Stanford parse output according to the rules for finding the head of an NP
【Posted】: 2015-09-18 14:38:37
【Question】:

In general, the head of a noun phrase is the rightmost noun in the NP. In the parse below, tree is the head of the parent NP:

(ROOT
  (S
    (NP
      (NP (DT The) (JJ old) (NN oak) (NN tree))
      (PP (IN from) (NP (NNP India))))
    (VP (VBD fell) (PRT (RP down)))))

Out[40]: Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['old']), Tree('NN', ['oak']), Tree('NN', ['tree'])]), Tree('PP', [Tree('IN', ['from']), Tree('NP', [Tree('NNP', ['India'])])])]), Tree('VP', [Tree('VBD', ['fell']), Tree('PRT', [Tree('RP', ['down'])])])])

The code below, based on a Java implementation, uses a simplistic rule to find the head of the NP, but I need one based on the head-finding rules:

from nltk.tree import Tree

parsestr = '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'

def traverse(t):
    try:
        t.label()
    except AttributeError:
        # t is a leaf (a plain string), so there is nothing to descend into
        return
    if t.label() == 'NP':
        print('NP:' + str(t.leaves()))
        # naive rule: take the last leaf without checking that it is a noun
        print('NPhead:' + str(t.leaves()[-1]))
    for child in t:
        traverse(child)


tree = Tree.fromstring(parsestr)
traverse(tree)

The code above gives the output:

NP:['The', 'old', 'oak', 'tree', 'from', 'India']
NPhead:India
NP:['The', 'old', 'oak', 'tree']
NPhead:tree
NP:['India']
NPhead:India

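Although it currently gives the correct output for the given sentence, I need to add a condition so that only the rightmost noun is extracted as the head. At the moment this line does not check whether the last leaf is a noun (NN):

print('NPhead:' + str(t.leaves()[-1]))

So the NP head condition in the code above would become something like this (pseudocode, a list has no such method):

t.leaves().getrightmostnoun()

Michael Collins' dissertation (Appendix A) includes head-finding rules for the Penn Treebank, so the head is not necessarily the rightmost noun. The condition above should therefore cover such cases as well.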

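Take the following example given in one of the answers:

(NP (NP the person) that gave (NP the talk)) went home

The head noun of the subject is person, but the last leaf node of the NP the person that gave the talk is talk.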
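The rightmost-noun check asked for above could be sketched as follows. This is a minimal sketch, not an NLTK API: `rightmost_noun` and `NOUN_TAGS` are hypothetical names, and the input is assumed to be a list of (word, tag) pairs in the shape that NLTK's `Tree.pos()` returns. The second call shows why the check alone is not enough for the relative-clause example.

```python
# Hypothetical helper: Penn Treebank noun tags are assumed to be these four.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def rightmost_noun(tagged_leaves):
    """Return the last word whose tag is a noun tag, or None if there is none."""
    for word, tag in reversed(tagged_leaves):
        if tag in NOUN_TAGS:
            return word
    return None


# (word, tag) pairs, written by hand in the shape returned by Tree.pos()
simple_np = [("The", "DT"), ("old", "JJ"), ("oak", "NN"), ("tree", "NN")]
relative_np = [("the", "DT"), ("person", "NN"), ("that", "WDT"),
               ("gave", "VBD"), ("the", "DT"), ("talk", "NN")]

print(rightmost_noun(simple_np))    # tree
print(rightmost_noun(relative_np))  # talk, although the head is "person"
```

Filtering on the tag fixes the "is it a noun" problem, but the flat leaf list has lost the tree structure, which is what a rule-based head finder relies on.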

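【Comments】:

What is your question?
@barny How to find the head of an NP
Please read the help page stackoverflow.com/help/mcve. In this case, show the output you do get: "not working" is not enough for StackOverflow. Also, try adding more print statements to your code (e.g. one just before you call traverse(child), and another on entry to traverse). Post the output of that execution trace, provided it doesn't immediately show you the problem.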

Tags: python algorithm tree nltk stanford-nlp

【Solution 1】:

NLTK has a built-in way to turn a bracketed string into a Tree object (http://www.nltk.org/_modules/nltk/tree.html); see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L541.

>>> from nltk.tree import Tree
>>> parsestr='(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...         print(i)
... 
(NP
  (NP (DT The) (JJ old) (NN oak) (NN tree))
  (PP (IN from) (NP (NNP India))))
(NP (DT The) (JJ old) (NN oak) (NN tree))
(NP (NNP India))


>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...         print(i.leaves())
... 
['The', 'old', 'oak', 'tree', 'from', 'India']
['The', 'old', 'oak', 'tree']
['India']

Note that the rightmost noun is not always the head noun of the NP, e.g.

>>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])])
>>> for i in Tree.fromstring(s).subtrees():
...     if i.label() == 'NP':
...         print(i.leaves()[-1])
... 
Magnificent
talk

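Arguably, Magnificent can still be the head noun. Another example is when the NP contains a relative clause:

(NP (NP the person) that gave (NP the talk)) went home

The head noun of the subject is person, but the last leaf node of the NP the person that gave the talk is talk.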
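Rule-based head finders of the kind the question mentions (Collins, Appendix A) handle such cases by choosing a head child from the labels of the NP's immediate children instead of looking at the last leaf. The sketch below illustrates that idea for NP only. The tag sets and their order are written from memory of Collins' table and should be checked against the dissertation; `NP_HEAD_RULES` and `np_head_index` are hypothetical names, not part of NLTK.

```python
# NP head rules, transcribed from memory of Collins (1999), Appendix A.
# Treat the exact tag sets and their order as an assumption to verify.
NP_HEAD_RULES = [
    ("right-to-left", {"NN", "NNP", "NNPS", "NNS", "NX", "POS", "JJR"}),
    ("left-to-right", {"NP"}),
    ("right-to-left", {"$", "ADJP", "PRN"}),
    ("right-to-left", {"CD"}),
    ("right-to-left", {"JJ", "JJS", "RB", "QP"}),
]


def np_head_index(labels):
    """Return the index of the head child, given the labels of an NP's children."""
    if not labels:
        raise ValueError("an NP needs at least one child")
    last = len(labels) - 1
    # Possessive marker: approximated here by checking the last child only.
    if labels[last] == "POS":
        return last
    for direction, tags in NP_HEAD_RULES:
        order = range(len(labels))
        if direction == "right-to-left":
            order = reversed(order)
        for i in order:
            if labels[i] in tags:
                return i
    # No rule matched: fall back to the last child.
    return last


print(np_head_index(["DT", "JJ", "NN", "NN"]))  # 3, the child covering "tree"
print(np_head_index(["NP", "PP"]))              # 0, "The old oak tree"
print(np_head_index(["NP", "SBAR"]))            # 0, "the person"
```

With an NLTK tree, the labels for an NP node `np` could be built as `[child.label() for child in np]`. When the selected child is itself an NP, the same function would be applied to that child's children until a part-of-speech node is reached. Other phrase types need their own rule tables, which this sketch does not cover.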

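【Comments】:

I finally got it working, as shown in the code, but I still need to add a condition that checks whether the rightmost leaf is an NN
So the NP head condition in the code above would become something like: t.leaves().getrightmostnoun()
Note that the rightmost noun is not always the head noun of the NP!
Michael Collins' dissertation (Appendix A) includes head-finding rules for the Penn Treebank, so the head is not necessarily the rightmost noun
If you run into trouble, please ask politely on the NLTK GitHub issues for help implementing it. Better yet, try implementing it yourself, open a pull request with your working code and ask for a code review; I'm sure the NLTK developers will help you through it. Or wait until someone else writes the code =)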

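【Solution 2】:

I was looking for a Python script that uses NLTK for this task and stumbled across this post. Here is the solution I came up with. It is a bit noisy and arbitrary, and it definitely won't always pick the right answer (e.g. for compound nouns). But I wanted to post it in case a solution that mostly works is useful to others.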

#!/usr/bin/env python

from nltk.tree import Tree

examples = [
    '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))',
    "(ROOT\n  (S\n    (NP\n      (NP (DT the) (NN person))\n      (SBAR\n        (WHNP (WDT that))\n        (S\n          (VP (VBD gave)\n            (NP (DT the) (NN talk))))))\n    (VP (VBD went)\n      (NP (NN home)))))",
    '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
]

def find_noun_phrases(tree):
    return [subtree for subtree in tree.subtrees(lambda t: t.label()=='NP')]

def find_head_of_np(np):
    noun_tags = ['NN', 'NNS', 'NNP', 'NNPS']
    top_level_trees = [np[i] for i in range(len(np)) if type(np[i]) is Tree]
    ## search for a top-level noun
    top_level_nouns = [t for t in top_level_trees if t.label() in noun_tags]
    if len(top_level_nouns) > 0:
        ## if you find some, pick the rightmost one, just 'cause
        return top_level_nouns[-1][0]
    else:
        ## search for a top-level np
        top_level_nps = [t for t in top_level_trees if t.label()=='NP']
        if len(top_level_nps) > 0:
            ## if you find some, pick the head of the rightmost one, just 'cause
            return find_head_of_np(top_level_nps[-1])
        else:
            ## search for any noun
            nouns = [p[0] for p in np.pos() if p[1] in noun_tags]
            if len(nouns) > 0:
                ## if you find some, pick the rightmost one, just 'cause
                return nouns[-1]
            else:
                ## return the rightmost word, just 'cause
                return np.leaves()[-1]

for example in examples:
    tree = Tree.fromstring(example)
    for np in find_noun_phrases(tree):
        print("noun phrase:", " ".join(np.leaves()))
        head = find_head_of_np(np)
        print("head:", head)

For the examples discussed in the question and in the other answer, the output is:

noun phrase: The old oak tree from India
head: tree
noun phrase: The old oak tree
head: tree
noun phrase: India
head: India
noun phrase: the person that gave the talk
head: person
noun phrase: the person
head: person
noun phrase: the talk
head: talk
noun phrase: home
head: home
noun phrase: Carnac the Magnificent
head: Magnificent
noun phrase: a talk
head: talk

【Comments】: