
【Question title】: I have a database and API for hindi wordnet. I want to access this wordnet from NLTK python. Is there any way to add our own wordnet into NLTK? [closed]
【Posted】: 2014-07-26 01:11:04
【Question description】:

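I have a database and an API for a Hindi wordnet. I want to access this wordnet from NLTK in Python so that I can use the NLTK WordNet functions with our wordnet. Is there any way to add our own wordnet to NLTK? Alternatively, is there a Hindi word sense disambiguation tool (one that works with any language's wordnet after some modification) that returns the most appropriate sense from the wordnet?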

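【Comments】:

Can you give a link to the Hindi wordnet you have? Does it use exactly the same file format as the Princeton WordNet?
P.S. Not sure why there are votes to close. This seems like a good question: knowing how to do this may be important, and an answer could be very useful to the next person who wants to use a Hindi wordnet in Python.

Tags: python nltk wordnet hindi wsd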

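【Solution 1】:

If you look in your nltk_data folder, you will see that wordnet, like every other NLTK corpus, is just a set of plain text files. So it should be possible to format the Hindi wordnet the same way, so that NLTK can apply its WordNet functions to it. Here is an excerpt from the nltk.corpus.reader.wordnet reader class (WordNetCorpusReader) that reads those files: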

#: A list of file identifiers for all the fileids used by this
#: corpus reader.
_FILES = ('cntlist.rev', 'lexnames', 'index.sense',
          'index.adj', 'index.adv', 'index.noun', 'index.verb',
          'data.adj', 'data.adv', 'data.noun', 'data.verb',
          'adj.exc', 'adv.exc', 'noun.exc', 'verb.exc', )

def __init__(self, root):
    """
    Construct a new wordnet corpus reader, with the given root
    directory.
    """
    super(WordNetCorpusReader, self).__init__(root, self._FILES,
                                              encoding=self._ENCODING)
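
Before pointing a reader at a converted Hindi wordnet, it may help to see which of the file names listed in _FILES are actually present. The following is a minimal sketch using only the standard library; the function name missing_wordnet_files and the constant WORDNET_FILES are made up for illustration and are not part of NLTK.

```python
import os

# File names copied from the _FILES tuple in the excerpt above.
WORDNET_FILES = (
    "cntlist.rev", "lexnames", "index.sense",
    "index.adj", "index.adv", "index.noun", "index.verb",
    "data.adj", "data.adv", "data.noun", "data.verb",
    "adj.exc", "adv.exc", "noun.exc", "verb.exc",
)


def missing_wordnet_files(root):
    """Return the expected file names that are not regular files in `root`.

    Hypothetical helper for illustration; NLTK does not provide it.
    """
    return [
        name
        for name in WORDNET_FILES
        if not os.path.isfile(os.path.join(root, name))
    ]
```

Calling missing_wordnet_files on the directory that holds the converted Hindi data returns an empty list when every expected file is there, and otherwise names the files that still need to be produced.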

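I guess you don't really need to produce all of these files, but the "index.sense" file in particular is required for word sense disambiguation. NLTK does not generate it: it has to be built in a preprocessing step beforehand, or be shipped with the Hindi wordnet, in the following format - http://wordnet.princeton.edu/wordnet/man/senseidx.5WN.html.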
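
As a rough illustration of the sense index format, each line holds four whitespace-separated fields (sense_key, synset_offset, sense_number, tag_cnt), and the sense key itself has the shape lemma%ss_type:lex_filenum:lex_id:head_word:head_id. The sketch below parses one such line; the names SenseEntry and parse_sense_line are invented for this example, and whether a given Hindi wordnet export follows this layout exactly is an assumption to verify against the data.

```python
from collections import namedtuple

# Hypothetical record type for one index.sense line (not an NLTK class).
SenseEntry = namedtuple(
    "SenseEntry",
    "lemma ss_type lex_filenum lex_id head_word head_id "
    "synset_offset sense_number tag_cnt",
)


def parse_sense_line(line):
    """Parse one sense index line into a SenseEntry.

    Assumes the Princeton layout:
    lemma%ss_type:lex_filenum:lex_id:head_word:head_id offset sense_no tag_cnt
    """
    sense_key, synset_offset, sense_number, tag_cnt = line.split()
    # Split on the last '%' in case the lemma itself contains one.
    lemma, lex_sense = sense_key.rsplit("%", 1)
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseEntry(
        lemma=lemma,
        ss_type=int(ss_type),          # 1=noun, 2=verb, 3=adj, 4=adv, 5=adj satellite
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word,           # empty unless the sense is an adjective satellite
        head_id=head_id,               # kept as text; empty for non-satellites
        synset_offset=synset_offset,   # kept as text to preserve zero padding
        sense_number=int(sense_number),
        tag_cnt=int(tag_cnt),
    )
```

The synset offset is kept as a string so that its zero padding survives, since the same padded value is used to locate the synset in the data.* files.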

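Once all of that is done, I would simply go to ../nltk/corpus/reader/wordnet.py and make a copy of it in which you change the root directory, the file names, and perhaps a few other dependencies, while still reusing the existing functionality. The alternative is to change what you need directly in the existing class (not recommended).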
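
A possibly lighter alternative to copying the module is to instantiate the existing reader class on a different root directory. The sketch below assumes an NLTK 3.x release, where the constructor takes a second argument for an Open Multilingual WordNet reader; the excerpt above shows an older signature that takes only root, so adjust the call to the installed version. The function name load_custom_wordnet is hypothetical, and whether the reader accepts a particular Hindi export depends on how closely its files match the Princeton format.

```python
def load_custom_wordnet(root):
    """Build a WordNetCorpusReader over the files in `root`.

    Hypothetical helper; assumes NLTK 3.x, where the constructor is
    WordNetCorpusReader(root, omw_reader).
    """
    # Imported lazily so this module can be loaded without NLTK installed.
    from nltk.corpus.reader.wordnet import WordNetCorpusReader

    # Passing None as the second argument skips the multilingual
    # (Open Multilingual WordNet) reader.
    return WordNetCorpusReader(root, None)
```

If this works for your data, the returned object exposes the usual reader methods such as synsets(), so existing NLTK-based code can be pointed at the Hindi wordnet without editing the NLTK sources.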

P.S. I googled a bit and found http://www.cs.utexas.edu/~rashish/cs365ppt.pdf, which cites many other sources on the topic.

【Comments】: