Finding trigrams using NLTK: answers

【Question title】: find trigram using NLTK
【Posted】: 2012-06-22 06:41:29
【Question description】:

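I am not very familiar with NLTK or Python, and I need to do the following in one program:

Tokenize and lowercase the input text1
Tokenize the input text2
Find all trigrams in the input text1

Can anyone help me?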

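【Question discussion】:

This shows no effort of your own, and "can anyone help" sounds a lot like "can anyone do this for me". The answer is probably no.
This also sounds a lot like homework...

Tags: python nlp nltk n-gram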

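【Solution 1】:

Are text1 and text2 part of the NLTK examples? If so, take a look at them and you will see that tokenizing is not as tedious as you might think :-)

For lowercasing, see any introductory Python tutorial. For trigrams, see the NLTK book.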
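
The two hints above (lowercase with plain Python, then build trigrams) can be sketched without NLTK at all. This is only a minimal illustration, not NLTK's own implementation: the `trigrams` helper and the sample token list are hypothetical, and the input is assumed to be already split into tokens.

```python
def trigrams(tokens):
    """Return every run of three consecutive tokens as a tuple."""
    # zip stops at the shortest input, so a list shorter than three
    # tokens simply produces an empty result
    return list(zip(tokens, tokens[1:], tokens[2:]))


# Hypothetical, already tokenized sample (not taken from the question)
sample_tokens = ["Call", "me", "Ishmael", ".", "Some", "years", "ago"]

# Lowercasing is plain Python: str.lower() applied to each token
lowered = [token.lower() for token in sample_tokens]

sample_trigrams = trigrams(lowered)
print(sample_trigrams[0])  # ('call', 'me', 'ishmael')
```

Lowercasing before building the trigrams means that "Call me" and "call me" end up in the same trigram, which is usually what you want when counting them.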

【Discussion】:

【Solution 2】:

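If you do not want to use NLTK's ngrams:

    import nltk

    text1 = "I really like python"
    text2 = " Python is a snake"
    # word_tokenize needs NLTK's Punkt tokenizer data to be downloaded first
    tokens2 = nltk.word_tokenize(text2)            # tokenize text2 as-is
    low_text1 = nltk.word_tokenize(text1.lower())  # lowercase text1, then tokenize
    N = 3
    # Slide a window of N tokens over the lowercased text1 (range, not Python 2's xrange)
    grams = [low_text1[i:i+N] for i in range(len(low_text1) - N + 1)]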
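
The slicing idea above generalizes to any window size. The sketch below wraps it in a function; `ngrams_by_slicing` is a hypothetical name, and the tokens are written out by hand (an assumption, so that the example runs without NLTK or its tokenizer data).

```python
def ngrams_by_slicing(tokens, n):
    """Return all windows of n consecutive tokens, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when there are fewer than n tokens, so the result is []
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Tokens written out by hand so the sketch runs without NLTK
low_text1 = ["i", "really", "like", "python"]

print(ngrams_by_slicing(low_text1, 3))
# [('i', 'really', 'like'), ('really', 'like', 'python')]
```

Returning tuples rather than lists makes the n-grams hashable, so they can be used as dictionary keys or counted with `collections.Counter`.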

【Discussion】:

【Solution 3】:

If you do not have such an example text and want to find all the trigrams, you should tokenize the text first:

>>> import nltk
>>> from nltk.util import ngrams
>>> text1 = "Hi How are you? i am fine and you"
>>> token = nltk.word_tokenize(text1)          # tokenize your text
>>> tttt = nltk.word_tokenize(text1.lower())   # lowercase and tokenize in one step
>>> tttt
['hi', 'how', 'are', 'you', '?', 'i', 'am', 'fine', 'and', 'you']

>>> # ngrams returns a generator in recent NLTK versions, so wrap it in list()
>>> trigrams = list(ngrams(token, 3))          # find all the trigrams in text1
>>> trigrams
[('Hi', 'How', 'are'), ('How', 'are', 'you'), ('are', 'you', '?'), ('you', '?', 'i'), ('?', 'i', 'am'), ('i', 'am', 'fine'), ('am', 'fine', 'and'), ('fine', 'and', 'you')]

As for text2, you only need to apply the tokenization step to it.
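
To make the "same step for both texts" point concrete, here is a rough sketch that runs without NLTK. The `simple_tokenize` function is a hypothetical regex stand-in for `nltk.word_tokenize`, not a reimplementation of it (it will split contractions and abbreviations differently), and the `text2` value is an assumption borrowed from the second answer, since this answer never defines one.

```python
import re


def simple_tokenize(text, lowercase=False):
    """Split text into word tokens and single punctuation marks."""
    if lowercase:
        text = text.lower()
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)


text1 = "Hi How are you? i am fine and you"
text2 = " Python is a snake"  # assumed value, borrowed from the second answer

tokens1 = simple_tokenize(text1, lowercase=True)  # tokenize and lowercase
tokens2 = simple_tokenize(text2)                  # tokenize only
print(tokens2)  # ['Python', 'is', 'a', 'snake']
```

For real text, `nltk.word_tokenize` is the better choice; the regex version only shows that text2 needs nothing beyond the tokenization call.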

【Discussion】: