【Question title】: What is the probability of ‘begining’ given ‘the’?
【Posted】: 2014-05-07 23:19:37
【Question】:
Using an NLTK Conditional Frequency Distribution and the nltk.bigrams function, train a bigram model on the Genesis corpus:

text = nltk.corpus.genesis.words('english-kjv.txt')
bigrams = nltk.bigrams(text)
cfd = nltk.ConditionalFreqDist(bigrams)
Answer the following questions:

What is the Probability of ‘begining’ given ‘the’?
What is the probability of ‘the’?

Note: the probabilities you give as answers must be ones that can be computed from this corpus.

Hi, can anyone help me with this? It is from the NLTK book. When I work it out I get 78%, which doesn't make sense. I am trying to compute it in Python.

【Comments】:

  • Zero. That is not how "beginning" is spelled :)
  • Genius! ..so what then? I still get 78

Tags: python nltk corpus tagged-corpus


【Solution 1】:

There is a difference between the probability of 'beginning' intersect 'the' (the joint probability):

p('beginning','the')

and the probability of 'beginning' given 'the' (the conditional probability):

p('beginning'|'the') = p('beginning','the') / p('the')
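
To see how that ratio reduces to counts, here is a short worked derivation. It assumes plain relative-frequency estimates and introduces notation that is not in the original post: C(.) for a count in the corpus, N for the number of tokens, and N - 1 for the number of bigrams. In a bigram model the pair is counted in corpus order, so the relevant pair is ('the', 'beginning').

```latex
\begin{align*}
p(\text{beginning} \mid \text{the})
  &= \frac{p(\text{the}, \text{beginning})}{p(\text{the})} \\
  &\approx \frac{C(\text{the}, \text{beginning}) / (N - 1)}{C(\text{the}) / N} \\
  &\approx \frac{C(\text{the}, \text{beginning})}{C(\text{the})}
  \qquad \text{(for large } N\text{)}
\end{align*}
```

Since a count can never exceed the count of its condition word, an estimate built this way always lies between 0 and 1.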

Try:

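from collections import Counter

import nltk

text = nltk.corpus.genesis.words('english-kjv.txt')
bigrams = nltk.bigrams(text)
cfd_bigrams = Counter(bigrams)   # counts of (first word, second word) pairs
cfd_unigrams = Counter(text)     # counts of single words
total_bigrams = float(sum(cfd_bigrams.values()))

# Joint probability: share of all bigrams that are exactly ('said', 'unto').
p_joint = cfd_bigrams['said', 'unto'] / total_bigrams
print("p('said','unto') =", format(p_joint, '.12g'))

# Caution: this divides the joint probability by the raw count of 'unto',
# not by p('unto'), so it is not yet the conditional probability defined above.
print("p('said'|'unto') =", format(p_joint / cfd_unigrams['unto'], '.12g'))

# Raw count (not a probability) of 'beginning' immediately followed by 'the'.
print("p('beginning','the') =", cfd_bigrams['beginning', 'the'])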

[Out]:

p('said','unto') = 0.00397649844738
p('said'|'unto') = 6.73982787691e-06
p('beginning','the') = 0
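
Following the formula p('beginning'|'the') = p('beginning','the') / p('the') literally means dividing by a probability rather than by a raw count, and counting the bigram in corpus order, ('the', 'beginning'). Below is a minimal sketch of that idea. The function names and the toy token list are made up for illustration, and the estimates are plain relative frequencies, which is an assumption about what the exercise expects.

```python
from collections import Counter


def unigram_probability(tokens, word):
    """Relative frequency of `word` among all tokens."""
    tokens = list(tokens)
    return Counter(tokens)[word] / len(tokens) if tokens else 0.0


def bigram_probabilities(tokens, first, second):
    """Return (joint, conditional) estimates for `second` directly following `first`."""
    tokens = list(tokens)
    # Pair each token with its successor: (w1, w2), (w2, w3), ...
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    total_bigrams = sum(bigram_counts.values())
    pair_count = bigram_counts[(first, second)]

    # Joint: share of all bigrams that are exactly (first, second).
    joint = pair_count / total_bigrams if total_bigrams else 0.0

    # Conditional: of the bigrams that start with `first`,
    # the share that continue with `second`.
    starts_with_first = sum(
        count for (w1, _), count in bigram_counts.items() if w1 == first
    )
    conditional = pair_count / starts_with_first if starts_with_first else 0.0
    return joint, conditional


# Hypothetical toy tokens standing in for the corpus (10 tokens, 9 bigrams).
toy = "in the beginning god created the heaven and the earth".split()

joint, conditional = bigram_probabilities(toy, "the", "beginning")
p_the = unigram_probability(toy, "the")

print("p('the','beginning') =", joint)        # 1 of the 9 bigrams
print("p('beginning'|'the') =", conditional)  # 1 of the 3 bigrams starting with 'the'
print("p('the') =", p_the)                    # 3 of the 10 tokens
```

To run this on Genesis, pass nltk.corpus.genesis.words('english-kjv.txt') as `tokens`. With the cfd from the question, whose conditions are the first word of each bigram, cfd['the'].freq('beginning') should give the same kind of conditional estimate.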
