【Question title】: How to calculate the occurrence of a specific sentence in a text?
【Posted】: 2020-04-08 12:50:19
【Question】:

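How can I use the code below to calculate how often the bigram occurs in example_txt? Right now I thought I would just return whether 'order' appears in the total variable. What I actually want is the bigram's share of all bigrams in the text.

Turning example_txt into bigrams gives: [('order', 'intake'), ('intake', 'is'), ('is', 'strong'), ('strong', 'for'), ('for', 'q4')]. So the output of my code should be 0.20, because 'order intake' is 1 of 5 bigrams.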


from nltk import ngrams

example_txt = "order intake is strong for q4"
bi_gram = 'order intake'

# these turn example_txt and bi_gram into bigrams
n_gram_text = ngrams(example_txt.split(), 2)
n_gram = ngrams(bi_gram.split(), 2)


# collect the generated bigrams into the lists total and bigram
total =[]
bigram = []
for e in n_gram_text:
    total.append(e)
for i in n_gram:
    bigram.append(i)

# this is supposed to report whether each bigram exists in total
for k in bigram:
    if k in total:
        print('yes')
        print(k)
    else:
        print('no')

Edit: new title

【Question discussion】:

    Tags: python pandas nlp nltk


    【Solution 1】:

    You can use Counter from the collections module:

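    from collections import Counter

    # total is the list of bigram tuples built in the question's code
    bigram = ('order', 'intake')
    counter_total = Counter(total)
    # share of this bigram among all bigrams in the text
    perc_bigram = counter_total[bigram] / sum(counter_total.values())
    perc_bigram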
    

    Output:

    0.2
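
For readers who want to run the idea end to end without the question's intermediate lists, here is a minimal sketch. It rests on a few assumptions: the helper name `bigram_share` is hypothetical, whitespace splitting stands in for real tokenization, and the bigrams are built with `zip` rather than `nltk.ngrams` so that only the standard library is needed.

```python
from collections import Counter


def bigram_share(text, phrase):
    """Return the fraction of bigrams in `text` that equal the two-word `phrase`."""
    words = text.split()
    target = tuple(phrase.split())
    if len(target) != 2:
        raise ValueError("phrase must contain exactly two words")
    # Pair each word with its successor to form the bigrams.
    bigrams = list(zip(words, words[1:]))
    if not bigrams:
        return 0.0  # fewer than two words: no bigrams to count
    counts = Counter(bigrams)
    return counts[target] / len(bigrams)


share = bigram_share("order intake is strong for q4", "order intake")
print(share)  # 0.2
```

Returning 0.0 for texts with fewer than two words is a design choice made here to avoid a division by zero; raising an error would be equally reasonable.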
    

    【Discussion】:

    • That's it, mate. Thanks!