【Question title】: Error when running LDA on Tweets using gensim in Python
【Posted】: 2017-04-21 20:29:17
【Question description】:

I have the following code, which I use to run an LDA analysis on tweets:

import logging, gensim, bz2
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# load id->word mapping (the dictionary), one of the results of step 2 above
id2word = 'enams4nieuw.dict'
# load corpus iterator
mm = gensim.corpora.MmCorpus('enams4nieuw.mm')

print(mm)

# extract 100 LDA topics, using 1 pass and updating once every 1 chunk (10,000 documents)
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1)

When I try to run this script, I get the following log, ending in an error message:

MmCorpus(40152 documents, 13061 features, 384671 non-zero entries)
2015-03-31 16:52:50,246 : INFO : loaded corpus index from enams4nieuw.mm.index
2015-03-31 16:52:50,246 : INFO : initializing corpus reader from enams4nieuw.mm
2015-03-31 16:52:50,246 : INFO : accepted corpus with 40152 documents, 13061 features, 384671 non-zero entries
Traceback (most recent call last):
  File "C:/Users/gerbuiker/PycharmProjects/twitter-streaming.py/lda.py", line 15, in <module>
    lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=id2word, num_topics=100, update_every=1, chunksize=10000, passes=1)
  File "C:\Users\gerbuiker\AppData\Roaming\Python\Python27\site-packages\gensim\models\ldamodel.py", line 244, in __init__
    self.num_terms = 1 + max(self.id2word.keys())
AttributeError: 'str' object has no attribute 'keys'

Process finished with exit code 1

Does anyone have a solution?

【Question discussion】:

    Tags: python lda gensim


    【Solution 1】:

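    You have set the variable id2word to a string.

    It looks like that string is a file name. I assume you saved (pickled) your dictionary to that file?

    id2word needs to be a dictionary (an id-to-word mapping), not the path to one.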
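A minimal sketch of what this means in practice. It assumes the `.dict` file was written with gensim's `Dictionary.save()`; if it was written with `save_as_text()` instead, `Dictionary.load_from_text()` would be the counterpart. The helper names `max_term_id` and `load_id2word` and the toy mapping are hypothetical and only serve the illustration.

```python
def max_term_id(id2word):
    """Mimic the failing line from the traceback: 1 + max(id2word.keys())."""
    return 1 + max(id2word.keys())


# Passing the file name reproduces the error: a str has no .keys() method.
try:
    max_term_id('enams4nieuw.dict')
except AttributeError as exc:
    error_message = str(exc)

# Any id -> word mapping works (toy data, not taken from the question).
toy_id2word = {0: 'tweet', 1: 'lda', 2: 'gensim'}
num_terms = max_term_id(toy_id2word)


def load_id2word(path):
    """Load a gensim Dictionary object from disk instead of passing its path.

    Assumption: the file was created with Dictionary.save().
    """
    from gensim import corpora  # imported lazily; requires gensim
    return corpora.Dictionary.load(path)
```

With that in place, the call from the question would presumably become `id2word = load_id2word('enams4nieuw.dict')`, and the rest of the `LdaModel(...)` call can stay unchanged.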

    【Discussion】:

      【Solution 2】:

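      I ran into the same error. It seems ldamodel.py was taking the maximum over the words rather than over the indices/IDs, so my solution was simply to swap the columns of the dictionary.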

      my_dict2 = {y:x for x,y in my_dict.items()}
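As a small illustration of that swap, assuming the original dictionary maps word -> id (the names `token2id` and `id2token` and the sample words are hypothetical):

```python
# Toy word -> id mapping, i.e. the "wrong way round" for id2word.
token2id = {'tweet': 0, 'lda': 1, 'gensim': 2}

# Swap keys and values so that the integer ids become the keys.
id2token = {token_id: token for token, token_id in token2id.items()}

# The expression from the traceback now operates on integer ids.
num_terms = 1 + max(id2token.keys())
```

This only works cleanly when every id is unique; duplicate values in the original dictionary would silently overwrite each other after the swap.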
      

      【Discussion】:
