【Question title】: pyLDAvis: Validation error on trying to visualize topics with BTM
【Posted】: 2019-04-16 16:31:23
【Question】:

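I am trying to generate topics with BTM. When I try to visualize the topics, I get a validation error. I can print the topics after the model is trained, but the pyLDAvis step fails.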

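import numpy as np
import pyLDAvis
from sklearn.feature_extraction.text import CountVectorizer
from biterm.btm import oBTM
from biterm.utility import vec_to_biterms, topic_summuary

def btm_model():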
    num_topics = 10
    texts = open('./textfiles/Ori-Apr2, 2019.txt').read().splitlines()
    # vectorize texts
    vec = CountVectorizer(stop_words='english')
    X = vec.fit_transform(texts).toarray()
    # get vocabulary
    vocab = np.array(vec.get_feature_names())
    # get biterms
    biterms = vec_to_biterms(X)
    # create btm
    btm = oBTM(num_topics = num_topics, V = vocab)
    print("\n\n Train Online BTM ..")
    for i in range(0, 1): 
        biterms_chunk = biterms[i:i + 100]
        btm.fit(biterms_chunk, iterations=10)

    print("\n\n Topic coherence ..")
    res, C_z_sum = topic_summuary(btm.phi_wz.T, X, vocab, 10)

    topics = btm.transform(biterms)
    print("\n\n Visualize Topics ..")
    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
    pyLDAvis.save_html(vis, './textfiles/online_btm.html')

Running the pyLDAvis step raises the following error:

Traceback (most recent call last):
  File "main_mining.py", line 293, in <module>
    btm_model(num_topics)
  File "main_mining.py", line 187, in btm_model
    vis = pyLDAvis.prepare(btm.phi_wz.T, topics, np.count_nonzero(X, axis=1), vocab, np.sum(X, axis=0))
  File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 375, in prepare
    _input_validate(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency)
  File "C:\Python Install Location\lib\site-packages\pyLDAvis\_prepare.py", line 65, in _input_validate
    raise ValidationError('\n' + '\n'.join([' * ' + s for s in res]))
pyLDAvis._prepare.ValidationError:
 * Not all rows (distributions) in doc_topic_dists sum to 1.
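
The check that fails here can be reproduced outside pyLDAvis to see which documents are responsible. The following is a minimal sketch, not pyLDAvis's own validation code: `find_bad_rows` is a hypothetical helper, and the sample matrix is made up for illustration. It reports the rows of a document-topic matrix that contain NaN or do not sum to 1.

```python
import numpy as np

def find_bad_rows(doc_topic_dists, tol=1e-6):
    """Return indices of rows that contain NaN or do not sum to 1 (hypothetical helper)."""
    dists = np.asarray(doc_topic_dists, dtype=float)
    row_sums = dists.sum(axis=1)
    # A NaN row sum never compares close to 1.0, so NaN rows are flagged too.
    bad = ~np.isclose(row_sums, 1.0, atol=tol)
    return np.flatnonzero(bad)

# Made-up example: row 1 is all NaN, row 2 sums to 0.9.
example = np.array([
    [0.2, 0.8],
    [np.nan, np.nan],
    [0.5, 0.4],
])
bad_rows = find_bad_rows(example)
print(bad_rows)
```

In the code above, this could be called as `find_bad_rows(topics)` just before `pyLDAvis.prepare` to find which rows of `texts` produce invalid distributions.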

【Comments】:

  • Did you manage to fix this?
  • Sorry, I have not found a solution yet.

Tags: python


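【Solution 1】:

In my case, this happened because some of my sentences had only a few tokens. I removed all sentences with fewer than three tokens, and it worked like a charm.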
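
The filtering step could look roughly like the sketch below. It rests on an assumption the thread does not confirm: that a document with fewer than two distinct vocabulary terms yields no biterms, so its topic distribution cannot be normalized. `drop_short_docs` and `min_terms` are hypothetical names, and the count matrix is made up for illustration.

```python
import numpy as np

def drop_short_docs(X, min_terms=2):
    """Keep rows of a document-term count matrix that have at least
    min_terms distinct terms (hypothetical helper)."""
    X = np.asarray(X)
    distinct_terms = np.count_nonzero(X, axis=1)
    keep = distinct_terms >= min_terms
    # Return the mask as well, so the original texts can be filtered the same way.
    return X[keep], keep

# Made-up count matrix: 4 documents, 4 vocabulary terms.
X = np.array([
    [1, 0, 2, 0],  # two distinct terms: kept
    [0, 3, 0, 0],  # one distinct term: dropped
    [0, 0, 0, 0],  # empty after stop-word removal: dropped
    [1, 1, 1, 0],  # three distinct terms: kept
])
X_filtered, keep = drop_short_docs(X)
print(X_filtered.shape, keep)
```

If this approach is used, the filtered matrix should replace `X` everywhere it appears afterwards (biterm extraction, document lengths, term frequencies), so that all arrays passed to `pyLDAvis.prepare` describe the same set of documents. Setting `min_terms=3` corresponds to the threshold mentioned in this answer.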

【Comments】:

  • Same for me. I just added corpus = [doc for doc in corpus if len(doc) > 1], and it worked.