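[Posted]: 2020-07-02 13:38:05
[Question]:
I trained an LDA model with sklearn to build a topic model, but I don't know how to generate a keyword word cloud for each of the resulting topics.
Here is my LDA model: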
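from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
vectorizer = CountVectorizer(analyzer='word',
                             min_df=3,        # drop words that appear in fewer than 3 documents
                             max_df=6000,     # drop words that appear in more than 6000 documents
                             stop_words='english',
                             lowercase=False,
                             token_pattern='[a-zA-Z0-9]{3,}',  # tokens of 3+ alphanumeric characters
                             max_features=50000,
                             )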
data_vectorized = vectorizer.fit_transform(data_lemmatized) # data_lemmatized is all my processed document text
best_lda_model = LatentDirichletAllocation(batch_size=128, doc_topic_prior=0.1,
evaluate_every=-1, learning_decay=0.7,
learning_method='online', learning_offset=10.0,
max_doc_update_iter=100, max_iter=10,
mean_change_tol=0.001, n_components=10, n_jobs=None,
perp_tol=0.1, random_state=None, topic_word_prior=0.1,
total_samples=1000000.0, verbose=0)
lda_output = best_lda_model.fit_transform(data_vectorized)  # fit the model, then return the document-topic matrix
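A minimal, self-contained sketch of how the two outputs line up. The four-document corpus and n_components=2 are made up for illustration and are not the question's data: lda_output is the document-topic matrix (one row per document), while components_ has one row per topic and one column per vocabulary word.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Made-up toy corpus, standing in for data_lemmatized
docs = [
    "river bank water fish",
    "bank money loan interest",
    "water river fish boat",
    "money loan interest bank",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_documents, n_vocabulary)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # shape (n_documents, n_topics); each row sums to 1

n_vocab = len(vectorizer.vocabulary_)  # 8 distinct words in the toy corpus

# components_: one row per topic, one column per vocabulary word,
# in the same column order as the vectorizer's feature names
print(doc_topic.shape)                          # (4, 2)
print(lda.components_.shape)                    # (2, 8)
print(np.allclose(doc_topic.sum(axis=1), 1.0))  # True
```

Because the columns of components_ follow the vectorizer's feature order, row k of components_ paired with the feature names gives the word weights for topic k.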
I know that best_lda_model.components_ gives the topic-word weights... and vectorizer.get_feature_names() gives all the words in the vocabulary...
Thanks a lot!
[Comments]:
Tags: python scikit-learn lda topic-modeling word-cloud
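One possible approach, sketched under assumptions: take each row of components_, pair it with the vocabulary, keep the heaviest words, and pass the resulting {word: weight} dict to WordCloud.generate_from_frequencies from the third-party wordcloud package (pip install wordcloud). The function names topic_word_frequencies and save_topic_wordclouds, the n_top_words default of 50, the image size, and the output filenames are hypothetical choices. In scikit-learn 1.0 and later, vectorizer.get_feature_names_out() replaces get_feature_names(), which was removed in 1.2.

```python
import numpy as np


def topic_word_frequencies(components, feature_names, n_top_words=50):
    """Return one {word: weight} dict per topic, keeping the n_top_words heaviest words.

    components    -- array of shape (n_topics, n_features), e.g. lda.components_
    feature_names -- sequence of length n_features, e.g. vectorizer.get_feature_names_out()
    """
    components = np.asarray(components, dtype=float)
    feature_names = np.asarray(feature_names)
    frequencies = []
    for topic_weights in components:
        # argsort is ascending, so reverse it and keep the first n_top_words indices
        top_idx = topic_weights.argsort()[::-1][:n_top_words]
        frequencies.append(
            {str(feature_names[i]): float(topic_weights[i]) for i in top_idx}
        )
    return frequencies


def save_topic_wordclouds(frequencies, prefix="topic"):
    """Render each topic's {word: weight} dict to <prefix>_<k>.png (hypothetical filenames)."""
    from wordcloud import WordCloud  # third-party package, imported only when rendering

    for topic_idx, freqs in enumerate(frequencies):
        cloud = WordCloud(width=800, height=400, background_color="white")
        cloud.generate_from_frequencies(freqs)
        cloud.to_file(f"{prefix}_{topic_idx}.png")
```

With the model from the question this would be called as freqs = topic_word_frequencies(best_lda_model.components_, vectorizer.get_feature_names_out()) followed by save_topic_wordclouds(freqs). The raw components_ values can be passed without normalizing each row, since WordCloud rescales the weights relative to the largest one in each dict.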