【Posted】: 2019-04-22 05:19:09
【Question】:
I am using Gensim's LdaMallet wrapper for topic modeling. How can I use the pre-trained model to predict the topics of a new sample paragraph?
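# Imports used by the snippet (t_preprocess and dataset are not shown in the question)
import gensim
from gensim import corpora
from gensim.models import CoherenceModel
# Build the bigram model
bigram = gensim.models.Phrases(t_preprocess(dataset.data), min_count=5, threshold=100)
bigram_mod = gensim.models.phrases.Phraser(bigram)
def make_bigrams(texts):
    return [bigram_mod[doc] for doc in texts]
data_words_bigrams = make_bigrams(t_preprocess(dataset.data))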
# Create Dictionary
id2word = corpora.Dictionary(data_words_bigrams)
# Create Corpus
texts = data_words_bigrams
# Term Document Frequency
corpus = [id2word.doc2bow(text) for text in texts]
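# Path to the local Mallet binary
mallet_path = '/home/riteshjain/anaconda3/mallet/mallet2.0.8/bin/mallet'
# Train the model (gensim.models.wrappers is only available in Gensim versions before 4.0)
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=12, id2word=id2word, random_seed=0)
# Coherence model for evaluating the trained topics
coherence_model_ldamallet = CoherenceModel(model=ldamallet, texts=texts, dictionary=id2word, coherence='c_v')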
a = "When Honda builds a hybrid, you've got to be sure it’s a marvel. And an Accord Hybrid is when technology surpasses the known and takes a leap of faith into tomorrow. This is the next generation Accord, the ninth generation to be precise."
How can I use this text (a) to get its topics from the pre-trained model? Please help.
【Discussion】:
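One way to do this, sketched here under a few assumptions: run the new text through the same preprocessing and bigram step used for training, convert it with the training dictionary's `doc2bow`, and index the model with the resulting bag of words. In Gensim versions before 4.0, `LdaMallet` supports `model[bow]`, which returns `(topic_id, weight)` pairs for a single document. The helper name `infer_topics` and its `top_n` parameter are made up for illustration, and returning an empty list for text with no known vocabulary is a choice made in this sketch, not library behavior.

```python
def infer_topics(model, dictionary, tokens, phraser=None, top_n=3):
    """Return the top_n (topic_id, weight) pairs for one tokenized document.

    model      -- a trained topic model that supports model[bow] (e.g. LdaMallet)
    dictionary -- the gensim Dictionary used at training time (id2word)
    tokens     -- list of str, preprocessed the same way as the training data
    phraser    -- optional Phraser (bigram_mod) applied before doc2bow
    """
    # Apply the same bigram transformation that the training texts went through.
    if phraser is not None:
        tokens = phraser[tokens]

    # Map tokens to (token_id, count) pairs; words unseen in training are dropped.
    bow = dictionary.doc2bow(tokens)
    if not bow:
        # Sketch-level choice: nothing to infer from if no token is in the vocabulary.
        return []

    # Indexing the model runs inference and yields (topic_id, weight) pairs.
    topic_weights = model[bow]

    # Strongest topics first.
    return sorted(topic_weights, key=lambda tw: tw[1], reverse=True)[:top_n]


# Hypothetical usage with the objects from the question:
# tokens = t_preprocess([a])[0]  # assumes one token list per input document
# print(infer_topics(ldamallet, id2word, tokens, phraser=bigram_mod))
```

The commented usage assumes `t_preprocess` returns one token list per input document, which the question does not show. Because `doc2bow` silently ignores words that are not in `id2word`, very short or out-of-domain text may produce unreliable topics.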
Tags: python jupyter-notebook gensim topic-modeling mallet