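【Published】: 2021-10-31 12:43:01
【Question】:
I am trying out gensim's FastText by running the example code from the gensim documentation, with one small change: the corpus argument is passed as corpus_iterable.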
https://radimrehurek.com/gensim/models/fasttext.html
gensim_version == 4.0.1
from gensim.models import FastText
from gensim.test.utils import common_texts # some example sentences
print(common_texts[0])  # ['human', 'interface', 'computer']
print(len(common_texts))  # 9
model = FastText(vector_size=4, window=3, min_count=1) # instantiate
model.build_vocab(corpus_iterable=common_texts)
model.train(corpus_iterable=common_texts, total_examples=len(common_texts), epochs=10)
This works, but is there a way to get the vocab from the model? For example, the TensorFlow Tokenizer has a word_index attribute that returns all the words. Is there something similar here?
【Discussion】:
Tags: machine-learning nlp gensim word-embedding fasttext