[Posted]: 2021-02-03 04:08:58
[Question]:
I am trying to load pretrained GloVe vectors as a word2vec model in gensim. I downloaded the GloVe file from here. I am using the following script:
from gensim import models
model = models.KeyedVectors.load_word2vec_format('glove.6B.300d.txt', binary=True)
But I get the following error:
ValueError Traceback (most recent call last)
<ipython-input-38-e0b48b51f433> in <module>()
1 from gensim import models
----> 2 model = models.KeyedVectors.load_word2vec_format('glove.6B.300d.txt', binary=True)
2 frames
/usr/local/lib/python3.6/dist-packages/gensim/models/utils_any2vec.py in <genexpr>(.0)
171 with utils.smart_open(fname) as fin:
172 header = utils.to_unicode(fin.readline(), encoding=encoding)
--> 173 vocab_size, vector_size = (int(x) for x in header.split()) # throws for invalid file format
174 if limit:
175 vocab_size = min(vocab_size, limit)
ValueError: invalid literal for int() with base 10: 'the'
What is the root cause? Does gensim require a specific file format in order to load the vectors?
[Discussion]:
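The traceback suggests two mismatches. `load_word2vec_format` parses the first line of the file as a `<vocab_size> <vector_size>` header, but a GloVe text file has no such header and starts directly with the first word (`the`), so `int('the')` fails. Separately, `glove.6B.300d.txt` is a plain-text file, so `binary=True` does not look right either. The sketch below is one way to add the missing header using only the standard library. The helper name `add_word2vec_header` and the toy file contents are made up for illustration, and the gensim call at the end is commented out on the assumption that gensim may not be installed where this runs.

```python
import os
import tempfile


def add_word2vec_header(glove_path, out_path, encoding="utf-8"):
    """Copy a GloVe text file to out_path with a word2vec-style header line.

    Hypothetical helper for illustration: the header is
    '<vocab_size> <vector_size>', which is what the failing line in the
    traceback tries to parse from the first line of the file.
    """
    vocab_size = 0
    vector_size = None
    # First pass: count the rows and read the dimensionality from the first row.
    with open(glove_path, encoding=encoding) as fin:
        for line in fin:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            if vector_size is None:
                vector_size = len(parts) - 1  # first token is the word itself
            vocab_size += 1
    # Second pass: write the header, then copy the vector rows.
    with open(glove_path, encoding=encoding) as fin, \
            open(out_path, "w", encoding=encoding) as fout:
        fout.write(f"{vocab_size} {vector_size}\n")
        for line in fin:
            if len(line.rstrip("\n").split(" ")) >= 2:
                fout.write(line)
    return vocab_size, vector_size


# Tiny stand-in for glove.6B.300d.txt (made-up numbers, 3 dimensions).
tmp_dir = tempfile.mkdtemp()
glove_file = os.path.join(tmp_dir, "toy_glove.txt")
w2v_file = os.path.join(tmp_dir, "toy_glove.w2v.txt")
with open(glove_file, "w", encoding="utf-8") as f:
    f.write("the 0.1 0.2 0.3\n")
    f.write("cat 0.4 0.5 0.6\n")

print(add_word2vec_header(glove_file, w2v_file))  # (2, 3)

# With gensim installed, the converted file should then load as text:
# from gensim.models import KeyedVectors
# model = KeyedVectors.load_word2vec_format(w2v_file, binary=False)
```

gensim also ships its own converter, `gensim.scripts.glove2word2vec`, and in gensim 4.0 and later `load_word2vec_format(..., binary=False, no_header=True)` should read the GloVe file directly without a conversion step. Which of these applies depends on the installed version; the traceback above shows a Python 3.6 install with `utils_any2vec.py`, which points to an older 3.x release.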
Tags: stanford-nlp gensim word2vec word-embedding