【Question title】: Python34 word2vec.Word2Vec OverflowError
【Posted】: 2015-07-08 07:48:57
【Question】:

I'm studying word2vec, but when I train word2vec on my text data, NumPy raises an OverflowError.

The message is:

model.vocab[w].sample_int > model.random.randint(2**32)]
Warning (from warnings module):
  File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 636
    warnings.warn("C extension not loaded for Word2Vec, training will be slow. "
UserWarning: C extension not loaded for Word2Vec, training will be slow. Install a C compiler and reinstall gensim for fast training.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python34\lib\threading.py", line 920, in _bootstrap_inner
    self.run()
  File "C:\Python34\lib\threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 675, in worker_loop
    if not worker_one_job(job, init):
  File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 666, in worker_one_job
    job_words = self._do_train_job(items, alpha, inits)
  File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 623, in _do_train_job
    tally += train_sentence_sg(self, sentence, alpha, work)
  File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 112, in train_sentence_sg
    word_vocabs = [model.vocab[w] for w in sentence if w in model.vocab and
  File "C:\Python34\lib\site-packages\gensim\models\word2vec.py", line 113, in <listcomp>
    model.vocab[w].sample_int > model.random.randint(2**32)]
  File "mtrand.pyx", line 935, in mtrand.RandomState.randint (numpy\random\mtrand\mtrand.c:9520)
OverflowError: Python int too large to convert to C long

Can you tell me what causes this?

My machine is x64 and the OS is Windows 7, but Python 3.4 is 32-bit. NumPy and SciPy are also 32-bit.
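The error message refers to a C `long`, so a quick way to confirm the mismatch is to check the interpreter's pointer width and the size of a C `long` on the current platform. A minimal sketch using only the standard library; the helper name `fits_in_c_long` is hypothetical, for illustration only:

```python
import ctypes
import struct

# Pointer size in bits: 32 for a 32-bit interpreter, 64 for a 64-bit one.
python_bits = struct.calcsize("P") * 8

# Size of a C long in bits. On Windows this is 32 even for 64-bit builds.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Largest value a signed C long can hold on this platform.
c_long_max = 2 ** (c_long_bits - 1) - 1


def fits_in_c_long(value):
    """Return True if value fits in a signed C long on this platform."""
    return -c_long_max - 1 <= value <= c_long_max


print("Python build:", python_bits, "bit")
print("C long:", c_long_bits, "bit")
print("2**32 fits in a C long:", fits_in_c_long(2**32))
```

With a 32-bit C `long`, as on Windows, the last line prints `False`, which matches the `randint(2**32)` call failing in the traceback.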

【Question discussion】:

    Tags: python-3.x windows-7-x64 integer-overflow gensim word2vec


    【Solution 1】:

    I get this too. It looks like gensim has a potential workaround in the dev branch.

    https://github.com/piskvorky/gensim/commit/726102df66000f2afcea82d95634b055e6521dc8

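    This doesn't fix the underlying problem of handling int sizes across different hardware, but I think it should mitigate the problem for this particular line.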

    The necessary change is to replace

    model.vocab[w].sample_int > model.random.randint(2**32)

    with

    model.vocab[w].sample_int > model.random.rand() * 2**32

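    This avoids the 64-bit / 32-bit int problem inside randint: randint has to convert its upper bound 2**32 to a C long, which is only 32 bits here, whereas rand() returns a float in [0, 1), so the product stays a float and no integer conversion takes place.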
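To show that the replacement keeps the same keep/discard behaviour, here is a minimal sketch of the check. It assumes, as the comparison in the traceback implies, that `sample_int` is a per-word threshold on the scale [0, 2**32]. In gensim, `model.random` is a NumPy `RandomState` (the traceback goes through `mtrand.RandomState`); here the standard library's `random.Random` stands in for it, since its `random()` plays the same role as `RandomState.rand()`. The names `keep_word` and `SCALE` and the example thresholds are hypothetical:

```python
import random

SCALE = 2 ** 32  # threshold scale used in the original comparison


def keep_word(sample_int, rng):
    """Downsampling check: keep the word if its threshold beats a uniform draw.

    rng.random() returns a float in [0.0, 1.0), so the draw is a float in
    [0, 2**32) and no integer ever has to be converted to a C long.
    """
    return sample_int > rng.random() * SCALE


rng = random.Random(42)

# Hypothetical thresholds: one word that is always kept, and one that is
# kept roughly a quarter of the time.
always = SCALE
quarter = SCALE // 4

trials = 100000
kept = sum(keep_word(quarter, rng) for _ in range(trials))
print(keep_word(always, rng))  # True: the draw is always below 2**32
print(kept / trials)           # close to 0.25
```

Because the draw is uniform over the same range either way, comparing against a float gives the same keep probability as comparing against `randint(2**32)`, without the C long conversion.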

    Update: I manually merged that change into my gensim installation, and it prevents the error.

    【Discussion】:
