【Question title】: LogisticRegression MemoryError
【Posted】: 2018-10-29 16:20:55
【Question description】:

I'm using logistic regression to train a model on some text data. This is the code I'm using:

from fonduer.learning import LogisticRegression
disc_model = LogisticRegression()
%time disc_model.train((train_cands[0], F_train[0]), train_marginals, n_epochs=50, lr=0.001)

The code runs without any problem on 20 documents, but when I increase the number of documents to 40, I get this error:

[INFO] fonduer.learning.disc_learning - Load defalut parameters for Logistic Regression

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<timed eval> in <module>

~/.venv/lib/python3.6/site-packages/fonduer/learning/disc_learning.py in train(self, X_train, Y_train, n_epochs, lr, batch_size, rebalance, X_dev, Y_dev, print_freq, dev_ckpt, dev_ckpt_delay, save_dir, seed, host_device)
    169 
    170         _X_train, _Y_train = self._preprocess_data(
--> 171             X_train, Y_train, idxs=train_idxs, train=True
    172         )
    173         if X_dev is not None:

~/.venv/lib/python3.6/site-packages/fonduer/learning/disc_models/logistic_regression.py in _preprocess_data(self, X, Y, idxs, train)
     59         C, F = X
     60         if issparse(F):
---> 61             F = F.todense()
     62 
     63         if idxs is None:

~/.venv/lib/python3.6/site-packages/scipy/sparse/base.py in todense(self, order, out)
    844             `numpy.matrix` object that shares the same memory.
    845         """
--> 846         return np.asmatrix(self.toarray(order=order, out=out))
    847 
    848     def toarray(self, order=None, out=None):

~/.venv/lib/python3.6/site-packages/scipy/sparse/compressed.py in toarray(self, order, out)
    945         if out is None and order is None:
    946             order = self._swap('cf')[0]
--> 947         out = self._process_toarray_args(order, out)
    948         if not (out.flags.c_contiguous or out.flags.f_contiguous):
    949             raise ValueError('Output array must be C or F contiguous')

~/.venv/lib/python3.6/site-packages/scipy/sparse/base.py in _process_toarray_args(self, order, out)
   1182             return out
   1183         else:
-> 1184             return np.zeros(self.shape, dtype=self.dtype, order=order)
   1185 
   1186 

MemoryError: 

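【Question discussion】:

  • This means you are simply running out of memory. Could you share your machine setup (RAM, etc.) and the size of your documents?
  • Could you also show the size of F with print(F.size) and print(F.dtype)?
  • Installed physical memory (RAM): 16.0 GB; total physical memory: 16.0 GB; available physical memory: 9.15 GB; total virtual memory: 18.3 GB; available virtual memory: 10.2 GB
  • size = 6347423, dtype = float64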
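The `F.size` value above is easy to misread: for a SciPy sparse matrix, `size` counts only the stored (nonzero) entries, so 6,347,423 float64 values take only about 51 MB. The call that fails in the traceback, `todense()`, ends in `np.zeros(self.shape, ...)`, which allocates one cell for every row × column of the full shape. A minimal sketch for estimating that dense footprint follows; the random matrix is a hypothetical stand-in for `F_train[0]`, and `dense_bytes` is an illustrative helper, not part of Fonduer or SciPy.

```python
import numpy as np
import scipy.sparse as sp

def dense_bytes(F):
    """Bytes that F.todense() would allocate: every cell of the full shape."""
    n_rows, n_cols = F.shape
    return n_rows * n_cols * np.dtype(F.dtype).itemsize

# Hypothetical stand-in for F_train[0]: 1,000 candidates x 50,000 features,
# with 0.1% of the cells stored (50,000 nonzeros).
F = sp.random(1_000, 50_000, density=0.001, format="csr",
              dtype=np.float64, random_state=0)

print(F.size)                    # stored entries only (same as F.nnz)
print(F.data.nbytes)             # bytes used by the stored values in CSR form
print(dense_bytes(F) / 1024**3)  # GiB that todense() would need
```

For the matrix in the question, comparing `F.shape[0] * F.shape[1] * 8` bytes against the roughly 9 GB of available RAM shows whether densifying can fit at all.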

Tags: python memory machine-learning logistic-regression


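【Solution 1】:

Try the Dask package. It is designed for large datasets on machines with limited memory, and it lets you work with datasets larger than the available RAM.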

【Discussion】:

【Solution 2】:

The feature dimension appears to be too large for LogisticRegression. I replaced LogisticRegression with SparseLogisticRegression, and that solved the problem.
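The traceback shows `LogisticRegression._preprocess_data` calling `F.todense()`, whereas a model that only needs products such as `X @ w` and `X.T @ r` can keep the features in CSR form throughout. The sketch below illustrates that idea with plain full-batch gradient descent on the logistic (cross-entropy) loss. It is not Fonduer's `SparseLogisticRegression` implementation, and `train_sparse_logreg` is a hypothetical helper; targets may be hard 0/1 labels or probabilistic marginals such as `train_marginals`.

```python
import numpy as np
import scipy.sparse as sp
from scipy.special import expit  # numerically stable sigmoid

def train_sparse_logreg(X, y, n_epochs=50, lr=0.1):
    """Hypothetical sketch: logistic regression that never densifies X.

    X: scipy.sparse matrix (n_samples x n_features)
    y: targets in [0, 1] (hard labels or marginals)
    Memory use is O(nnz + n_samples + n_features).
    """
    X = sp.csr_matrix(X)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        p = expit(X @ w + b)            # sparse @ dense vector -> dense vector
        residual = p - y                # gradient of cross-entropy w.r.t. logits
        w -= lr * (X.T @ residual) / n_samples
        b -= lr * residual.mean()
    return w, b

# Toy usage: feature 0 marks positives, feature 1 marks negatives.
X_toy = sp.csr_matrix(np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float))
y_toy = np.array([1.0, 1.0, 0.0, 0.0])
w, b = train_sparse_logreg(X_toy, y_toy, n_epochs=500, lr=0.5)
pred = expit(X_toy @ w + b)
```

The design choice that matters is that the weight vector and predictions are dense but small, while the large feature matrix stays sparse, so memory scales with the number of nonzeros rather than with rows × columns.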

【Discussion】:
