【Question title】: Consequences of Keras running out of memory
【Posted】: 2019-05-29 00:27:23
【Question】:

If this question does not belong here, please feel free to point me to another StackExchange site. :-)

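I am using Keras, and the memory on my GPU (GeForce GTX 970, ~4 GB) is quite limited. As a result, I run out of memory (OOM) when the batch size is set above a certain level. With a lower batch size the OOM error goes away, but Keras prints the following warnings: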

2019-01-02 09:47:03.173259: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.57GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.211139: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.68GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.268074: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.95GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.685032: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.39GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.732304: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.56GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.850711: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.39GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.879135: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.48GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.963522: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.42GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:03.984897: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.47GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-01-02 09:47:04.058733: W tensorflow/core/common_runtime/bfc_allocator.cc:219] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.08GiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

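What do these warnings mean for me as a user? What are these performance gains? Does it only mean that computation would be faster, or could I even get better results in terms of a lower validation loss?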

In my setup I use Keras with the TensorFlow backend and tensorflow-gpu==1.8.0.

【Comments】:

  • Are you using tensorflow or tensorflow-gpu?
  • I have edited the question to include the necessary information.

Tags: tensorflow keras out-of-memory gpu


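【Answer 1】:

It means that training will lose some efficiency in terms of speed, because the GPU cannot be used for some operations. The loss values you obtain should not be affected, though.

To avoid this, the best practice is to reduce the batch size so that the available GPU memory is used efficiently.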
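One way to pick a smaller batch size is to halve it until a trial training step no longer fails. The following is only a minimal sketch of that idea, not part of the original answer: `find_workable_batch_size`, `try_step`, and `fake_step` are hypothetical names, and the simulated 48-sample limit is made up for illustration. It only reacts to hard out-of-memory errors that are raised as exceptions; the allocator warnings quoted in the question are logged, not raised, so they would have to be checked in the log output separately.

```python
def find_workable_batch_size(try_step, start, minimum=1, oom_errors=(MemoryError,)):
    """Halve the batch size until try_step(batch_size) runs without an OOM error.

    try_step   -- hypothetical callback that runs one training step at the given size
    oom_errors -- exception types treated as "out of memory"
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except oom_errors:
            # The step did not fit; try again with half the batch size.
            batch_size //= 2
    raise RuntimeError("no batch size >= %d fits in memory" % minimum)


# Stand-in for a real training step: pretends that more than 48 samples do not fit.
def fake_step(batch_size):
    if batch_size > 48:
        raise MemoryError("simulated OOM")


print(find_workable_batch_size(fake_step, start=256))  # tries 256, 128, 64, then 32
```

With a real TensorFlow model, `try_step` would presumably wrap a single training call, and `oom_errors` could be set to `(tf.errors.ResourceExhaustedError,)`, which is the exception TensorFlow raises on a hard OOM.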

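【Comments】:

  • So batch_size should be reduced to a value that no longer triggers this message? Personally, I only get the message at the start of training, and then it does not appear again.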