TensorFlow crashes with CUBLAS_STATUS_ALLOC_FAILED: answers

【Question title】: Tensorflow crashes with CUBLAS_STATUS_ALLOC_FAILED
【Posted】: 2017-04-28 07:34:42
【Question description】:

I am running tensorflow-gpu on Windows 10 with a simple MNIST neural network program. When it tries to run, it fails with a CUBLAS_STATUS_ALLOC_FAILED error. Googling turned up nothing.

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:0f:00.0
Total memory: 4.00GiB
Free memory: 3.31GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0:   Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970, pci bus id: 0000:0f:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
W c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _do_call
    return fn(*args)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1003, in _run_fn
    status, run_metadata)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Users\Anonymous\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 256), m=100, n=256, k=784
         [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_7, Variable/read)]]
         [[Node: Mean/_15 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_35_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

【Question comments】:

Tags: tensorflow windows-10 mnist cublas

【Solution 1】:

For TensorFlow 2.2, none of the other answers worked when I hit the CUBLAS_STATUS_ALLOC_FAILED problem. I found a solution at https://www.tensorflow.org/guide/gpu:

import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

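I ran this code before doing any further computation, and the same code that had previously produced the CUBLAS error then worked in the same session. The sample above is a specific example that enables memory growth across multiple physical GPUs, but it also fixed the memory allocation problem.

【Comments】:

In 2020, this is the only working solution I found.
This works for several of my applications. CUDA 11.1, cuDNN 8.0.5, GPU compute capability 8.6, 3080.
Thanks. A related question: does the 'set_memory_growth' flag need to be set on every execution?
I run this code every time I start a script that uses TensorFlow on the GPU.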
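
On the question of setting the flag on every run: the setting lives in the running process, so it has to be applied each time a script starts. The TensorFlow GPU guide also describes an environment variable, TF_FORCE_GPU_ALLOW_GROWTH, that is meant to have the same effect (the guide notes that it is platform specific). A minimal sketch, assuming the variable is set before TensorFlow initialises the GPU; the helper name force_gpu_allow_growth is made up for illustration:

```python
import os


def force_gpu_allow_growth():
    """Ask TensorFlow to allocate GPU memory on demand (hypothetical helper)."""
    # Must be set before TensorFlow initialises the GPU; the safest place is
    # before `import tensorflow` at the top of the script.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return os.environ["TF_FORCE_GPU_ALLOW_GROWTH"]


force_gpu_allow_growth()
# import tensorflow as tf  # import TensorFlow only after the call above
```

Setting the variable once in the system or shell environment would avoid touching each script, at the cost of applying to every TensorFlow process started from that environment.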

【Solution 2】:

A bit late to the party, but this solved my problem with tensorflow 2.4.0 and a GTX 980 Ti. Before limiting memory, I got errors like this:

CUBLAS_STATUS_ALLOC_FAILED

My solution was this code:

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])

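I found the solution here: https://www.tensorflow.org/guide/gpu

【Comments】:

【Solution 3】:

The location of the "allow_growth" property of the session configuration seems to have changed. It is explained here: https://www.tensorflow.org/tutorials/using_gpu

So at present you have to set it like this: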

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

【Comments】:

session = tf.Session(config=config, ...) fails with "SyntaxError: positional argument following keyword argument". The solution doesn't work.
Does not work with tf 2.1: tf.__version__ is '2.1.0', and module 'tensorflow' has no attribute 'ConfigProto'.
@yee Same with tensorflow-gpu 2.2.0.
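
The comments above show the same setting living under different names depending on the installed TensorFlow. One way to cope is to probe for whichever API is present instead of hard-coding one. The following is only a sketch under that assumption: enable_memory_growth is a hypothetical helper, and it takes the imported tensorflow module as an argument so the branching can be read (and exercised) on its own.

```python
def enable_memory_growth(tf):
    """Enable on-demand GPU allocation on whichever API `tf` exposes.

    Returns (api_name, session); session is None on the tf.config path.
    """
    experimental = getattr(getattr(tf, "config", None), "experimental", None)
    if hasattr(experimental, "set_memory_growth"):
        # tf.config style (Solution 1). Must run before the GPUs are
        # initialised, otherwise TensorFlow raises RuntimeError.
        for gpu in experimental.list_physical_devices("GPU"):
            experimental.set_memory_growth(gpu, True)
        return "tf.config", None
    if hasattr(tf, "ConfigProto"):
        # Session-config style, as in the answer above.
        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        return "ConfigProto", tf.Session(config=config)
    raise RuntimeError("no known memory-growth API found on this module")


# Usage, at the top of a script:
#   import tensorflow as tf
#   api, session = enable_memory_growth(tf)
```

Probing with hasattr avoids parsing version strings; on the session-config path the returned session is the one that carries the setting, so it is the one to run the graph with.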

【Solution 4】:

In my case, a stale Python process was consuming the memory. I killed it through Task Manager and everything went back to normal.
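
To check for such a leftover process without opening Task Manager, the processes currently holding GPU memory can be listed with nvidia-smi. This is a rough sketch, assuming nvidia-smi is on PATH and supports the --query-compute-apps option; the function names are made up for illustration. On some Windows driver setups the memory column may come back as a placeholder instead of a number, so it is kept as text here.

```python
import csv
import io
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV output into a list of dicts, one per process."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue  # skip blank or unexpected lines
        pid, name, used = (field.strip() for field in fields)
        rows.append({"pid": int(pid), "name": name, "used_memory": used})
    return rows


def list_gpu_processes():
    """Return processes using the GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_compute_apps(result.stdout) if result.returncode == 0 else []


for proc in list_gpu_processes():
    print(proc["pid"], proc["name"], proc["used_memory"])
```

Any python.exe entry that does not belong to the script being run is a candidate for the stale process described above.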

【Comments】:

【Solution 5】:

This code works for me.

TensorFlow >= 2.0

import tensorflow as tf
config = tf.compat.v1.ConfigProto(gpu_options = 
                         tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.8)
# device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.compat.v1.Session(config=config)
tf.compat.v1.keras.backend.set_session(session)

【Comments】:

Works with tf 2.1.0, Windows 10, 16 GB RAM, RTX 2070 Max-Q 8 GB, but I changed the value to 0.5.
Also works on this machine: userbenchmark.com/UserRun/30694804. However, it gave me "The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead." I think this should be included in the answer, but that is for the author to decide.
Worked in November 2020 on Windows x64 with Python 3.75, CUDA 10.1 and TensorFlow 2.3 on an RTX 2080 Ti.
Worked for me with a fraction of 0.8. TensorFlow 2.4.0 on Windows 10 with an RTX 2060.
Worked for me, but it ended up not using much GPU memory :(

【Solution 6】:

For Keras:

from keras.backend.tensorflow_backend import set_session
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

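【Comments】:

Does not work with tensorflow 2.1 and 2.2 (tested) and gives this error: AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

【Solution 7】:

None of these fixes worked for me, as the structure of the tensorflow library seems to have changed. For Tensorflow 2.0, the only fix that worked for me was the "Limiting GPU memory growth" section on this page: https://www.tensorflow.org/guide/gpu

For completeness and future-proofing, here is the solution from the documentation. I imagine changing memory_limit may be necessary for some people; 1 GB was fine in my case.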

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
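
The memory_limit value is given in megabytes, so a fraction such as the 0.8 used in other answers has to be converted by hand. A small sketch of that arithmetic, using the 4.00GiB total and 3.31GiB free reported for the GTX 970 in the question; memory_limit_mb and apply_memory_limit are hypothetical helper names, and the TensorFlow calls are the same ones used in the snippet above:

```python
def memory_limit_mb(total_gib, fraction):
    """Convert a share of total GPU memory into a memory_limit value in MB."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_gib * 1024 * fraction)


def apply_memory_limit(limit_mb):
    """Cap the first GPU at limit_mb; call before any GPU work starts."""
    import tensorflow as tf  # imported lazily so the arithmetic runs without TF

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if not gpus:
        return False  # nothing to configure on a CPU-only machine
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mb)])
    return True


# 80% of the card's total memory, and all of the memory reported as free.
print(memory_limit_mb(4.00, 0.8))   # 3276
print(memory_limit_mb(3.31, 1.0))   # 3389
```

Choosing a limit at or below the free figure leaves room for whatever else is already using the card, which may be why a smaller limit succeeds where the default allocation fails.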

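【Comments】:

Thank you very much. This is the only solution that worked. Don't forget to add "import tensorflow as tf" when copying this code.

【Solution 8】:

Tensorflow 2.0 alpha

Allowing GPU memory growth may fix this issue. For Tensorflow 2.0 alpha / nightly, there are two methods you can try to achieve this.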

1.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_growth()

2.)

import tensorflow as tf
tf.config.gpu.set_per_process_memory_fraction(0.4) # adjust this to the % of VRAM you 
                                                   # want to give to tensorflow.

I suggest you try both and see if either helps. Source: https://www.tensorflow.org/alpha/guide/using_gpu

【Comments】:

I think you mean tf.config.gpu.set_per_process_memory_growth()
With tf.config.gpu.set_per_process_memory_growth() I get: AttributeError: module 'tensorflow_core._api.v2.config' has no attribute 'gpu'
@seilgu Now that 2.0 is out of alpha, it is tf.config.experimental.set_virtual_device_configuration(tf.config.experimental.list_physical_devices('GPU')[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)]): tensorflow.org/guide/gpu#limiting_gpu_memory_growth

【Solution 9】:

I found that this solution works:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto(
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
    # device_count = {'GPU': 1}
)
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
set_session(session)

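【Comments】:

Does not work with tensorflow 2.1 and 2.2 (tested) and gives this error: AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

【Solution 10】:

On Windows, tensorflow currently does not allocate all available memory as the documentation says it does. Instead, you can work around this error by allowing dynamic memory growth, as follows: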

tf.Session(config=tf.ConfigProto(allow_growth=True))

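【Comments】:

ConfigProto does not seem to have this parameter, which produces the error: ValueError: Protocol message ConfigProto has no "allow_growth" field
This probably only works for TF1. Versions 2.1 and 2.2 gave me the same error, but the answer by Jai Mahesh (stackoverflow.com/users/11280106/jai-mahesh) worked for me. Link to the answer: stackoverflow.com/a/59558128/4575793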