【Title】:RuntimeError in PyTorch: Unable to find a valid cuDNN algorithm to run convolution
【Posted】:2021-09-16 13:11:49
【Question】:

I want to test a GitHub repository for my work:

https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection

So I ran

git clone https://github.com/tufts-ml/GAN-Ensemble-for-Anomaly-Detection

Unfortunately, I get an error when I run the command

sh experiments/run_mnist_en_fanogan.sh

(taken from the GitHub README). The full output is:

sh experiments/run_mnist_en_fanogan.sh

/home/svetlana/.local/lib/python3.9/site-packages/torch/cuda/__init__.py:106: UserWarning: 
NVIDIA GeForce RTX 3080 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3080 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/svetlana/.local/lib/python3.9/site-packages/torchvision/datasets/mnist.py:498:      UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:180.)
  return torch.from_numpy(parsed.astype(m[2], copy=False)).view(*s)
Traceback (most recent call last):
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 30, in <module>
    main()
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/train.py", line 24, in main
    model.train()
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 155, in train
    self.gan_training(epoch)
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/f_anogan.py", line 93, in gan_training
    fake_imgs = self.net_Gds[i_G](z)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/svetlana/Documents/git/GAN-Ensemble-for-Anomaly-Detection/models/networks.py", line 175, in forward
    output = self.main(input)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/svetlana/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 916, in forward
    return F.conv_transpose2d(
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

I thought my installation was fine, but now I have doubts. Here is my setup:

Python 3.9.6 (default, Jun 30 2021, 10:22:16)

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0


import torch
print(torch.__version__)
1.9.0+cu102

I installed cuDNN for CUDA 11.4 from the NVIDIA website (https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html). I don't know the command to check its version; I tried this:

cat /opt/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

but it returned nothing.
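A possible reason the `grep` comes back empty: since cuDNN 8 the `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` macros are defined in `cudnn_version.h` rather than in `cudnn.h`. The sketch below is only an illustration of that check. It assumes the headers live in `/opt/cuda/include` (the directory used in the command above); the `cudnn_version.h` location and the helper names `parse_cudnn_version` and `find_cudnn_version` are assumptions, not part of any library.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN header.

    Returns None if any of the three version macros is missing.
    """
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


def find_cudnn_version(include_dir="/opt/cuda/include"):
    """Look for the version macros in cudnn_version.h first, then cudnn.h.

    The default directory is an assumption taken from the question.
    """
    for filename in ("cudnn_version.h", "cudnn.h"):
        path = Path(include_dir) / filename
        if path.is_file():
            version = parse_cudnn_version(path.read_text(errors="ignore"))
            if version is not None:
                return version
    return None


if __name__ == "__main__":
    # Prints a (major, minor, patch) tuple, or None if no header was found.
    print(find_cudnn_version())
```

Note that the pip wheels of PyTorch normally ship their own cuDNN, so the version PyTorch actually uses can be read with `torch.backends.cudnn.version()` and may differ from the system-wide one.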

I tried the solution found here: "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize",

without success (I used nvtop to monitor VRAM usage).

【Comments】:

    Tags: pytorch runtime-error convolution cudnn


    【Solution 1】:

    @Berriel

    You're right, I was focusing on the error and not on the warning above it.

    To solve my problem, I ran

    pip uninstall torch torchvision torchaudio
    

    and then

    pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
    

    following the instructions at

    https://pytorch.org/get-started/locally/
    

    (this link comes from the warning message).
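The warning in the question says the installed build (`1.9.0+cu102`) only ships kernels for `sm_37 sm_50 sm_60 sm_70`, while the RTX 3080 Laptop GPU reports `sm_86`, which is presumably why switching to the `cu111` build helps. A minimal sketch for checking this before and after the reinstall follows. The helper names `parse_sm`, `build_supports` and `report` are hypothetical, and the rule used (a compiled `sm_XY` kernel runs on a device of the same major version with an equal or higher minor version) is a simplification that ignores PTX (`compute_XX`) entries.

```python
def parse_sm(arch):
    """Turn an entry such as 'sm_86' into (8, 6); return None for other entries."""
    if not arch.startswith("sm_"):
        return None
    digits = arch[3:]
    if not digits.isdigit():
        return None
    return divmod(int(digits), 10)


def build_supports(capability, arch_list):
    """Return True if some compiled architecture can run on the device.

    Simplified rule (an assumption of this sketch): same major version,
    and the compiled minor version is not higher than the device's.
    """
    major, minor = capability
    for arch in arch_list:
        sm = parse_sm(arch)
        if sm is not None and sm[0] == major and sm[1] <= minor:
            return True
    return False


def report():
    """Print what the local PyTorch build reports, if PyTorch and a GPU are present."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
    if not torch.cuda.is_available():
        print("CUDA is not available to this PyTorch build")
        return
    capability = torch.cuda.get_device_capability(0)
    arch_list = torch.cuda.get_arch_list()
    print("device capability:", capability)
    print("compiled architectures:", arch_list)
    print("bundled cuDNN:", torch.backends.cudnn.version())
    print("build supports device:", build_supports(capability, arch_list))


if __name__ == "__main__":
    report()
```

With the values from the warning, `build_supports((8, 6), ["sm_37", "sm_50", "sm_60", "sm_70"])` is `False`; a build whose architecture list contains `sm_80` or `sm_86` would give `True`.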

    【Discussion】:
