CUDA not running in OpenCV even after successful build

[Question title]: CUDA not running in OpenCV even after successful build
[Posted]: 2015-02-24 14:50:35
[Question description]:

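I am trying to build OpenCV 2.4.10 on a Windows 8.1 machine with CUDA 6.5. My other third-party libraries installed without problems. When I run a simple GPU-based program, I get this error: "No GPU found or the library was compiled without GPU support". I also ran the sample executables produced by the build, such as performance_gpu.exe, and got the same error. I double-checked the WITH_CUDA flag. These are the CUDA-related flags that were set during the CMake configuration: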

WITH_CUDA: checked
WITH_CUBLAS: checked
WITH_CUFFT: checked
CUDA_ARCH_BIN: 1.1 1.2 1.3 2.0 2.1(2.0) 3.0 3.5
CUDA_ARCH_PTX: 3.0
CUDA_FAST_MATH: checked
CUDA_GENERATION: Auto
CUDA_HOST_COMPILER: $(VCInstallDir)bin
CUDA_SEPARABLE_COMPILATION: unchecked
CUDA_TOOLKIT_ROOT_DIR: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5

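Also, some posts I have read say that building with CUDA takes a long time. My build took about 3 hours, most of it spent compiling the .cu files. As far as I can tell, no errors occurred while those files were compiled.

In some posts people mention a directory named gpu inside the build directory, but I do not see one.

I am using Visual Studio 2013.

What could be the problem? Please help!

Update:

I tried building OpenCV again, and this time I added CUDA's bin, lib and include directories before starting the build. After the build, I ran gpu_perf4au.exe from E:\opencv\build\bin\Release and got this output: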

[----------]
[   INFO   ]    Implementation variant: cuda.
[----------]
[----------]
[ GPU INFO ]    Run test suite on GeForce GTX 860M GPU.
[----------]
Time compensation is 0
OpenCV version: 2.4.10
OpenCV VCS version: unknown
Build type: release
Parallel framework: tbb
CPU features: sse sse2 sse3 ssse3 sse4.1 sse4.2 avx avx2
[----------]
[ GPU INFO ]    Run on OS Windows x64.
[----------]
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***

Device count: 1

Device 0: "GeForce GTX 860M"
  CUDA Driver Version / Runtime Version          6.50 / 6.50
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147483648 bytes)
  GPU Clock Speed:                               1.02 GHz
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and execution:                 Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Concurrent kernel execution:                   Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support enabled:                No
  Device is using TCC driver mode:               No
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
      Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version  = 6.50, CUDA Runtime Version = 6.50, NumDevs = 1

I thought everything was fine, but then I ran the following program, with all the OpenCV and CUDA directories included in its property sheet:

#include <cv.h>
#include <highgui.h>
#include <iostream>
#include <opencv2\opencv.hpp>
#include <opencv2\gpu\gpu.hpp>

using namespace std;
using namespace cv;

char key;

Mat thresholder (Mat input) {
    gpu::GpuMat dst, src;
    src.upload(input);
    gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);
    Mat result_host(dst);
    return result_host;
}

int main(int argc, char* argv[]) {

    cvNamedWindow("Camera_Output", 1);
    CvCapture* capture = cvCaptureFromCAM(CV_CAP_ANY);

    while (1){
        IplImage* frame = cvQueryFrame(capture);
        if (!frame) break;      //Stop if no frame could be grabbed
        IplImage* gray_frame = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
        cvCvtColor(frame, gray_frame, CV_BGR2GRAY);   //Captured frames are in BGR order

        Mat temp(gray_frame);   //Wraps gray_frame without copying its data
        Mat thres_temp;
        thres_temp = thresholder(temp);
        //cvShowImage("Camera_Output", frame);   //Show image frames on created window
        imshow("Camera_Output", thres_temp);
        cvReleaseImage(&gray_frame);   //Free the per-frame buffer to avoid a leak

        key = cvWaitKey(10);
        if (char(key) == 27){
            break;      //If you hit ESC key loop will break.
        }
    }
    cvReleaseCapture(&capture);
    cvDestroyWindow("Camera_Output");
    return 0;
}

I get this error:

OpenCV Error: No GPU support (The library is compiled without CUDA support) in EmptyFuncTable::mallocPitch, file C:\builds\2_4_PackSlave-win64-vc12-shared\opencv\modules\dynamicuda\include\opencv2/dynamicuda/dynamicuda.hpp, line 126

[Question discussion]:

Tags: opencv cuda cmake

[Solution 1]:

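Thanks to @BeRecursive for giving me a lead on this problem. The CMake build log listed three unavailable OpenCV modules: androidcamera, dynamicuda and viz. I could not find any information on dynamicuda, or on whether its being unavailable could cause the error from my question. So instead I looked into the viz module and how it gets installed.

After going through a few blogs and forums, I found that the viz module is not included in the pre-built OpenCV packages, and the recommendation was to build version 2.4.9 from source. I gave it a try with VS 2013 and CMake 3.0.1, but got many build failures and warnings. After more searching, I found that CMake 3.0.x is not recommended for building OpenCV because it produces many warnings.

Finally I switched to VS 2010 and CMake 2.8.12.2, and the source built without any errors. After adding all the executables, libraries and DLLs to PATH, the program from my question ran without errors, but very slowly. So I ran this program instead: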

#include <cv.h>
#include <highgui.h>
#include <iostream>
#include <opencv2\opencv.hpp>
#include <opencv2\core\core.hpp>
#include <opencv2\gpu\gpu.hpp>
#include <opencv2\highgui\highgui.hpp>

using namespace std;
using namespace cv;

Mat thresholder(Mat input) {
    cout << "Beginning thresholding using GPU" << endl;
    gpu::GpuMat dst, src;
    src.upload(input);
    cout << "upload done ..." << endl;
    gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);
    Mat result_host(dst);
    cout << "Thresholding complete!" << endl;
    return result_host;
}

int main(int argc, char** argv) {
    Mat image, gray_image;
    image = imread("desert.jpg", CV_LOAD_IMAGE_COLOR);   // Read the file
    if (!image.data) {
        cout << "Could not open or find the image" << endl;
        return -1;
    }
    cout << "Original image loaded ..." << endl;
    cvtColor(image, gray_image, CV_BGR2GRAY);
    cout << "Original image converted to Grayscale" << endl;

    Mat thres_image;
    thres_image = thresholder(gray_image);

    namedWindow("Original Image", WINDOW_AUTOSIZE);// Create a window for display.
    namedWindow("Gray Image", WINDOW_AUTOSIZE);
    namedWindow("GPU Threshed Image", WINDOW_AUTOSIZE);
    imshow("Original Image", image);
    imshow("Gray Image", gray_image);
    imshow("GPU Threshed Image", thres_image);

    waitKey(0);
    return 0;
}

Later I also tested the build with VS 2013, and it worked there too.

The GPU-based program runs slowly for the reasons mentioned here.

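So I would like to stress three points:

Build only from source.
Use a slightly older CMake version.
Prefer VS 2010 for building the binaries.

Notes:

This may sound odd, but all of my first builds failed with linker errors. I am not sure whether this is what fixed it, but try building opencv_gpu before anything else, and only then build the ALL_BUILD and INSTALL projects.
When you build this way in Debug mode and OpenCV is configured with Python support, you may get an error about "python27_d.lib"; otherwise all projects should build successfully.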

[Discussion]:

[Solution 2]:

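This is a runtime error thrown by OpenCV. If you look at the CMake log from your previous question, you will see that one of the Unavailable packages is dynamicuda, which seems to be what the error is complaining about.

However, I do not have much experience with OpenCV on Windows, so this may be a red herring. My gut feeling is that not all of the libraries are correctly on your path. Are you sure the CUDA lib/include/bin directories are on PATH? Have you made sure the OpenCV build lib/include directories are on the path as well? Windows has a very simple link order that basically covers only the current directory, anything on PATH and the main Windows directories. So I would make sure everything on PATH is correct, or that you have copied all the right libraries into the program's folder.

Note: this is not the same as a compile/link error, because it happens at RUNTIME. Setting compiler paths will therefore not help with a runtime linking error.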

[Discussion]: