【Posted】: 2014-04-02 01:33:53
【Question】:
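This is on a MacBookPro7,1 with a GeForce 320M (compute capability 1.2). Previously, with OS X 10.7.8, XCode 4.x and CUDA 5.0, CUDA code compiled and ran fine.
Then I updated to OS X 10.9.2, XCode 5.1 and CUDA 5.5. At first, deviceQuery failed. I read elsewhere that 5.5.28 (the driver shipped with CUDA 5.5) does not support compute capability 1.x (sm_10), but that 5.5.43 does. After updating the CUDA driver to the latest version, 5.5.47 (GPU driver version 8.24.11 310.90.9b01), deviceQuery does pass, with the following output.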
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce 320M"
CUDA Driver Version / Runtime Version 5.5 / 5.5
CUDA Capability Major/Minor version number: 1.2
Total amount of global memory: 253 MBytes (265027584 bytes)
( 6) Multiprocessors, ( 8) CUDA Cores/MP: 48 CUDA Cores
GPU Clock rate: 950 MHz (0.95 GHz)
Memory Clock rate: 1064 Mhz
Memory Bus Width: 128-bit
Maximum Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536, 32768), 3D=(2048, 2048, 2048)
Maximum Layered 1D Texture Size, (num) layers 1D=(8192), 512 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(8192, 8192), 512 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 512
Max dimension size of a thread block (x,y,z): (512, 512, 64)
Max dimension size of a grid size (x,y,z): (65535, 65535, 1)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: Yes
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 0 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.5, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = GeForce 320M
Result = PASS
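In addition, I can compile the CUDA 5.5 samples successfully without modifying them, although I have not tried compiling all of them.
However, samples such as matrixMul, simpleCUFFT and simpleCUBLAS all fail immediately at run time.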
$ ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce 320M" with compute capability 1.2
MatrixA(160,160), MatrixB(320,160)
cudaMalloc d_A returned error code 2, line(164)
$ ./simpleCUFFT
[simpleCUFFT] is starting...
GPU Device 0: "GeForce 320M" with compute capability 1.2
CUDA error at simpleCUFFT.cu:105 code=2(cudaErrorMemoryAllocation) "cudaMalloc((void **)&d_signal, mem_size)"
Error code 2 is cudaErrorMemoryAllocation, but I suspect it is somehow masking a failed CUDA initialization.
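One way to test that suspicion is to separate context creation from the allocation itself. The following is only a minimal sketch, not a known fix: it forces context creation with `cudaFree(0)` (a common idiom), asks the runtime how much device memory is free with `cudaMemGetInfo`, and then tries an allocation the size of matrixMul's 160x160 float matrix A. The helper names `bytes_to_mib` and `report` and the constant `kRequestBytes` are made up for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count to MiB for printing.
static double bytes_to_mib(size_t bytes) {
    return bytes / (1024.0 * 1024.0);
}

// Size of a 160x160 float matrix, as in the matrixMul output (MatrixA(160,160)).
static const size_t kRequestBytes = 160 * 160 * sizeof(float);

// Hypothetical helper: print the result of a runtime call, return true on success.
static bool report(const char *what, cudaError_t err) {
    printf("%-16s -> %d (%s)\n", what, (int)err, cudaGetErrorString(err));
    return err == cudaSuccess;
}

int main() {
    // Step 1: force context creation on its own, before any real allocation.
    if (!report("cudaFree(0)", cudaFree(0))) return 1;

    // Step 2: ask how much device memory the runtime thinks is available.
    size_t free_bytes = 0, total_bytes = 0;
    if (!report("cudaMemGetInfo", cudaMemGetInfo(&free_bytes, &total_bytes))) return 1;
    printf("free: %.2f MiB, total: %.2f MiB\n",
           bytes_to_mib(free_bytes), bytes_to_mib(total_bytes));

    // Step 3: try the small allocation that matrixMul attempts first.
    float *d_a = NULL;
    if (!report("cudaMalloc", cudaMalloc((void **)&d_a, kRequestBytes))) return 1;
    cudaFree(d_a);
    return 0;
}
```

If the first call already fails, the problem is in initialization rather than in the allocation. If it succeeds but the reported free memory is close to zero, the roughly 100 KB request is failing because the memory shared with the display is exhausted. This sketch only narrows down which case applies.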
$ ./simpleCUBLAS
GPU Device 0: "GeForce 320M" with compute capability 1.2
simpleCUBLAS test running..
!!!! CUBLAS initialization error
The actual error code returned from the call to cublasCreate() is CUBLAS_STATUS_NOT_INITIALIZED.
Has anyone run into this problem before and found a solution? Thanks in advance.
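For reference, the status from `cublasCreate()` can be printed next to the result of initializing the CUDA runtime, to see whether the two fail together. This is a hedged sketch: `cublas_status_name` is a hypothetical helper written for this example and covers only the values relevant here, and the program is assumed to be built with nvcc and linked against cuBLAS (`-lcublas`).

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: name the cuBLAS status values of interest here.
static const char *cublas_status_name(cublasStatus_t s) {
    switch (s) {
        case CUBLAS_STATUS_SUCCESS:         return "CUBLAS_STATUS_SUCCESS";
        case CUBLAS_STATUS_NOT_INITIALIZED: return "CUBLAS_STATUS_NOT_INITIALIZED";
        case CUBLAS_STATUS_ALLOC_FAILED:    return "CUBLAS_STATUS_ALLOC_FAILED";
        default:                            return "other cublasStatus_t value";
    }
}

int main() {
    // Initialize the CUDA runtime explicitly so its status is visible on its own.
    cudaError_t rt = cudaFree(0);
    printf("cudaFree(0)  -> %d (%s)\n", (int)rt, cudaGetErrorString(rt));

    // Then create the cuBLAS handle and print the status by name.
    cublasHandle_t handle;
    cublasStatus_t st = cublasCreate(&handle);
    printf("cublasCreate -> %d (%s)\n", (int)st, cublas_status_name(st));

    if (st == CUBLAS_STATUS_SUCCESS) cublasDestroy(handle);
    return (rt == cudaSuccess && st == CUBLAS_STATUS_SUCCESS) ? 0 : 1;
}
```

A runtime error on the first line together with CUBLAS_STATUS_NOT_INITIALIZED on the second would point at runtime initialization rather than at cuBLAS itself.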
【Comments】:
Tags: xcode macos cuda osx-mavericks nvidia