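【Posted】: 2015-05-26 03:10:10
【Question】:
I am trying to reproduce the linear programming solver described in this report:
http://www.idi.ntnu.no/~elster/master-studs/spampinato/spampinato-linear-prog-gpu-report.pdf
The device I am using is a Quadro FX 1800M with compute capability 1.2.
My problem is that when I launch more than 22 threads per block, most of the time I get inaccurate results (sometimes all zeros), yet in some cases I get accurate results even with 512 threads per block.
Here are some test runs. (Sequential is the CPU-based version, included for comparison.)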
Iteration No 1 : of Sequential Version
Optimum Found 24.915583
Elapsed time: 0.001049725
Iteration No 1: of Parallel Version
BS-(Number of Threads) = : 20
Optimum found: 24.915583
Iteration No 2: of Parallel Version
BS-(Number of Threads) = : 256
Optimum found: 24.915607
Iteration No 3: of Parallel Version
BS-(Number of Threads) = : 512
Optimum found: 24.917068
Iteration No 4: of Parallel Version
BS-(Number of Threads) = : 2
Optimum found: 24.915583
Iteration No 5: of Parallel Version
BS-(Number of Threads) = : 456
Optimum found: -30693000299230806209574138333792043008.000000
Iteration No 6: of Parallel Version
BS-(Number of Threads) = : 456
Problem unsolvable: either qth==0 or loop too long.
Iteration No 7: of Parallel Version
BS-(Number of Threads) = : 512
Optimum found: 25.010513
Iteration No 8: of Parallel Version
BS-(Number of Threads) = : 256
Problem unsolvable: either qth==0 or loop too long.
Iteration No 9: of Parallel Version
BS-(Number of Threads) = : 256
Optimum found: 0.000000
Iteration No 10: of Parallel Version
BS-(Number of Threads) = : 512
Optimum found: 0.000000
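Can anyone point out what I might be doing wrong? I know I have not posted the code, but I assume the code itself is correct because I copied it from the research paper, and that the problem is on my end.
I should also point out that I get the following warning when compiling the CUDA code: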
ptxas /tmp/tmpxft_000017e7_00000000-10_culiblp.ptx, line 263; warning : Double is not supported. Demoting to float
Could this be the cause of these results?
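For context on the warning: devices of compute capability 1.2 have no double-precision hardware, so ptxas compiles `double` arithmetic as `float`. The following is a minimal CUDA sketch of how precision alone changes an accumulated sum. The names `accumulate` and `accumulateKernel` are hypothetical and are not taken from the report. Rounding differences of this kind would be expected to shift low-order digits; whether they explain the results above is not established here.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical helper: adds `step` to an accumulator `n` times in precision T.
// Callable from both host and device code.
template <typename T>
__host__ __device__ T accumulate(T step, int n) {
    T sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += step;  // each addition rounds to the precision of T
    }
    return sum;
}

// A single thread runs the same loop on the device in both precisions.
// On a device without double support, the double branch is demoted to float.
__global__ void accumulateKernel(float *outFloat, double *outDouble, int n) {
    *outFloat = accumulate<float>(1e-4f, n);
    *outDouble = accumulate<double>(1e-4, n);
}
```

Calling `accumulate<float>(1e-4f, 1000000)` and `accumulate<double>(1e-4, 1000000)` on the host and printing both values shows the single-precision sum drifting visibly away from 100 while the double-precision sum stays close to it.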
【Comments】:
-
Please check all API calls for errors: stackoverflow.com/tags/cuda/info
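A minimal sketch of what such checking could look like, assuming the CUDA runtime API. The helper `cudaOk`, the placeholder kernel `doubleElements`, and the wrapper `runChecked` are hypothetical names for illustration and are not part of the solver in the report. A kernel launch that fails (for example, because the requested block size cannot be satisfied) leaves the output buffer untouched, so unchecked code can read back zeros or stale values.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed runtime call and returns false.
static bool cudaOk(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel (not from the report): doubles each element.
__global__ void doubleElements(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// Runs the kernel with the given block size (assumed > 0) and checks
// every runtime call. Returns false if any step reports an error.
bool runChecked(float *host, int n, int threadsPerBlock) {
    float *dev = NULL;
    size_t bytes = n * sizeof(float);
    if (!cudaOk(cudaMalloc((void **)&dev, bytes), "cudaMalloc")) return false;

    bool ok = cudaOk(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice),
                     "cudaMemcpy (host to device)");
    if (ok) {
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        doubleElements<<<blocks, threadsPerBlock>>>(dev, n);
        // cudaGetLastError catches launch failures (bad configuration,
        // too many resources requested); the synchronize call catches
        // errors raised while the kernel runs.
        ok = cudaOk(cudaGetLastError(), "kernel launch") &&
             cudaOk(cudaDeviceSynchronize(), "kernel execution");
    }
    if (ok) {
        ok = cudaOk(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost),
                    "cudaMemcpy (device to host)");
    }
    cudaFree(dev);
    return ok;
}
```

Calling `runChecked` with each block size from the test runs would show whether a given configuration fails at launch instead of silently returning an unmodified buffer. On toolkits older than CUDA 4.0, `cudaThreadSynchronize` plays the role of `cudaDeviceSynchronize`.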