Precision is lost / changed once data goes to Cython module

【Question title】: Precision is lost / changed once data goes to Cython module
【Posted】: 2016-06-04 01:58:21
【Question】:

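I ported some code that uses NumPy to Cython to get a performance boost. The speedup was considerable, but I ran into a problem.

Cython produces different results from Python. I had no idea why, so I decided to look at what was being passed into the Cython module.

Before it reaches Cython, the data looks like this: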

azimuth = 0.000349065850399 

rawDistance = [ 2.682  7.234  2.8    7.2    2.912  7.19   3.048  7.174  3.182  7.162
  3.33   7.164  3.506  7.158  3.706  7.154  3.942  7.158  4.192  7.158
  4.476  7.186  4.826  7.19   5.218  7.204  5.704  7.224  6.256  7.248
  6.97   7.284] 

intensity = [19 34 25 28 26 48 21 56 21 60 31 49 24 37 26 37 34 37 23 84 15 59 23 45 
             18  47 20 55 18 36 15 39]

Once it is inside Cython, the same data looks like this:

azimuth = 0.000349065850399 

rawDistance = [2.686, 7.23, 2.7960000000000003, 7.204, 2.91, 7.188, 3.044, 7.174, 3.19, 
               7.16, 3.3280000000000003, 7.16, 3.5, 7.154, 3.704, 7.144, 3.936, 7.158, 
               4.196, 7.156000000000001, 4.478, 7.19, 4.8260000000000005, 7.192, 5.22, 
               7.204, 5.708, 7.22, 6.256, 7.252, 6.97, 7.282] 

intensity = [19, 34, 27, 28, 26, 48, 22, 52, 21, 60, 31, 49, 24, 37, 28, 34, 32, 37, 
             23, 84, 15, 59, 23, 45, 18, 47, 20, 58, 18, 36, 15, 36]

That explains why the results are not exactly the same as those computed by the pure Python method.

This is the Cython module the data is passed to:

from libc.math cimport sin, cos
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)
def calculateXYZ(list frames, double[:] cosVertCorrection, double[:] sinVertCorrection):
    cdef long numberFrames = len(frames)
    cdef long i, j, k, numberBlocks
    cdef list finalResults = []
    cdef list intensities = []
    cdef list frameXYZ = []
    cdef double azimuth, xy, x, y, z, sinRotational, cosRotational
    cdef double[32] rawDistance
    cdef int[32] intensity
    cdef double[:] tempX
    cdef double[:] tempY
    cdef double[:] tempZ
    cdef int positionsFilled = 0

    for i in xrange(numberFrames):
        numberBlocks = len(frames[i])
        tempX = np.zeros(numberBlocks * 32, dtype=np.double)
        tempY = np.zeros(numberBlocks * 32, dtype=np.double)
        tempZ = np.zeros(numberBlocks * 32, dtype=np.double)
        frameXYZ = [[] for i in range(3)]
        positionsFilled = 0

        for j in xrange(numberBlocks):
            # This is where I tested for the data in Cython
            # This is the information that is different. 
            # It is reading from what was passed to it from python.

            azimuth = frames[i][j][0]
            rawDistance = frames[i][j][1]
            intensity = frames[i][j][2]
            sinRotational, cosRotational = sin(azimuth), cos(azimuth)

            for k in xrange(32):
                xy = rawDistance[k] * cosVertCorrection[k]
                x, y = xy * sinRotational, xy * cosRotational
                z = rawDistance[k] * sinVertCorrection[k]

                if x != 0 or y != 0 or z != 0:
                    tempX[positionsFilled] = x
                    tempY[positionsFilled] = y
                    tempZ[positionsFilled] = z
                    intensities.append(intensity[k])
                    positionsFilled = positionsFilled + 1

        frameXYZ[0].append(np.asarray(tempX[0:positionsFilled].copy()).tolist())
        frameXYZ[1].append(np.asarray(tempY[0:positionsFilled].copy()).tolist())
        frameXYZ[2].append(np.asarray(tempZ[0:positionsFilled].copy()).tolist())
        finalResults.append(frameXYZ)

    return finalResults, intensities

This is the pure Python version:

documentXYZ = []
intensities = []

# I tested to see what the original data was in here adding prints

for frame in frames:
    frameXYZ = [[] for i in range(3)]
    frameX, frameY, frameZ = [], [], []
    for block in frame:
        sinRotational, cosRotational = np.math.sin(block[0]), np.math.cos(block[0])
        rawDistance, intensity = np.array(block[1]), np.array(block[2])
        xy = np.multiply(rawDistance, cosVertCorrection)
        x, y, z = np.multiply(xy, sinRotational), np.multiply(xy, cosRotational), np.multiply(rawDistance, sinVertCorrection)
        maskXYZ = np.logical_and(np.logical_and(x, x != 0), np.logical_and(y, y != 0), np.logical_and(z, z != 0))
        frameX += x[maskXYZ].tolist()
        frameY += y[maskXYZ].tolist()
        frameZ += z[maskXYZ].tolist()
        intensities += intensity[maskXYZ].tolist()

    frameXYZ[0].append(frameX), frameXYZ[1].append(frameY), frameXYZ[2].append(frameZ)
    documentXYZ.append(frameXYZ)

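I know floating-point precision can differ (although I don't think it should here, since I am using doubles in all the structures), but I don't understand why the intensity integer values are changing too. I want the same precision as in Python.

Any ideas on how to improve this?

Thanks.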

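【Question comments】:

Could you also post the original Python/NumPy code? We could try to reverse-engineer the Cython code, but it would be better to compare it with the original code you have.
"Once it is inside Cython, the same data looks like this:" At what point do you print/verify this data? What does "inside Cython" mean here? Better yet, can you make a small example that shows the whole flow? Strip out the fluff such as the calculations, reduce the amount of data, check whether you still get the same behavior, and then post that as a self-contained example.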

Tags: python c numpy cython precision

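【Solution 1】:

The first two steps in solving the problem are:

Determine the specific integer type NumPy is using on your platform (e.g. int32, int64, ...), for instance by checking the dtype attribute of the integer array or of one of its values.
Determine the bit width of int on your platform with your chosen C implementation. Usually it is 32 bits, but not always (check with sizeof, for example).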
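
Both checks can be done from Python without compiling anything. The following is a minimal sketch, assuming `ctypes` reports the same `int` size as the compiler used to build your Cython module (normally true on one machine, but still an assumption). The helper name `describe_integer_widths` is made up for illustration.

```python
import ctypes
import numpy as np

def describe_integer_widths(values):
    """Return (numpy_dtype_name, numpy_itemsize_bytes, c_int_bytes)."""
    arr = np.array(values)  # NumPy picks its platform default integer dtype
    return arr.dtype.name, arr.dtype.itemsize, ctypes.sizeof(ctypes.c_int)

intensity = [19, 34, 25, 28]  # a few sample values from the question
name, numpy_bytes, c_int_bytes = describe_integer_widths(intensity)
print("NumPy dtype:", name, "with", numpy_bytes, "bytes per item")
print("C int:", c_int_bytes, "bytes")
```

If the two byte counts differ, a plain `cdef int[32]` buffer does not have the same layout as the NumPy data.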

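Once you know these two details, you can determine in what way a plain (C) int fails to match the integer precision NumPy has been using. A common guess is that NumPy is using int64 while in C you are using int, which is probably int32 on your platform/implementation. Another common case is that NumPy is using unsigned integers while in C int is signed, which leads to a different representation even with the same number of bits.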
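
To see why such a mismatch matters, here is a small illustration in pure NumPy. It is not a reproduction of what happens in the question's module, only a sketch of how the same bytes read differently under a different integer type. The values are samples from the question, except 200, which is chosen to show the sign effect.

```python
import numpy as np

# Four 64-bit integers, as NumPy might store them by default.
as_int64 = np.array([19, 34, 25, 28], dtype=np.int64)

# Reading the same 32 bytes as 32-bit integers yields eight values, not four.
as_int32 = as_int64.view(np.int32)
print(as_int32.size)

# Same width, different signedness: the byte holding 200 as uint8
# is read as -56 when interpreted as int8.
as_uint8 = np.array([200], dtype=np.uint8)
as_int8 = as_uint8.view(np.int8)
print(int(as_int8[0]))
```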

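You can easily refer to fixed-width integers in Cython in at least three ways:

Since you have used cimport numpy as np, you can refer to NumPy's fixed-width integer types, such as np.int64_t or np.uint8_t. The "_t" typedefs are provided by NumPy's Cython support.
If you prefer to rely on C's standard headers for fixed-width integers of a given size, you can also import them from the C standard library, e.g. from libc.stdint cimport int64_t, uint8_t (note cimport rather than import, since these are C types).
You can use the C type names that Cython itself exposes, such as cython.uchar.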

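Assuming you have chosen an appropriate integer type, you can then declare your intensity array with the correct type, for example with any of the following, depending on which way you chose to express the correct integer type: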

cdef np.uint8_t[32] intensity   # If using NumPy integer types
cdef uint8_t[32] intensity      # If importing from libc.stdint
cdef cython.uchar[32] intensity # If using Cython integer types
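
On the Python side, it can also help to convert the data to that same fixed-width type before handing it to Cython, so both ends agree. A rough sketch follows. The helper `to_fixed_width` is hypothetical, and `np.uint8` is only an assumption based on the sample intensities all being below 256.

```python
import numpy as np

def to_fixed_width(values, dtype):
    """Convert values to dtype, raising if any value is outside its range."""
    info = np.iinfo(dtype)
    arr = np.asarray(values)
    if arr.min() < info.min or arr.max() > info.max:
        raise OverflowError("values do not fit in " + np.dtype(dtype).name)
    return arr.astype(dtype)

# Sample intensities from the question, stored as unsigned 8-bit integers.
intensity = to_fixed_width([19, 34, 25, 28, 26, 48], np.uint8)
print(intensity.dtype, intensity.tolist())
```

The explicit range check is there because `astype` on its own wraps out-of-range values silently instead of reporting them.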

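As a final note, it is worth remembering that regular Python integers have unlimited precision, so if you end up with a NumPy array of type int (not C int, but Python int), then when working in Cython you have to decide either to use a different, fixed-precision representation, or to use arrays or typed memoryviews that hold Python int objects (which generally defeats the purpose of using Cython in the first place).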
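
As a quick illustration of that last point, the sketch below contrasts a fixed-width array with an array of Python int objects. The value 2**70 is an arbitrary example, chosen only because it does not fit in 64 bits.

```python
import numpy as np

# Small values fit a machine integer, so NumPy uses a fixed-width dtype.
small = np.array([19, 34, 25])
print(small.dtype.kind)  # 'i' means a signed fixed-width integer

# An object array stores references to Python ints, which never overflow.
huge = np.array([2**70, 1], dtype=object)
print(huge[0] * 2 == 2**71)

# The same value is beyond what a 64-bit signed integer can represent.
print(2**70 > np.iinfo(np.int64).max)
```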

【Discussion】: