Why does Numba pycc compilation abort? Answer

【Question title】: Why does Numba pycc compilation abort?
【Posted】: 2018-11-15 10:46:24
【Question description】:

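Some context first: I am using scipy.integrate.odeint to integrate coupled ODEs many times, each time with different initial conditions x_ and parameters r_ and d_. I am trying to speed up the integration by compiling the right-hand side of the ODEs ahead of time and by reducing the number of calls to odeint.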
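To illustrate the "fewer odeint calls" idea: all patients can be packed into one flat state vector (groups of 5, as in the docstring further down) and integrated with a single odeint call. The sketch below uses a plain NumPy, reshape-based right-hand side (a rewrite for illustration, not the compiled function from this question) and made-up parameter values, so it only shows the call structure:

```python
import numpy as np
from scipy.integrate import odeint


def rhs_all(x_, t_, r_, d_):
    """Vectorized right-hand side for all patients (plain NumPy, not compiled).

    x_ holds groups of 5 entries per patient: 4 bacteria counts, then the
    carrying capacity. r_ and d_ hold 4 rates per patient.
    """
    x = x_.reshape(-1, 5)          # one row per patient
    bacteria = x[:, :4]
    capacity = x[:, 4]
    total = bacteria.sum(axis=1)   # total bacteria per patient
    out = np.empty_like(x)
    out[:, :4] = (bacteria * r_.reshape(-1, 4) * (1.0 - total / capacity)[:, None]
                  - d_.reshape(-1, 4) * bacteria)
    out[:, 4] = -total             # carrying capacity is consumed
    return out.ravel()             # odeint expects a 1-D derivative


n_patients = 3                     # hypothetical small example
x0 = np.ones(n_patients * 5)
x0[4::5] = 5e13
r = np.full(n_patients * 4, 0.5)   # made-up growth rates
d = np.full(n_patients * 4, 0.1)   # made-up death rates
t = np.linspace(0.0, 5.0, 11)

# One call integrates every patient at once.
sol = odeint(rhs_all, x0, t, args=(r, d))
print(sol.shape)  # (11, 15)
```

Each row of sol is the full packed state at one time point, so sol[:, 5*i:5*(i+1)] recovers patient i.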

I am trying to compile a Python function ahead of time with numba.pycc.CC. This works for simple functions such as:

import numpy
from numba.pycc import CC

cc = CC('x_test')
cc.verbose = True

@cc.export('x_test', 'f8(f8[:])')
def x_test(y):
    return numpy.sum(numpy.log(y) * .5) # just a random combination of numpy functions I used to test the compilation

cc.compile()

The actual function I want to compile is the following:

# code_generation3.py
import numpy
from numba.pycc import CC

"""
N = 94
input for x_dot_all could look like:
    x_ = numpy.ones(N * 5)
    x_[4::5] = 5e13
    t_ := some float from a numpy linspace. it is passed by odeint.
    r_ = numpy.random.random(N * 4)
    d_ = numpy.random.random(N * 4) * .8

    In practice the size of x_ is 470 and of r_ and d_ is 376.
"""

cc = CC('x_temp_dot1')
cc.verbose = True

@cc.export('x_temp_dot1', 'f8[:](f8[:], f8, f8[:], f8[:], f8[:])')
def x_dot_all(x_,t_,r_,d_, h):
    """
    rhs of the lotka volterra equation for all "patients"
    :param x: initial conditions, always in groups of 5: the first 4 entries are the bacteria counts, the 5th entry is the carrying capacity
    :param t: placeholder required by odeint
    :param r: growth rates of the types of bacteria
    :param d: death rates of the types of bacteria
    :param h: preallocated output array, same length as x_

    returns the right hand side of the competitive lotka-volterra equation with finite and shared carrying capacity in the same ordering as the initial conditions 
    """
    def x_dot(x, t, r, d, j):
        """
        rhs of the differential equation describing the intrahost evolution of the bacteria
        :param x: initial conditions i.e. bacteria sizes and environmental carrying capacity
        :param t: placeholder required by odeint
        :param r: growth rates of the types of bacteria
        :param d: death rates of the bacteria
        :param j: placeholder for the return value

        returns the right hand side of the competitive lotka-volterra equation with finite and shared carrying capacity
        """

        j[:-1] = x[:-1] * r * (1 - numpy.sum(x[:-1]) / x[-1]) - d * x[:-1]
        j[-1]   = -numpy.sum(x[:-1])
        return j 

    N = r_.shape[0]
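    # pass a fresh 5-element buffer per group; reusing a single j here would make every entry of g refer to the same array
    g = [x_dot(x_[5 * i : 5 * (i+1)], t_, r_[4 * i : 4* (i+1)], d_[4 * i: 4 * (i+1)], numpy.zeros(5)) for i in numpy.arange(int(N / 4))]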

    for index, value in enumerate(g):
        h[5 * index : 5 * (index + 1)] = value

    return h

cc.compile()
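For reference, this is the system x_dot evaluates for one patient, read directly off the two assignments to j (b_k are the four bacteria counts, K the carrying capacity, r_k and d_k the growth and death rates):

```latex
\frac{db_k}{dt} = r_k\, b_k \left(1 - \frac{\sum_{m=1}^{4} b_m}{K}\right) - d_k\, b_k, \quad k = 1,\dots,4,
\qquad
\frac{dK}{dt} = -\sum_{m=1}^{4} b_m
```

The patients do not interact, so x_dot_all only applies this system to each group of 5 entries independently.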

Running this script (python code_generation3.py) gives the following error message:

[xxxxxx@xxxxxx ~]$ python code_generation3.py 
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
generating LLVM code for 'x_temp_dot1' into /tmp/pycc-build-x_temp_dot1-wyamkfsy/x_temp_dot1.cpython-36m-x86_64-linux-gnu.o
python: /root/miniconda3/conda-bld/llvmdev_1531160641630/work/include/llvm/IR/GlobalValue.h:233: void llvm::GlobalValue::setVisibility(llvm::GlobalValue::VisibilityTypes): Assertion `(!hasLocalLinkage() || V == DefaultVisibility) && "local linkage requires default visibility"' failed.
Aborted

What am I doing wrong?

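Both functions work with the @jit(nopython = True) decorator. Somewhat embarrassingly, I also tried hard-coding the list comprehension (to avoid any for loops and further function calls), but that had the same problem.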

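I know that the way I handle/create the return values h and j, respectively, is neither efficient nor elegant, but otherwise I could not get a return value of the right shape for odeint, because numba does not handle numpy.reshape well.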
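On the shape problem: in plain NumPy, a 2-D view of the flat output array can be filled row by row while the flat array itself (already the 1-D shape odeint wants) is returned, so the result never needs reshaping. This is a sketch with a hypothetical function name; which of these view operations a given Numba version accepts in nopython mode has not been checked here, so the Numba documentation on supported NumPy features is the place to confirm before relying on it there.

```python
import numpy as np


def fill_rhs_via_view(x_, r_, d_, h):
    """Hypothetical helper: write each patient's derivatives into h in place."""
    x = x_.reshape(-1, 5)        # view (no copy) for a contiguous x_
    out = h.reshape(-1, 5)       # view into h: writes land in h itself
    r = r_.reshape(-1, 4)
    d = d_.reshape(-1, 4)
    for i in range(x.shape[0]):
        b = x[i, :4]
        total = b.sum()
        out[i, :4] = b * r[i] * (1.0 - total / x[i, 4]) - d[i] * b
        out[i, 4] = -total
    return h                     # still the flat 1-D array odeint expects


x_ = np.ones(10)
x_[4::5] = 5e13
r_ = np.full(8, 0.5)
d_ = np.full(8, 0.1)
h = np.zeros(10)
result = fill_rhs_via_view(x_, r_, d_, h)
print(result is h, result.shape)  # True (10,)
```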

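I searched the numba documentation for help, but it did not help me understand the problem. I also searched for the error message but only found this link, which may describe a similar issue; however, downgrading numba to 0.38.0 did not work for me.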

Thanks, everyone!

【Question discussion】:

Tags: python python-3.x llvm numba pyc

【Solution 1】:

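I think it would work if you compiled x_dot first (as a separate function) and then x_dot_all. In any case, I would recommend merging the two functions into one.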
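One reading of "compile x_dot first": move x_dot out of x_dot_all to module level as its own jitted function and call it from the exported function, instead of defining it inside. The sketch assumes pycc can compile a call from an exported function into a module-level @njit function; that is implied by the suggestion but has not been verified here against a specific Numba version. To keep the sketch runnable without Numba, njit falls back to a no-op decorator. The names x_dot_group and x_dot_all_sketch are hypothetical; in the pycc script the outer function would carry the @cc.export(...) decorator from the question.

```python
import numpy as np

try:
    from numba import njit          # compile with Numba when it is installed
except ImportError:                 # otherwise run as plain Python
    def njit(func):
        return func


@njit
def x_dot_group(x, r, d, out):
    """Right-hand side for one patient (hypothetical top-level helper)."""
    total = np.sum(x[:-1])
    out[:-1] = x[:-1] * r * (1.0 - total / x[-1]) - d * x[:-1]
    out[-1] = -total
    return out


@njit
def x_dot_all_sketch(x_, t_, r_, d_, h):
    # In the pycc script this function would get @cc.export(...) instead.
    for i in range(r_.shape[0] // 4):
        # Each slice of h is a view, so the helper writes directly into h.
        x_dot_group(x_[5 * i:5 * (i + 1)], r_[4 * i:4 * (i + 1)],
                    d_[4 * i:4 * (i + 1)], h[5 * i:5 * (i + 1)])
    return h
```

Writing into views of h also avoids the per-group temporary list, which is one of the things the answer below eliminates entirely by merging everything into a single loop.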

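In Numba, loops are usually not a problem, but list comprehensions definitely are. Also try to avoid many small loops (each vectorized command, such as numpy.sum(x[:-1]), is a separate loop). Sometimes Numba can fuse these loops into efficient code, but not always.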

Example

# code_generation3.py
import numpy
import numba as nb
from numba.pycc import CC

cc = CC('x_dot_all')
cc.verbose = True


@cc.export('x_dot_all_mod', 'f8[:](f8[:], f8, f8[:], f8[:], f8[:])')
def x_dot_all(x_,t_,r_,d_, h):
  N = r_.shape[0]

  for i in range(int(N / 4)):
    sum_x=x_[5*i+0]+x_[5*i+1]+x_[5*i+2]+x_[5*i+3]
    TMP=1.-(sum_x)/x_[5*i+4]

    h[i*5+0]=x_[i*5+0]*r_[4*i+0]*TMP-d_[4*i+0]*x_[i*5+0]
    h[i*5+1]=x_[i*5+1]*r_[4*i+1]*TMP-d_[4*i+1]*x_[i*5+1]
    h[i*5+2]=x_[i*5+2]*r_[4*i+2]*TMP-d_[4*i+2]*x_[i*5+2]
    h[i*5+3]=x_[i*5+3]*r_[4*i+3]*TMP-d_[4*i+3]*x_[i*5+3]
    h[i*5+4]=-sum_x

  return h

if __name__ == "__main__":
    cc.compile()
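Before timing, it can be worth confirming that the hand-unrolled loop matches the original per-group formula. A minimal check, running the answer's loop body as plain Python (no compilation) against the slice expressions from the question's x_dot; the function names unrolled and per_group are hypothetical, and the inputs follow the question's docstring with made-up random values:

```python
import numpy as np


def unrolled(x_, t_, r_, d_, h):
    # Same loop body as the answer's x_dot_all, executed as plain Python.
    for i in range(r_.shape[0] // 4):
        sum_x = x_[5*i+0] + x_[5*i+1] + x_[5*i+2] + x_[5*i+3]
        tmp = 1. - sum_x / x_[5*i+4]
        h[5*i+0] = x_[5*i+0] * r_[4*i+0] * tmp - d_[4*i+0] * x_[5*i+0]
        h[5*i+1] = x_[5*i+1] * r_[4*i+1] * tmp - d_[4*i+1] * x_[5*i+1]
        h[5*i+2] = x_[5*i+2] * r_[4*i+2] * tmp - d_[4*i+2] * x_[5*i+2]
        h[5*i+3] = x_[5*i+3] * r_[4*i+3] * tmp - d_[4*i+3] * x_[5*i+3]
        h[5*i+4] = -sum_x
    return h


def per_group(x_, t_, r_, d_):
    # The question's x_dot formula, applied group by group into a new array.
    h = np.empty_like(x_)
    for i in range(r_.shape[0] // 4):
        x = x_[5*i:5*(i+1)]
        r = r_[4*i:4*(i+1)]
        d = d_[4*i:4*(i+1)]
        h[5*i:5*i+4] = x[:-1] * r * (1 - np.sum(x[:-1]) / x[-1]) - d * x[:-1]
        h[5*i+4] = -np.sum(x[:-1])
    return h


N = 94
rng = np.random.default_rng(0)
x_ = rng.random(N * 5) + 1.0   # made-up bacteria counts
x_[4::5] = 5e13                # carrying capacities, as in the question
r_ = rng.random(N * 4)
d_ = rng.random(N * 4) * .8
a = unrolled(x_, 15., r_, d_, np.zeros(N * 5))
b = per_group(x_, 15., r_, d_)
print(np.allclose(a, b))  # True
```

Once the extension module is built, the same comparison can be pointed at the compiled x_dot_all_mod instead of unrolled.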

Performance

import numpy as np

N = 94
x_ = np.ones(N * 5)
x_[4::5] = 5e13
t_ = 15.
r_ = np.random.random(N * 4)
d_ = np.random.random(N * 4) * .8
h = np.zeros(N * 5)

# your version: 38 µs
# new version:  1.8 µs

【Discussion】:

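Also, thank you very much for the comments; I will keep them in mind in the future.
@user3925555 May I ask why you are compiling ahead of time? Are you redistributing the precompiled functions? There are a few pitfalls here (such as compiling for generic x64 instead of march=native). Since SIMD vectorization does not apply here, it does not matter in this case, but it can make a big difference in other examples... Also worth mentioning: Numba is basically just another translator that generates LLVM IR (like Clang or Flang). If you have experience with C, think like C when writing the code (as long as the code does not end up looking too ugly).
I am trying to simulate one day of bacterial growth with the ODEs and then apply the Luria-Delbrück distribution to compute the number of mutants at the end of the day. I am not redistributing the code. I am currently trying to get the simulation running, and once it runs fast enough and well enough, I will try to fit some parameters to real-life experiments.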