【Question title】: Strange behaviour of enumerate with Cython and Python 3
【Posted】: 2021-04-30 08:13:26
【Question】:

We have a pile of code to port to Python 3, and we are running into very strange behaviour with enumerate.

cdef char **c_argv
c_argv = <char**>malloc(sizeof(char*) * len(args))
for idx, s in enumerate(args):
    if bytes != str:
        s = s.encode('utf-8')
    c_argv[idx] = s

Under Python 2 we see every argv entry in c_argv, whereas under Python 3 we see only one of them. Note that if we write the for loop the "pythonic" way, without enumerate:

for i in args:

it does not work either.

Here is the complete reproducer we tested:

test_enumerate.pyx

from libc.stdlib cimport malloc, free
from libc.string cimport const_char

def test_enumerate(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    for idx, s in enumerate(args):
        if bytes != str:
            s = s.encode('utf-8')
        c_argv[idx] = s

    for i in range(len(args)):
        print("Set by enumerate",c_argv[i])        
    free(c_argv)

def test_loop_obj(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    idx=0
    for s in (args):
        if bytes != str:
            s = s.encode('utf-8')
        c_argv[idx] = s
        idx = idx+1
        
    for i in range(len(args)):
        print("Set by loop on objects",c_argv[i])        
    free(c_argv)

def test_loop(args):
    cdef char **c_argv
    c_argv = <char**>malloc(sizeof(char*) * len(args))
    for i in range(len(args)):
        if bytes != str:
            args[i] = args[i].encode('utf-8')
        c_argv[i] = args[i]

    for i in range(len(args)):
        print("Set by loop on index",c_argv[i])        
    free(c_argv)

test.py

from test_enumerate import test_enumerate, test_loop_obj, test_loop
test_enumerate(['salut','tu','vas','bien'])
test_loop_obj(['salut','tu','vas','bien'])
test_loop(['salut','tu','vas','bien'])

setup.py:

from setuptools import setup
from Cython.Build import cythonize
setup(
    ext_modules = cythonize("test_enumerate.pyx")
)

We compile it with:

python/python3 setup.py build_ext --inplace

And here is the output that shows our problem:

$ python test.py
('Set by enumerate', 'salut')
('Set by enumerate', 'tu')
('Set by enumerate', 'vas')
('Set by enumerate', 'bien')
('Set by loop on objects', 'salut')
('Set by loop on objects', 'tu')
('Set by loop on objects', 'vas')
('Set by loop on objects', 'bien')
('Set by loop on index', 'salut')
('Set by loop on index', 'tu')
('Set by loop on index', 'vas')
('Set by loop on index', 'bien')
$ python3 test.py
('Set by enumerate', b'bien')
('Set by enumerate', b'bien')
('Set by enumerate', b'bien')
('Set by enumerate', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on objects', b'bien')
('Set by loop on index', b'salut')
('Set by loop on index', b'tu')
('Set by loop on index', b'vas')
('Set by loop on index', b'bien')

Can someone explain what we are missing here?

【Question discussion】:

    Tags: python-3.x python-2.7 cython enumerate


    【Answer 1】:
    c_argv[idx] = s
    

    This sets c_argv[idx] to a pointer into the data of s. The pointer is only valid for as long as s is still alive.

    s = s.encode('utf-8')
    

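    When this line runs, each iteration creates a new encoded bytes object and binds it to s. Once s is rebound on the next iteration, the previously encoded object has no references left and may be freed, which leaves the pointer stored in c_argv dangling. Under Python 2, bytes is str, so the line never runs and the pointers refer to strings that args still owns. In test_loop the encoded object is stored back into args, so the list keeps it alive, which is why that variant works under Python 3.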

    Basically, don't mess around with C pointers unless you understand (and can control) the lifetime of what they point to.
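
The lifetime issue described above can be sketched in plain Python, assuming CPython's reference counting (other interpreters may destroy objects later). `Tracked`, `rebind_in_loop`, and `keep_in_list` are hypothetical names introduced only for this illustration. `Tracked` stands in for the encoded bytes object and records when it is destroyed; no C pointers are involved.

```python
class Tracked:
    """Stand-in for an encoded bytes object; records when it is destroyed."""

    def __init__(self, name, freed):
        self.name = name
        self._freed = freed

    def __del__(self):
        # Called by CPython as soon as the last reference goes away.
        self._freed.append(self.name)


def rebind_in_loop(names):
    """Mimic `s = s.encode('utf-8')` inside the loop, keeping no other reference."""
    freed = []
    snapshot = []
    for s in names:             # rebinding s here releases the previous Tracked
        s = Tracked(s, freed)   # the only reference to the new object is s
        snapshot = list(freed)  # what has been destroyed so far
    return snapshot


def keep_in_list(names):
    """Same loop, but every object is also stored in a list that outlives the loop."""
    freed = []
    owners = []
    for s in names:
        s = Tracked(s, freed)
        owners.append(s)        # the list now holds a second reference
    return list(freed), owners


if __name__ == "__main__":
    args = ['salut', 'tu', 'vas', 'bien']
    print(rebind_in_loop(args))   # names destroyed before the last iteration ended
    freed_now, owners = keep_in_list(args)
    print(freed_now, [o.name for o in owners])
```

In the .pyx file the analogous change would presumably be to append each encoded s to a list that stays alive for as long as c_argv is in use, so that every stored pointer still refers to a live bytes object.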
