给定切片的开始和结束索引列表来填充一维 numpy 数组的矢量化解决方案？答案

【问题标题】：Vectorized solution to filling a 1-D numpy array given a list of start and end indices for slicing?给定切片的开始和结束索引列表来填充一维 numpy 数组的矢量化解决方案？
【发布时间】：2017-12-10 14:55:23
【问题描述】：

给定一个称为 a 的一维零数组：

In [38]: a

Out[38]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

我想用某些值填充某些索引。我有一个开始和结束索引的列表，以及应该在这些位置填写的相关值。这存储在一个列表中：fill_oneDim_array

[[1, 3, 500], [5, 7, 1000], [9, 15, 200]]

例如：[1, 3, 500]，就这样填充数组a； a[1:3] = 500。对 [5, 7, 100] 重复 a[5:7] = 1000。

对此有矢量化解决方案吗？我想尽可能地避免 for 循环。

到目前为止我的研究： - 据我所知，似乎没有明显的解决方案。

【问题讨论】：

你现在怎么样了？如果您的非矢量化解决方案并不完全幼稚，我想看看。
对于这个测试用例，明显的迭代速度很快（est）：for s,e,v in spec: target[s:e]=v。它随填充集的数量而变化。
“矢量化”解决方案（如果有）的相对优势随切片数量与其长度相比而变化。答案中提出的矢量化只有在有很多短切片时才值得。
如果范围重叠，预期的行为是什么？还是他们永远不会这样做？

标签： python arrays numpy

【解决方案1】：

这是一个矢量化方法，灵感来自this post 中提到的技巧 -

def fillval(a, fill):
    info = np.asarray(fill)
    start, stop, val = info.T
    id_arr = np.zeros(len(a), dtype=int)
    id_arr[start] = 1
    id_arr[stop] = -1
    a[id_arr.cumsum().astype(bool)] = np.repeat(val, stop - start)
    return a

示例运行 -

In [676]: a = np.zeros(20, dtype=int)
     ...: fill = [[1, 3, 500], [5, 7, 1000], [9, 15, 200]]

In [677]: fillval(a, fill)
Out[677]: 
array([   0,  500,  500,    0,    0, 1000, 1000,    0,    0,  200,  200,
        200,  200,  200,  200,    0,    0,    0,    0,    0])

修改/优化版本

这可以进一步修改/优化，以最小的内存占用在输入上做所有事情，就像这样 -

def fillval(a, fill):
    fill = np.asarray(fill)
    start, stop, val = fill[:,0], fill[:,1], fill[:,2]
    a[start] = val
    a[stop] = -val
    return a.cumsum()

示例运行 -

In [830]: a = np.zeros(20, dtype=int)
     ...: fill = [[1, 3, 500], [5, 7, 1000], [9, 15, 200]]

In [831]: fillval(a, fill)
Out[831]: 
array([   0,  500,  500,    0,    0, 1000, 1000,    0,    0,  200,  200,
        200,  200,  200,  200,    0,    0,    0,    0,    0])

基准测试

其他方法-

# Loopy one
def loopy(a, fill):
    for start,stop,val in fill:
        a[start:stop] = val
    return a

# @Paul Panzer's soln
def multifill(target, spec):
    spec = np.asarray(spec)    
    inds = np.zeros((2*len(spec) + 2,), dtype=int)
    inds[-1] = len(target)
    inds[1:-1] = spec[:, :2].astype(int).ravel()
    lens = np.diff(inds)
    mask = np.repeat((np.arange(len(lens), dtype=np.uint8)&1).view(bool), lens)
    target[mask] = np.repeat(spec[:, 2], lens[1::2])
    return target

时间安排 -

案例 #1：紧密间隔的短组

In [912]: # Setup inputs with group lengths at maximum extent of 10
     ...: L = 10000 # decides number of groups
     ...: np.random.seed(0)
     ...: s0 = np.random.randint(0,9,(L)) + 20*np.arange(L)
     ...: s1 = s0 + np.random.randint(2,10,(len(s0)))
     ...: fill = np.c_[s0,s1, np.random.randint(0,9,(len(s0)))].tolist()
     ...: len_a = fill[-1][1]+1
     ...: a0 = np.zeros(len_a, dtype=int)
     ...: a1 = a0.copy()
     ...: a2 = a0.copy()

In [913]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
100 loops, best of 3: 4.26 ms per loop
100 loops, best of 3: 4.49 ms per loop
100 loops, best of 3: 3.34 ms per loop

In [914]: # Setup inputs with group lengths at maximum extent of 10
     ...: L = 100000 # decides number of groups

In [915]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
10 loops, best of 3: 43.2 ms per loop
10 loops, best of 3: 49.4 ms per loop
10 loops, best of 3: 38.2 ms per loop

案例#2：宽间隔的长组

In [916]: # Setup inputs with group lengths at maximum extent of 10
     ...: L = 10000 # decides number of groups
     ...: np.random.seed(0)
     ...: s0 = np.random.randint(0,9,(L)) + 100*np.arange(L)
     ...: s1 = s0 + np.random.randint(10,50,(len(s0)))
     ...: fill = np.c_[s0,s1, np.random.randint(0,9,(len(s0)))].tolist()
     ...: len_a = fill[-1][1]+1
     ...: a0 = np.zeros(len_a, dtype=int)
     ...: a1 = a0.copy()
     ...: a2 = a0.copy()

In [917]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
100 loops, best of 3: 4.51 ms per loop
100 loops, best of 3: 9.18 ms per loop
100 loops, best of 3: 5.16 ms per loop

In [921]: # Setup inputs with group lengths at maximum extent of 10
     ...: L = 100000 # decides number of groups

In [922]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
10 loops, best of 3: 44.9 ms per loop
10 loops, best of 3: 89 ms per loop
10 loops, best of 3: 58.3 ms per loop

因此，选择最快的取决于用例，特别是典型的组长度及其在输入数组中的分布。

【讨论】：

@PaulPanzer 已添加。
刚刚完成我的...您的解决方案实际上在我的基准测试中看起来比在您的更好。
@PaulPanzer 是的，我假设组长度的差异较小（只有 2-10）并且输入数组也更紧密。
@PaulPanzer 似乎ar.cumsum() 更快。此外，使用更广泛的群体镜头编辑了基准。
比np.cumsum? @hpaulj 不是总是告诉我们函数版本实际上调用了方法版本（如果存在）吗？但这不应该有很大的不同......

【解决方案2】：

您可以使用np.repeat 构建掩码和填充值：

import numpy as np

def multifill(target, spec):
    inds = np.zeros((2*len(spec) + 2,), dtype=int)
    inds[-1] = len(target)
    inds[1:-1] = spec[:, :2].astype(int).ravel()
    lens = np.diff(inds)
    mask = np.repeat((np.arange(len(lens), dtype=np.uint8)&1).view(bool), lens)
    target[mask] = np.repeat(spec[:, 2], lens[1::2])

target = np.zeros((16,))
spec = np.array([[1, 3, 500], [5, 7, 1000], [9, 15, 200]])
multifill(target, spec)
print(target)

# [    0.   500.   500.     0.     0.  1000.  1000.     0.     0.   200.
#    200.   200.   200.   200.   200.     0.]

基准。 Divakar2 最快，但它要求模板全为零。 PP 和 Divakar1 更灵活。 更新：这些都被简单的循环“谢谢”@hpaulj 蒸发掉了。

# hpaulj                0.00256890 ms
# pp                    0.01587310 ms
# D1                    0.01193481 ms
# D2                    0.00533720 ms
# n=100000
# hpaulj                0.03514440 ms
# pp                    0.57968440 ms
# D1                    0.87605349 ms
# D2                    0.34365610 ms
# n=1000000
# hpaulj                0.50301510 ms
# pp                    6.91325230 ms
# D1                    8.96669030 ms
# D2                    3.97435970 ms

代码：

import numpy as np
import types
from timeit import timeit

def f_hpaulj(target, spec):
    for s, e, v in spec:
        target[int(s):int(e)] = v

def f_pp(target, spec):
    inds = np.zeros((2*len(spec) + 2,), dtype=int)
    inds[-1] = len(target)
    inds[1:-1:2] = spec[:, 0].astype(int)
    inds[2:-1:2] = spec[:, 1].astype(int)
    lens = np.diff(inds)
    mask = np.repeat((np.arange(len(lens), dtype=np.uint8)&1).view(bool), lens)
    target[mask] = np.repeat(spec[:, 2], lens[1::2])

def f_D1(a, info):
    start, stop, val = info[:,0].astype(int), info[:,1].astype(int), info[:,2]
    id_arr = np.zeros(len(a), dtype=int)
    id_arr[start] = 1
    id_arr[stop] = -1
    a[id_arr.cumsum().astype(bool)] = np.repeat(val, stop - start)

def f_D2(a, info):
    start, stop, val = info[:,0].astype(int), info[:,1].astype(int), info[:,2]
    id_arr = np.zeros(len(a), dtype=val.dtype)
    id_arr[start] = val
    id_arr[stop] = -val
    return id_arr.cumsum()

def setup_data(n, k):
    inds = np.sort(np.random.randint(0, n-2*k, (2*k,)) + np.arange(2*k))
    return np.c_[inds.reshape(-1, 2), np.random.randint(1, 10, (k,))].astype(float)

for n in (100, 100000, 1000000):
    k = 3**(n.bit_length()>>3)
    spec = setup_data(n, k)
    ref = np.zeros((n,))
    f_pp(ref, spec)
    print(f"n={n}")
    for name, func in list(globals().items()):
        if not name.startswith('f_') or not isinstance(func, types.FunctionType):
            continue
        try:
            res = np.zeros((n,))
            ret = func(res, spec)
            if not ret is None:
                res = ret
            assert np.allclose(ref, res)
            print("{:16s}{:16.8f} ms".format(name[2:], timeit(
                'f(a, spec)', 'a=np.zeros((n,))',
                globals={'f':func, 'spec':spec, 'np':np, 'n':n}, number=10)*100))
        except Exception:
            print("{:16s} apparently failed".format(name[2:]))

【讨论】：

您的k 相对于n 小。为此，简单的迭代要快得多。 for s,e,v in spec: target[int(s):int(e)]=v
@hpaulj 是的！你就在那儿。编辑了我的时间安排以反映这一点。
@hpaulj 基本上 - 就像用擦拭地板一样？