Here's a vectorized approach, inspired by the trick mentioned in this post -
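import numpy as np

def fillval(a, fill):
    # Each row of fill is (start, stop, val); ranges are half-open [start, stop).
    info = np.asarray(fill)
    start, stop, val = info.T
    # +1 where a range opens, -1 where it closes (needs stop < len(a)).
    id_arr = np.zeros(len(a), dtype=int)
    id_arr[start] = 1
    id_arr[stop] = -1
    # The cumulative sum is nonzero exactly inside the ranges.
    a[id_arr.cumsum().astype(bool)] = np.repeat(val, stop - start)
    return a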
Sample run -
In [676]: a = np.zeros(20, dtype=int)
     ...: fill = [[1, 3, 500], [5, 7, 1000], [9, 15, 200]]

In [677]: fillval(a, fill)
Out[677]:
array([   0,  500,  500,    0,    0, 1000, 1000,    0,    0,  200,  200,
        200,  200,  200,  200,    0,    0,    0,    0,    0])
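To see why the cumulative sum works as a mask, here is the same sample run unrolled step by step. This is only an illustrative sketch of the function above; the names `mask` and `values` are introduced here for readability and are not part of the original code.

```python
import numpy as np

a = np.zeros(20, dtype=int)
fill = np.asarray([[1, 3, 500], [5, 7, 1000], [9, 15, 200]])
start, stop, val = fill.T

# Mark where each range opens (+1) and where it closes (-1).
id_arr = np.zeros(len(a), dtype=int)
id_arr[start] = 1
id_arr[stop] = -1

# The running sum is 1 inside a range and 0 outside, so it doubles as a mask.
mask = id_arr.cumsum().astype(bool)

# One value per masked slot: each val repeated for the length of its range.
values = np.repeat(val, stop - start)

a[mask] = values
print(a)
```

Note that this relies on the ranges being non-overlapping and non-adjacent: if one range's stop equals the next range's start, the `-1` assignment overwrites the `+1` at that index.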
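Modified/optimized version
This can be modified/optimized further to do all the work on the input array itself, with minimal memory footprint, like so -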
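def fillval(a, fill):
    # Assumes a is all zeros on entry; a itself is modified in place.
    fill = np.asarray(fill)
    start, stop, val = fill[:,0], fill[:,1], fill[:,2]
    # Write +val at each start and -val at each stop ...
    a[start] = val
    a[stop] = -val
    # ... so the running sum equals val inside each range and 0 outside.
    return a.cumsum()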
Sample run -
In [830]: a = np.zeros(20, dtype=int)
     ...: fill = [[1, 3, 500], [5, 7, 1000], [9, 15, 200]]

In [831]: fillval(a, fill)
Out[831]:
array([   0,  500,  500,    0,    0, 1000, 1000,    0,    0,  200,  200,
        200,  200,  200,  200,    0,    0,    0,    0,    0])
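The cumsum version uses plain assignment at the start and stop indices, so it needs every stop to be a valid index and no stop to coincide with the next start. If that can't be guaranteed, one possible workaround is sketched below. It is not part of the original approach: `fillval_diff` is a hypothetical name, and it builds a fresh array from a length `n` instead of working on `a`, using `np.add.at` to accumulate into a difference array with one extra slot.

```python
import numpy as np

def fillval_diff(n, fill):
    """Hypothetical variant: build a fresh length-n array from (start, stop, val) rows."""
    fill = np.asarray(fill)
    start, stop, val = fill[:, 0], fill[:, 1], fill[:, 2]
    # One extra slot so that stop == n is still a valid index.
    diff = np.zeros(n + 1, dtype=val.dtype)
    # Accumulate instead of assigning, so a stop that coincides with
    # the next start does not overwrite it.
    np.add.at(diff, start, val)
    np.add.at(diff, stop, -val)
    return diff.cumsum()[:n]

print(fillval_diff(20, [[1, 3, 500], [5, 7, 1000], [9, 15, 200]]))
print(fillval_diff(6, [[0, 2, 5], [2, 6, 7]]))  # adjacent ranges, last stop == n
```

One difference to keep in mind: where ranges overlap, this sketch sums their values, whereas a slice-assignment loop would keep the last one written.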
Benchmarking
Other approaches -
# Loopy one
def loopy(a, fill):
    for start,stop,val in fill:
        a[start:stop] = val
    return a

# @Paul Panzer's soln
def multifill(target, spec):
    spec = np.asarray(spec)
    inds = np.zeros((2*len(spec) + 2,), dtype=int)
    inds[-1] = len(target)
    inds[1:-1] = spec[:, :2].astype(int).ravel()
    lens = np.diff(inds)
    mask = np.repeat((np.arange(len(lens), dtype=np.uint8)&1).view(bool), lens)
    target[mask] = np.repeat(spec[:, 2], lens[1::2])
    return target
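For reference, `multifill` builds its mask from segment lengths rather than from a cumulative sum. Unrolled on the earlier sample input, the intermediate arrays look roughly like this (the name `flags` is introduced here just for the walk-through):

```python
import numpy as np

target = np.zeros(20, dtype=int)
spec = np.asarray([[1, 3, 500], [5, 7, 1000], [9, 15, 200]])

# Segment boundaries: 0, start0, stop0, start1, stop1, ..., len(target).
inds = np.zeros(2 * len(spec) + 2, dtype=int)
inds[-1] = len(target)
inds[1:-1] = spec[:, :2].ravel()

# Segment lengths; even positions are gaps, odd positions are ranges.
lens = np.diff(inds)

# Alternating False/True flag per segment, expanded to one flag per element.
flags = (np.arange(len(lens), dtype=np.uint8) & 1).view(bool)
mask = np.repeat(flags, lens)

target[mask] = np.repeat(spec[:, 2], lens[1::2])
print(lens)
print(target)
```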
Timings -
Case #1: Tightly spaced short groups
In [912]: # Setup inputs with group lengths at maximum extent of 10
     ...: L = 10000 # decides number of groups
     ...: np.random.seed(0)
     ...: s0 = np.random.randint(0,9,(L)) + 20*np.arange(L)
     ...: s1 = s0 + np.random.randint(2,10,(len(s0)))
     ...: fill = np.c_[s0,s1, np.random.randint(0,9,(len(s0)))].tolist()
     ...: len_a = fill[-1][1]+1
     ...: a0 = np.zeros(len_a, dtype=int)
     ...: a1 = a0.copy()
     ...: a2 = a0.copy()

In [913]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
100 loops, best of 3: 4.26 ms per loop
100 loops, best of 3: 4.49 ms per loop
100 loops, best of 3: 3.34 ms per loop

In [914]: # Setup inputs with group lengths at maximum extent of 10
     ...: L = 100000 # decides number of groups

In [915]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
10 loops, best of 3: 43.2 ms per loop
10 loops, best of 3: 49.4 ms per loop
10 loops, best of 3: 38.2 ms per loop
Case #2: Widely spaced long groups
In [916]: # Setup inputs with group lengths from 10 to 49
     ...: L = 10000 # decides number of groups
     ...: np.random.seed(0)
     ...: s0 = np.random.randint(0,9,(L)) + 100*np.arange(L)
     ...: s1 = s0 + np.random.randint(10,50,(len(s0)))
     ...: fill = np.c_[s0,s1, np.random.randint(0,9,(len(s0)))].tolist()
     ...: len_a = fill[-1][1]+1
     ...: a0 = np.zeros(len_a, dtype=int)
     ...: a1 = a0.copy()
     ...: a2 = a0.copy()

In [917]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
100 loops, best of 3: 4.51 ms per loop
100 loops, best of 3: 9.18 ms per loop
100 loops, best of 3: 5.16 ms per loop

In [921]: # Setup inputs with group lengths from 10 to 49
     ...: L = 100000 # decides number of groups

In [922]: %timeit loopy(a0, fill)
     ...: %timeit multifill(a1, fill)
     ...: %timeit fillval(a2, fill)
10 loops, best of 3: 44.9 ms per loop
10 loops, best of 3: 89 ms per loop
10 loops, best of 3: 58.3 ms per loop
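So, which one is fastest depends on the use case, specifically on the typical group lengths and how the groups are spread across the input array. In the runs above, fillval was fastest for tightly spaced short groups, while the plain loop was fastest for widely spaced long groups.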
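Before comparing timings on your own data, it may be worth checking that the approaches agree on it, since they treat the input array differently (the cumsum version expects zeros and returns a new array). A minimal sketch of such a check follows; `make_inputs` is a hypothetical helper that mirrors the Case #1 setup, and the two functions are repeated so the snippet runs on its own.

```python
import numpy as np

def loopy(a, fill):
    for start, stop, val in fill:
        a[start:stop] = val
    return a

def fillval(a, fill):
    fill = np.asarray(fill)
    start, stop, val = fill[:, 0], fill[:, 1], fill[:, 2]
    a[start] = val
    a[stop] = -val
    return a.cumsum()

def make_inputs(L, spacing, lo, hi, seed=0):
    # Hypothetical helper: L groups, one per `spacing` slots, lengths in [lo, hi).
    rng = np.random.RandomState(seed)
    s0 = rng.randint(0, 9, L) + spacing * np.arange(L)
    s1 = s0 + rng.randint(lo, hi, L)
    fill = np.c_[s0, s1, rng.randint(0, 9, L)].tolist()
    return fill, fill[-1][1] + 1

fill, n = make_inputs(L=1000, spacing=20, lo=2, hi=10)

# Give each function its own zero-initialized array.
expected = loopy(np.zeros(n, dtype=int), fill)
got = fillval(np.zeros(n, dtype=int), fill)
same = bool(np.array_equal(expected, got))
print(same)
```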