【Question Title】: Numpy-vectorized function to repeat blocks of consecutive elements
【Posted】: 2018-12-11 19:00:52
【Question Description】:

Numpy has the repeat function, which repeats each element of an array a given (per-element) number of times.
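As a quick illustration of that per-element behaviour (the values below are made up for the example and are not from the question):

```python
import numpy as np

x = np.array([10, 20, 30])
counts = np.array([2, 0, 3])  # one repeat count per element of x

# Each element is repeated counts[i] times; a count of 0 drops the element.
y = np.repeat(x, counts)
print(y)  # [10 10 30 30 30]
```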

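I want to implement a function that does something similar, but repeats variable-sized blocks of consecutive elements instead of individual elements. Essentially, I want the following function: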

import numpy as np

def repeat_blocks(a, sizes, repeats):
    b = []    
    start = 0
    for i, size in enumerate(sizes):
        end = start + size
        b.extend([a[start:end]] * repeats[i])
        start = end
    return np.concatenate(b)

For example, given

a = np.arange(20)
sizes = np.array([3, 5, 2, 6, 4])
repeats = np.array([2, 3, 2, 1, 3])

then

repeat_blocks(a, sizes, repeats)

returns

array([ 0,  1,  2, 
        0,  1,  2,

        3,  4,  5,  6,  7, 
        3,  4,  5,  6,  7, 
        3,  4,  5,  6,  7, 

        8,  9, 
        8,  9,

        10, 11, 12, 13, 14, 15,

        16, 17, 18, 19,
        16, 17, 18, 19,
        16, 17, 18, 19 ])

I want to push these loops into numpy for the sake of performance. Is this possible? If so, how?

【Question Comments】:

    Tags: python algorithm numpy vectorization


    【Solution 1】:

    Here is a vectorized approach using cumsum:

    # Group ID for each output block: group i appears repeats[i] times
    r1 = np.repeat(np.arange(len(sizes)), repeats)
    
    # Total size of the output array, needed to initialize the indexing array
    N = (sizes*repeats).sum() # or np.dot(sizes, repeats)
    
    # Initialize the indexing array with ones, so that the cumulative sum at
    # the final stage steps through consecutive indices within each block.
    # Two steps here:
    # 1. Within each group there are multiple copies of the block, so at the
    # start of each copy jump back by the block length (insert 1 - size).
    id_ar = np.ones(N, dtype=int)
    id_ar[0] = 0
    insert_index = sizes[r1[:-1]].cumsum()
    insert_val = (1-sizes)[r1[:-1]]
    
    # 2. Where the next copy belongs to a new group, indexing must continue
    # from the next group's first element. So, simply assign 1s there.
    insert_val[r1[1:] != r1[:-1]] = 1
    
    # Assign the index-offsetting values
    id_ar[insert_index] = insert_val
    
    # Finally, index into the input array to get the block-repeated output
    out = a[id_ar.cumsum()]
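
The steps above can be wrapped in a function and checked against the loop from the question. This is only a sketch: the name `repeat_blocks_cumsum` is made up here, and it assumes every entry of `sizes` and `repeats` is at least 1 (a zero size or zero repeat count would break the index arithmetic).

```python
import numpy as np

def repeat_blocks_loop(a, sizes, repeats):
    # Reference implementation: the loop from the question.
    b = []
    start = 0
    for i, size in enumerate(sizes):
        end = start + size
        b.extend([a[start:end]] * repeats[i])
        start = end
    return np.concatenate(b)

def repeat_blocks_cumsum(a, sizes, repeats):
    # Hypothetical wrapper around the cumsum steps.
    # Assumption: all sizes >= 1 and all repeats >= 1.
    r1 = np.repeat(np.arange(len(sizes)), repeats)
    N = np.dot(sizes, repeats)
    id_ar = np.ones(N, dtype=int)
    id_ar[0] = 0
    insert_index = sizes[r1[:-1]].cumsum()
    insert_val = (1 - sizes)[r1[:-1]]
    insert_val[r1[1:] != r1[:-1]] = 1
    id_ar[insert_index] = insert_val
    return a[id_ar.cumsum()]

a = np.arange(20)
sizes = np.array([3, 5, 2, 6, 4])
repeats = np.array([2, 3, 2, 1, 3])

expected = repeat_blocks_loop(a, sizes, repeats)
result = repeat_blocks_cumsum(a, sizes, repeats)
print(np.array_equal(result, expected))  # True
```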
    

    【Comments】:

    • If the arrays may be large, np.dot(sizes, repeats) is preferable to (sizes*repeats).sum()
    • Could you explain the idea behind this?
    • @yurikilochek Added some comments.
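
To make the idea more concrete, here is a trace of the answer's steps on a tiny input. The input values are made up for illustration; the comments show the intermediate arrays. Each -1 makes the running index jump back to the start of a block that is repeated, and each 1 lets it carry on into the next block.

```python
import numpy as np

a = np.array([0, 10, 20, 30, 40])
sizes = np.array([2, 3])      # blocks: [0, 10] and [20, 30, 40]
repeats = np.array([2, 1])    # first block twice, second block once

r1 = np.repeat(np.arange(len(sizes)), repeats)      # [0 0 1]
id_ar = np.ones(np.dot(sizes, repeats), dtype=int)  # seven ones
id_ar[0] = 0

insert_index = sizes[r1[:-1]].cumsum()   # [2 4]: where each later copy starts
insert_val = (1 - sizes)[r1[:-1]]        # [-1 -1]: jump back to block start
insert_val[r1[1:] != r1[:-1]] = 1        # [-1  1]: new group, just step on

id_ar[insert_index] = insert_val         # [ 0  1 -1  1  1  1  1]
idx = id_ar.cumsum()                     # [0 1 0 1 2 3 4]
out = a[idx]                             # [ 0 10  0 10 20 30 40]
print(out)
```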
    【Solution 2】:

    This function is a great candidate for speeding up with Numba:

    import numba
    import numpy as np

    @numba.njit
    def repeat_blocks_jit(a, sizes, repeats):
        out = np.empty((sizes * repeats).sum(), a.dtype)
        start = 0
        oi = 0
        for i, size in enumerate(sizes):
            end = start + size
            for rep in range(repeats[i]):
                oe = oi + size
                out[oi:oe] = a[start:end]
                oi = oe
            start = end
        return out
    

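    This is quite a bit faster than Divakar's pure NumPy solution, and a lot closer to your original code. I made no effort at all to optimize it. Note that np.dot() and np.repeat() can't be used here, but that doesn't matter so much when all the code gets compiled.

    Also, since it is njit, meaning "nopython" mode, you can even use @numba.njit(nogil=True) and get a multicore speedup if you have lots of these calls to make from multiple threads.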
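
A rough sketch of what the nogil suggestion could look like with a thread pool. The thread pool, the worker count, and the ImportError fallback (which lets the snippet run without Numba, with no speedup) are assumptions added for illustration and are not part of the answer.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    import numba
    jit = numba.njit(nogil=True)  # compiled code releases the GIL while running
except ImportError:
    # Assumption for this sketch: fall back to plain Python if Numba is missing.
    def jit(func):
        return func

@jit
def repeat_blocks_jit(a, sizes, repeats):
    out = np.empty((sizes * repeats).sum(), a.dtype)
    start = 0
    oi = 0
    for i, size in enumerate(sizes):
        end = start + size
        for rep in range(repeats[i]):
            oe = oi + size
            out[oi:oe] = a[start:end]
            oi = oe
        start = end
    return out

a = np.arange(20)
sizes = np.array([3, 5, 2, 6, 4])
repeats = np.array([2, 3, 2, 1, 3])

# Warm-up call so compilation happens once, before the threads start.
expected = repeat_blocks_jit(a, sizes, repeats)

# Many independent calls, spread over worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda _: repeat_blocks_jit(a, sizes, repeats), range(8)))
```

Whether this actually helps depends on the workload; for inputs as small as the one in the question, the thread overhead will likely dominate.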

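    【Comments】:

    • Can you provide a speedup test?
    • @anishtain4: It depends on the data size. For the tiny input in the question the difference is more than 10x, but for larger inputs the speedup is more like 1.5x.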