2D IDs 数组和 1D 权重的加权 numpy bincount答案

【问题标题】：weighted numpy bincount for 2D IDs array and 1D weights2D IDs 数组和 1D 权重的加权 numpy bincount
【发布时间】：2020-10-24 10:42:38
【问题描述】：

我正在使用 numpy_indexed 来应用向量化的 numpy bincount，如下所示：

import numpy as np
import numpy_indexed as npi
rowidx, colidx = np.indices(index_tri.shape)
(cols, rows), B = npi.count((index_tri.flatten(), rowidx.flatten()))

其中index_tri 是以下矩阵：

index_tri = np.array([[ 0,  0,  0,  7,  1,  3],
       [ 1,  2,  2,  9,  8,  9],
       [ 3,  1,  1,  4,  9,  1],
       [ 5,  6,  6, 10, 10, 10],
       [ 7,  8,  9,  4,  3,  3],
       [ 3,  8,  6,  3,  8,  6],
       [ 4,  3,  3,  7,  8,  9],
       [10, 10, 10,  5,  6,  6],
       [ 4,  9,  1,  3,  1,  1],
       [ 9,  8,  9,  1,  2,  2]])

然后我将分箱后的值映射到如下初始化矩阵m的对应位置：

m = np.zeros((10,11))
m 
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])

m[rows, cols] = B
m
array([[3., 1., 0., 1., 0., 0., 0., 1., 0., 0., 0.],
       [0., 1., 2., 0., 0., 0., 0., 0., 1., 2., 0.],
       [0., 3., 0., 1., 1., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1., 2., 0., 0., 0., 3.],
       [0., 0., 0., 2., 1., 0., 0., 1., 1., 1., 0.],
       [0., 0., 0., 2., 0., 0., 2., 0., 2., 0., 0.],
       [0., 0., 0., 2., 1., 0., 0., 1., 1., 1., 0.],
       [0., 0., 0., 0., 0., 1., 2., 0., 0., 0., 3.],
       [0., 3., 0., 1., 1., 0., 0., 0., 0., 1., 0.],
       [0., 1., 2., 0., 0., 0., 0., 0., 1., 2., 0.]])

但是，这认为index_tri 中每列的每个值的权重为 1。现在，如果我有一个权重数组，则在 index_tri 中为每列提供相应的权重值而不是 1：

weights = np.array([0.7, 0.8, 1.5, 0.6, 0.5, 1.9])

如何应用加权 bincount，以便我的输出矩阵 m 变为如下：

array([[3., 0.5, 0., 1.9, 0., 0., 0., 0.6, 0., 0., 0.],
       [0., 0.7, 2.3, 0., 0., 0., 0., 0., 0.5, 2.5, 0.],
       [0., 4.2, 0., 0.7, 0.6, 0., 0., 0., 0., 0.5, 0.],
       [0., 0., 0., 0., 0., 0.7, 2.3, 0., 0., 0., 3.],
       [0., 0., 0., 2.4, 0.6, 0., 0., 0.7, 0.8, 1.5, 0.],
       [0., 0., 0., 2.3, 0., 0., 2.4, 0., 1.3, 0., 0.],
       [0., 0., 0., 2.3, 0.7, 0., 0., 0.6, 0.5, 1.9, 0.],
       [0., 0., 0., 0., 0., 0.6, 2.4, 0., 0., 0., 3.],
       [0., 3.9, 0., 0.6, 0.7, 0., 0., 0., 0., 0.8, 0.],
       [0., 0.6, 2.4, 0., 0., 0., 0., 0., 0.8, 2.2, 0.]])

有什么想法吗？

通过使用 for 循环和 numpy bincount() 我可以如下解决它：

for i in range(m.shape[0]):
   m[i, :] = np.bincount(index_tri[i, :], weights=weights, minlength=m.shape[1])

我正在尝试分别从here 和here 调整矢量化提供的解决方案，但我无法弄清楚ix2D 变量对应于第一个链接中的什么。如果可能的话，谁能详细说明一下。

更新（解决方案）：

基于下面@Divakar 的解决方案，这是一个更新版本，它需要一个额外的输入参数，以防您的索引输入矩阵不覆盖输出初始化矩阵的全部范围：

    def bincount2D(id_ar_2D, weights_1D, sz=None):
        # Inputs : 2D id array, 1D weights array

        # Extent of bins per col
        if sz == None:
            n = id_ar_2D.max() + 1
            N = len(id_ar_2D)
        else:
            n = sz[1]
            N = sz[0]

        # add offsets to the original values to be used when we apply raveling later on
        id_ar_2D_offsetted = id_ar_2D + n * np.arange(N)[:, None]

        # Finally use bincount with those 2D bins as flattened and with
        # flattened b as weights. Reshaping is needed to add back into "a".
        ids = id_ar_2D_offsetted.ravel()
        W = np.tile(weights_1D, N)
        return np.bincount(ids, W, minlength=n * N).reshape(-1, n)

【问题讨论】：

标签： python binning weighted numpy-indexed

【解决方案1】：

灵感来自this post -

def bincount2D(id_ar_2D, weights_1D):
    # Inputs : 2D id array, 1D weights array
    
    # Extent of bins per col
    n = id_ar_2D.max()+1
    
    N = len(id_ar_2D)
    id_ar_2D_offsetted = id_ar_2D + n*np.arange(N)[:,None]
    
    # Finally use bincount with those 2D bins as flattened and with
    # flattened b as weights. Reshaping is needed to add back into "a".
    ids = id_ar_2D_offsetted.ravel()
    W = np.tile(weights_1D,N)
    return np.bincount(ids, W, minlength=n*N).reshape(-1,n)

out = bincount2D(index_tri, weights)

【讨论】：

非常感谢，我发现在id_ar_2D_offseted 中，您正在偏移矩阵id_ar_2D 的初始值以覆盖二维维度。但是，您已经回答了它;-)。我还用一个解决方案更新了我的答案，在该解决方案中我传递了一个额外的参数，以防输入索引矩阵（它的值）不遵循已经初始化的输出矩阵的确切大小。