Answers: Vectorized way to generate 2D array from 2 1D arrays

【Question title】: Vectorized way to generate 2D array from 2 1D arrays
【Posted】: 2019-11-25 18:11:25
【Question】:

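I have a pair of equal-length numpy arrays. dwells contains floats representing dwell times, and ids represents a state. In my example there are only 3 unique states, labeled 0, 1, 2.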

dwells = np.array([4.3,0.2,3,1.5])
ids = np.array([2, 0, 1, 2])

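The two arrays above model a system that starts in state 2, stays there for 4.3 seconds, jumps to state 0, stays there for 0.2 seconds, and so on. I want to generate another numpy array. It needs as many columns as dwells.sum(), each column representing one integer time step 0, 1, 2, 3, ... Each row corresponds to one unique state (3 in this case). Each element of the array is the relative contribution of that state during that time interval. For example, during the first 4 time steps only state 2 contributes, so the first 4 elements of row 2 are equal to 1. The fifth column has contributions from all 3 states, but they sum to 1.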

[[0, 0, 0, 0, 0.2, 0, 0,  0,  0]
 [0, 0, 0, 0, 0.5, 1, 1, 0.5, 0]
 [1, 1, 1, 1, 0.3, 0, 0, 0.5, 1]]

I can do this with a for loop, but I was wondering whether there is a more efficient vectorized way.
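
The question does not include the loop itself. A minimal sketch of what such a loop could look like, assuming column t covers the interval [t, t+1) and that the total duration is a whole number of time units as in the example (the name dwell_map_loop is made up for illustration):

```python
import numpy as np

def dwell_map_loop(dwells, ids):
    """Reference loop: time spent in each state within each unit time bin."""
    n_states = int(ids.max()) + 1
    n_bins = int(np.ceil(dwells.sum()))      # one column per interval [t, t+1)
    out = np.zeros((n_states, n_bins))
    start = 0.0
    for dwell, state in zip(dwells, ids):
        end = start + dwell
        # visit every bin that the interval [start, end) can touch
        for t in range(int(np.floor(start)), min(int(np.ceil(end)), n_bins)):
            overlap = min(end, t + 1) - max(start, t)
            if overlap > 0:
                out[state, t] += overlap
        start = end
    return out

dwells = np.array([4.3, 0.2, 3, 1.5])
ids = np.array([2, 0, 1, 2])
print(dwell_map_loop(dwells, ids).round(3))
```

Because every bin has width 1, the accumulated overlap is already the relative contribution, and each column sums to 1.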

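【Question comments】:

Is 0.1 the smallest time step you can have?
No, dwells can have arbitrary length, and the values can be arbitrarily small.
Ah, too bad. But I'll leave my answer here anyway. Maybe it helps someone find a more general solution.
It usually helps to show the for-loop solution first. That defines a clear target.
If your code is purely numerical, you could try running your for loop with numba.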

Tags: python arrays numpy vectorization

【Solution 1】:

Assuming we have a smallest time step of delta:

import numpy as np

dwells = np.array([4.3,0.2,3,1.5])
ids = np.array([2, 0, 1, 2])

def dwell_map(dwells, ids, delta=0.1):
    # number of fine-grained sub-steps per unit of time
    idelta = 1 / delta

    # ensure that idelta is an integer number
    if not idelta.is_integer():
        raise ValueError("1/delta is not integer") 

    idelta = int(idelta)

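    # dwell times expressed in sub-steps; round rather than truncate so that
    # floating-point error in the product cannot drop a sub-step
    dwells_l = np.rint(dwells*idelta).astype(int)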

    # create target array
    a = np.zeros((ids.max()+1, dwells_l.sum().astype(int)), dtype=int)

    # create repeats of the ids according to the dwell time
    ind = np.repeat(ids, dwells_l)

    # put ones at the position where we have the indices
    a[ind, np.arange(ind.size)] = 1

    # reduce back to the original time resolution
    a = a.reshape(ids.max()+1, -1, idelta).sum(axis=2)/idelta

    return a

res = dwell_map(dwells, ids, 0.1)

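The intermediate array has (ids.max()+1) × dwells.sum()/delta elements, so this only works if delta is large enough and the total duration small enough that the array does not grow "without bound".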
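
For dwell times that are not multiples of any convenient delta, one alternative is to compute the overlap between every dwell interval and every unit time bin by broadcasting. This is a sketch of a different technique, not the method of the answer above; dwell_map_overlap is a hypothetical name, and column t is assumed to cover [t, t+1):

```python
import numpy as np

def dwell_map_overlap(dwells, ids):
    """Per-state occupancy of each unit time bin, without a fine time grid."""
    ends = np.cumsum(dwells)                 # end time of every dwell
    starts = ends - dwells                   # start time of every dwell
    n_bins = int(np.ceil(ends[-1]))
    left = np.arange(n_bins)                 # bin t covers [t, t+1)

    # overlap[i, t] = length of the intersection of dwell i with bin t
    overlap = np.minimum(ends[:, None], left + 1) - np.maximum(starts[:, None], left)
    overlap = np.clip(overlap, 0, None)      # negative means no intersection

    # add up the rows of `overlap` that belong to the same state
    out = np.zeros((int(ids.max()) + 1, n_bins))
    np.add.at(out, ids, overlap)
    return out

dwells = np.array([4.3, 0.2, 3, 1.5])
ids = np.array([2, 0, 1, 2])
res_overlap = dwell_map_overlap(dwells, ids)
```

The temporary array here has len(dwells) × number-of-columns elements, so its size no longer depends on delta, though it can still become large for long recordings.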

Timing on your example arrays with the IPython %timeit magic, for comparison against your for-loop solution:

10000 loops, best of 5: 58.5 µs per loop

【Comments】: