Answers: Vectorize numpy code with an operation depending on the previous value

【Question title】: Vectorize numpy code with operation depending on previous value
【Posted】: 2019-03-13 08:11:06
【Question】:

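The following code simulates a system that can be in any of 3 different states at each time step, with constant transition probabilities between these states given by the matrix prob_nor. Consequently, each point in trace depends on the previous state.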
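In symbols, a sketch of the model the code below implements, writing $s_t$ for the state at frame $t$, $P$ for `prob_nor`, $\mu$ for `state_val`, and $x_t$ for the entries of `trace` (this notation is introduced here for illustration, not taken from the question):

```latex
\begin{aligned}
\Pr(s_{t+1} = j \mid s_t = i) &= P_{ij}, \qquad \sum_{j} P_{ij} = 1, \\
x_t \mid s_t &\sim \mathcal{N}\!\left(\mu_{s_t},\ \sigma^2\right), \qquad \sigma = 0.1 .
\end{aligned}
```

Only the state sequence $s_t$ is inherently sequential; once it is known, all the $x_t$ are independent draws, which is what both answers below exploit.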

import numpy as np

n_states, n_frames = 3, 1000
state_val = np.linspace(0, 1, n_states)

prob = np.random.randint(1, 10, size=(n_states,)*2)
prob[np.diag_indices(n_states)] += 50

prob_nor = prob/prob.sum(1)[:,None] # transition probability matrix, 
                                    # row sum normalized to 1.0

state_idx = range(n_states)  # state indices 0, 1, 2, ...
current_state = np.random.choice(state_idx)

trace = []      
sigma = 0.1     
for _ in range(n_frames):
    trace.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

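The loop in the code above makes it run very slowly, especially when I have to model millions of data points. Is there a way to vectorize it or speed it up?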

【Question discussion】:

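'Vectorize' in the strictest numpy sense means operating on whole arrays in compiled code. It moves the iteration down to the compiled level, out of the control of Python code. So an inherently sequential, iterative problem can't be "vectorized". Calling these np.random functions repeatedly for one value at a time is much slower than calling them once for many values.
Someone recently asked why Python's random.random function is faster than the np.random functions. When used for one value at a time, they are faster.
@hpaulj I think you're referring to stackoverflow.com/a/50790263/8033585
"...I have to model millions of data points" What are typical values of n_states and n_frames for the problem you're interested in?
@WarrenWeckesser n_states is about 2-10, but occasionally the transition probability matrix (prob_nor) is sparse, and in that case n_states is 10-100. n_frames is 1e3-1e6. trace has to be generated 1000 times.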
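A sketch of the batching idea from the comments above (not code from the thread): draw every uniform random number in one NumPy call, then do the unavoidable sequential part with inverse-CDF lookups on plain Python lists using `bisect`. The function name `sample_states_bisect` is hypothetical, and it assumes the newer `np.random.default_rng` Generator API rather than the legacy `np.random.*` calls in the question. It follows the question's loop order (record the state, then transition), so the trace can afterwards be drawn in one vectorized call.

```python
import bisect

import numpy as np


def sample_states_bisect(prob_nor, n_frames, start, rng):
    """Sample a Markov-chain state sequence by inverse-CDF lookup (sketch).

    All uniforms come from one batched NumPy call; the sequential loop then
    uses only plain-Python operations, which are cheap one value at a time.
    """
    cum = np.cumsum(prob_nor, axis=1)
    cum[:, -1] = 1.0  # guard against round-off so every u < 1 lands in a bin
    cum_rows = cum.tolist()                   # Python lists: fast scalar indexing
    uniforms = rng.random(n_frames).tolist()  # one call for all random numbers

    states = [0] * n_frames
    state = start
    for t, u in enumerate(uniforms):
        states[t] = state
        # first index whose cumulative probability exceeds u -> next state
        state = bisect.bisect_right(cum_rows[state], u)
    return np.array(states)


# Example usage (3-state matrix chosen for illustration)
rng = np.random.default_rng(0)
prob_nor = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
states = sample_states_bisect(prob_nor, 10_000, start=0, rng=rng)
trace = rng.normal(loc=np.linspace(0, 1, 3)[states], scale=0.1)
```

The `cum[:, -1] = 1.0` guard matters: a floating-point cumulative sum can end slightly below 1.0, and a uniform draw above that value would otherwise index past the last state.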
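Since the trace has to be generated 1000 times, one option (a sketch, not proposed in the thread) is to vectorize across the independent traces instead of along time: the loop still runs over frames, but each step advances all chains with a few NumPy operations. `simulate_many_traces` is a hypothetical name; it assumes the `np.random.default_rng` Generator API and samples each next state by inverse-CDF lookup on the cumulative rows of `prob_nor`.

```python
import numpy as np


def simulate_many_traces(prob_nor, state_val, n_traces, n_frames, sigma=0.1, rng=None):
    """Simulate n_traces independent chains at once (sketch).

    The Python loop runs over time steps only; each step advances every
    chain with vectorized NumPy operations.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_states = prob_nor.shape[0]
    cum = np.cumsum(prob_nor, axis=1)
    cum[:, -1] = 1.0  # guard against round-off so every u < 1 lands in a bin

    states = np.empty((n_traces, n_frames), dtype=np.intp)
    states[:, 0] = rng.integers(n_states, size=n_traces)
    for t in range(1, n_frames):
        u = rng.random(n_traces)
        # next state = number of cumulative thresholds <= u (inverse-CDF lookup)
        states[:, t] = (cum[states[:, t - 1]] <= u[:, None]).sum(axis=1)

    # Given the states, all Gaussian samples are independent: one call
    trace = rng.normal(loc=state_val[states], scale=sigma)
    return states, trace
```

Memory is the limiting factor: 1000 traces of 1e6 frames stored as 8-byte integers take about 8 GB, so at the upper end of those ranges you would process traces (or frames) in chunks. The per-step comparison touches n_traces × n_states entries, so for the larger, sparse matrices (n_states 10-100) this does proportionally more work per step.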

Tags: python numpy vectorization

【Solution 1】:

Offload as much of the probability computation as possible up front:

# possible_paths[j, k]: pre-drawn next state if the chain is in state j at step k
possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])

Then you can simply do lookups to follow your path:

# current_state is the initial state drawn in the question's code
path_trace = np.empty(n_frames, dtype=int)
for step in range(n_frames):
    current_state = possible_paths[current_state, step]
    path_trace[step] = current_state

Once you have your path, you can compute your trace:

sigma = 0.1
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)
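For reuse, Solution 1 can be packaged as a function; this is a sketch, not the answerer's code. `simulate_trace_lookup` is a hypothetical name, it uses the `np.random.default_rng` Generator API, and it passes a list rather than a generator to `np.vstack` (passing generators has been deprecated since NumPy 1.16, and newer versions raise an error). It also returns the lookup table so the path can be checked against it.

```python
import numpy as np


def simulate_trace_lookup(prob_nor, state_val, n_frames, sigma=0.1, rng=None):
    """Solution 1 packaged as a function (sketch).

    possible_paths[j, k] is a pre-drawn next state for "being in state j at
    step k"; the Python loop only performs table lookups.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_states = prob_nor.shape[0]

    # One vectorized draw of n_frames candidate next states per current state
    possible_paths = np.vstack(
        [rng.choice(n_states, p=prob_nor[j], size=n_frames) for j in range(n_states)]
    )

    path = np.empty(n_frames, dtype=np.intp)
    current_state = rng.integers(n_states)
    for step in range(n_frames):
        current_state = possible_paths[current_state, step]
        path[step] = current_state

    # All Gaussian samples drawn in one call once the path is known
    trace = rng.normal(loc=state_val[path], scale=sigma)
    return path, possible_paths, trace
```

Each column of `possible_paths` is read exactly once, so the table costs n_states × n_frames random draws; for the sparse 10-100-state matrices mentioned in the comments, that overhead grows with n_states.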

Comparing timings:

Pure Python for loop:

%%timeit
trace_list = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    trace_list.append(np.random.normal(loc=state_val[current_state], scale=sigma))
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

Result:

30.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Vectorized lookup:

%%timeit
current_state = np.random.choice(state_idx)
path_trace = [None]*n_frames
possible_paths = np.vstack([
    np.random.choice(state_idx, p=prob_nor[curr_state, :], size=n_frames)
    for curr_state in range(n_states)
])
for step in range(n_frames):
    path_trace[step] = possible_paths[current_state, step]
    current_state = possible_paths[current_state, step]
trace = np.random.normal(loc=state_val[path_trace], scale=sigma, size=n_frames)

Result:

641 µs ± 6.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

That is a speedup of roughly 50x (30.1 ms vs. 641 µs).

【Discussion】:

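This doesn't work. current_state determines the probabilities for the choice, and it changes every time. I suspect this is a Markov chain, and this solution won't give the correct transition probabilities.
The transition matrix here is random, but I suspect that isn't the case in the actual code.
@MatthieuBrucher Yes, it is a Markov chain. No, the code in this answer doesn't work.
You'd have to create possible_paths every time you compute a new result, so the time to generate possible_paths should be included in your method's total time. (It may still be much faster than the original; I haven't tried it.)
This method works. For each state j and each "time" k, possible_paths[j, k] holds a randomly generated next state, precomputed using the appropriate row of prob_nor. It precomputes more than is needed to generate the path, but it does so with numpy's vectorized code, so it is much faster than the repeated calls in the original code. In the comments, Brenlla gave the expected range of the problem parameters, which should be enough to decide whether this answer is a viable solution.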
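The argument above can be checked empirically with a short script (a sketch with an illustrative 3-state matrix, not code from the thread): build the lookup table as in Solution 1, follow the path, and compare the observed transition frequencies with `prob_nor`. Because the path reaches column k only after consuming columns 0 to k-1, the entry it reads at step k is a fresh draw from the correct row, independent of the history.

```python
import numpy as np

rng = np.random.default_rng(42)
# Illustrative transition matrix (rows sum to 1)
prob_nor = np.array([[0.9, 0.05, 0.05],
                     [0.1, 0.8, 0.1],
                     [0.2, 0.2, 0.6]])
n_states, n_frames = prob_nor.shape[0], 200_000

# possible_paths[j, k]: pre-drawn next state if the chain is in state j at step k
possible_paths = np.vstack(
    [rng.choice(n_states, p=prob_nor[j], size=n_frames) for j in range(n_states)]
)

path = np.empty(n_frames, dtype=np.intp)
state = 0
for k in range(n_frames):
    state = possible_paths[state, k]  # each column is read exactly once
    path[k] = state

# Empirical transition matrix from consecutive states along the path
counts = np.zeros((n_states, n_states))
np.add.at(counts, (path[:-1], path[1:]), 1)
empirical = counts / counts.sum(axis=1, keepdims=True)
print(np.round(empirical, 3))
```

If the lookup scheme biased the transitions, the printed rows would drift away from `prob_nor`; with this many steps they should match it closely.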

【Solution 2】:

Maybe I'm missing something, but I think you can build the states as a list and then vectorize the remaining steps:

# Make list of states (slow part)
states = []
current_state = np.random.choice(state_idx)
for _ in range(n_frames):
    states.append(current_state)
    current_state = np.random.choice(state_idx, p=prob_nor[current_state, :])

# Vectorised part
state_vals = state_val[states]   # alternatively np.array(states) / (n_states - 1)
trace = np.random.normal(loc=state_vals, scale=sigma)

I believe this approach works and gives a modest speedup at the cost of some extra memory (3 lists/arrays are created instead of one). @PMende's solution gives a bigger speedup.

【Discussion】: