Apply formula to get Numpy matrix efficiently

【Question title】: Apply formula to get Numpy matrix efficiently
【Posted】: 2021-05-06 07:06:34
【Question】:

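I am trying to take two vectors and populate a matrix according to a formula. This is how I do it at the moment, and it is very inefficient.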

import numpy as np

list1 = [1, 2, 3, 4]
list2 = [20, 30, 40, 50, 60, 70, 80, 90]

array1 = np.array(list1)
array2 = np.array(list2)

columns = len(list1)
rows = len(list2)

matrix = np.zeros((rows, columns))

for column in range(0, columns):
    for row in range(2*column, rows):
        matrix[row, column] = round(10 * (array2[row] - array1[column]), 0)

print(matrix)

The output should be:

[[190.   0.   0.   0.]
 [290.   0.   0.   0.]
 [390. 380.   0.   0.]
 [490. 480.   0.   0.]
 [590. 580. 570.   0.]
 [690. 680. 670.   0.]
 [790. 780. 770. 760.]
 [890. 880. 870. 860.]]

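This is just an example; the real arrays are large. How can I do this in the most efficient, optimized way using NumPy built-ins?

Thanks

【Comments】:

Tags: python arrays numpy matrix

【Solution 1】:

Check this out: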

import numpy as np

list1 = list(range(1,50))
list2 = list(range(20,1000,10))
array1 = np.array(list1)
array2 = np.array(list2)

columns = len(list1)
rows = len(list2)

# yours: the double loop from the question
def f():
    matrix = np.zeros((rows, columns))
    for column in range(0, columns):
        for row in range(2*column, rows):
            matrix[row, column] = round(10 * (array2[row] - array1[column]), 0)
    return matrix

def g():
    col, row = np.meshgrid(np.arange(columns), np.arange(rows))
    mask = row>=2*col
    matrix = np.where(mask, np.round(10*(array2[:,None] - array1), 0), 0)
    return matrix

col, row = np.meshgrid(np.arange(columns), np.arange(rows))
mask = row>=2*col
def h():
    matrix = np.where(mask, np.round(10*(array2[:,None] - array1), 0), 0)
    return matrix


# %timeit is an IPython/Jupyter magic, so run the lines below in IPython or a notebook

%timeit f()
# 3.18 ms ± 246 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit g()
# 64.7 µs ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit h()
# 21.7 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

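f, g and h are comparable on your small arrays, but g and h become much faster on large arrays.

h is a further optimization of g for the case where you need to run the same kind of computation many times on lists of the same size, because the mask then only has to be computed once.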
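As a quick sanity check, the sketch below runs the loop and the vectorized approach on the small arrays from the question and confirms that they give the same values. The names loop_version and vectorized_version are illustrative only; they are not part of the answer above.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4])
array2 = np.array([20, 30, 40, 50, 60, 70, 80, 90])
rows, columns = len(array2), len(array1)

def loop_version():
    # Reference implementation: the double loop from the question.
    matrix = np.zeros((rows, columns))
    for column in range(columns):
        for row in range(2 * column, rows):
            matrix[row, column] = round(10 * (array2[row] - array1[column]), 0)
    return matrix

def vectorized_version():
    # Same approach as g(): index grids, a boolean mask, then np.where.
    col, row = np.meshgrid(np.arange(columns), np.arange(rows))
    mask = row >= 2 * col
    # array2[:, None] has shape (rows, 1), so the subtraction
    # broadcasts against array1 to shape (rows, columns).
    return np.where(mask, np.round(10 * (array2[:, None] - array1), 0), 0)

# Compare the two results element by element.
same = np.array_equal(loop_version(), vectorized_version())
print(same)
```

With integer inputs the vectorized result keeps an integer dtype, whereas the loop fills a float array created by np.zeros. The values are equal either way; append .astype(float) if the dtype matters.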

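【Comments】:

Thanks for your answer, this is a huge improvement in time. The computation on the real data used to take 20 seconds, and now it takes less than 1 second.

【Solution 2】: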

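The answer above is much more detailed, but I would still like to propose this concise solution, which is easier to understand. It relies on the mathematical formulation of your problem: since you just want to subtract two 1D arrays at every coordinate of the matrix, an easy way to do it is to "transform" your 1D arrays into suitably shaped 2D arrays with np.tile, and then simply subtract them.

Finally, to keep only the coordinates that satisfy row >= 2*column, you can build a boolean array (y >= 2*x) which, when multiplied by the previous array C, sets every coordinate that fails the condition to 0.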

import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([20, 30, 40, 50, 60, 70, 80, 90])
n = len(a)
m = len(b)

A = np.tile(a, (m,1))
B = np.tile(b, (n,1)).T

'''
At this point, we have :
A = [[1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]
     [1 2 3 4]]
and
B = [[20 20 20 20]
     [30 30 30 30]
     [40 40 40 40]
     [50 50 50 50]
     [60 60 60 60]
     [70 70 70 70]
     [80 80 80 80]
     [90 90 90 90]]

so by definition matrix is exactly 10 * (B-A)!
It is easy to see by writing down a bit of maths,
since matrix[i,j] = 10 * (b[i] - a[j]).
'''

C = np.round(10*(B-A), 0)

x,y = np.meshgrid(np.arange(n), np.arange(m))

matrix = C * (y >= 2*x)
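
One difference between masking by multiplication and the np.where call from Solution 1 may matter if the real data can contain NaN or infinity: multiplying by False (that is, 0) does not turn those entries into 0, while np.where does. The small arrays below are made up purely to illustrate this; they are not the question's data, where all values are finite and both approaches agree.

```python
import numpy as np

# Hypothetical values containing a NaN and an infinity in the masked-out cells.
C = np.array([[1.0, np.nan], [np.inf, 4.0]])
keep = np.array([[True, False], [False, True]])

# nan * 0 and inf * 0 are both nan, so the masked-out cells stay non-zero.
with np.errstate(invalid="ignore"):  # silence the inf * 0 warning
    by_product = C * keep

# np.where picks 0.0 for masked-out cells regardless of what C holds there.
by_where = np.where(keep, C, 0.0)
```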

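【Comments】:

Why use two different ways, tile and meshgrid, to make the 2D arrays? Couldn't you use the same method twice? What about the sparse version of meshgrid? Or broadcasting?
This was the simplest solution I came up with while working on it (in fact, I only noticed the condition on rows and columns after I had finished array C), and I did not think of anything else that would give a cleaner result. But if you have a better solution, feel free to share it! I would benefit from it too :)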
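
The broadcasting and sparse-meshgrid variants raised in the comment above could look roughly like this. This is only a sketch of one way to do it, not the answer author's code; the names rows, cols, mask and mask_sparse are chosen here for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([20, 30, 40, 50, 60, 70, 80, 90])

# b[:, None] has shape (m, 1) and a has shape (n,),
# so the difference broadcasts to shape (m, n) without np.tile.
C = np.round(10 * (b[:, None] - a), 0)

# Row indices as a column vector and column indices as a flat vector;
# the comparison broadcasts to a full (m, n) boolean mask.
rows = np.arange(len(b))[:, None]
cols = np.arange(len(a))
mask = rows >= 2 * cols

matrix = np.where(mask, C, 0)

# sparse=True makes np.meshgrid return (1, n) and (m, 1) index arrays
# instead of two full (m, n) grids; the resulting mask is the same.
x, y = np.meshgrid(np.arange(len(a)), np.arange(len(b)), sparse=True)
mask_sparse = y >= 2 * x
```

Neither variant materializes the full A, B, x and y arrays, which may help with memory on large inputs.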