【问题标题】：How to vectorize populating larger matrix with items of smaller matrix in python如何在python中用较小的矩阵项向量化填充较大的矩阵
【发布时间】：2017-06-17 13:41:14
【问题描述】：

我有一些小的对称矩阵，它们是较大对称矩阵的低维表示。我有一个向量，它是显示高 D 矩阵的哪些单元应链接到低 D 矩阵中的哪些单元的关键。

我想通过用低维矩阵中的对应值填充较大的矩阵来重新创建这些较大的矩阵。我相信应该有一个矢量化的方法来解决这个问题，但到目前为止，我所能想到的只是一个简单的嵌套 for 循环，这对于这些矩阵（10k+ 行和列）来说速度非常慢。

在这个玩具示例中，key 是 vec1，低维矩阵是 source_mat，高维矩阵是 target_mat。我需要创建 target_mat ，其中每个单元格都根据 key 填充 source_mat 中的相应值。

    import pandas as pd
    import numpy as np
    import random

    vec1=[]
    for x in range (0, 100):
        vec1.append(random.randint(0, 19)) #creating the key

    vec1=pd.DataFrame(vec1)
    sizevec1=vec1.shape[0]
    matshape=(sizevec1,sizevec1)
    target_mat=np.zeros(matshape) #key and target have same shape
    target_mat=pd.DataFrame(target_mat)

    temp=np.random.random((20,20))
    source_mat=temp*temp.T

    for row in range(0,target_mat.shape[0]):
        for column in range(0,target_mat.shape[1]):
            print 'row is ', row
            print 'column is', column
            target_mat.iloc[row,column] = source_mat.item(int(vec1.iloc[row]), int(vec1.iloc[column]))

【问题讨论】：

如果您在 Windows 上运行 Python 3.5，您发布的答案对您来说不够快，我可以将其编译为 Cython 程序并让编译器优化每个循环。然后你可以将它作为一个函数导入来调用它，它会更快。
感谢@Matt - 当我在大规模数据上尝试我的答案时，它仍然不够快。不幸的是，我在 Mac 和 Linux 服务器上运行 2.7，但 cython 方法听起来很有希望。想知道是否有办法使用地图函数来矢量化整个操作。
当我回到我的电脑时，我会看看我的虚拟盒子是否可以处理代码。在 Linux 上，它转到 gcc，虽然这更棘手，因为我的工作环境都是 Windows。尽管如此，当您使用 Cython 编译普通 Python 函数时，如果可能的话，编译器会将循环折叠为矢量化版本，这肯定会提高性能
听起来不错！请随时通知我。如果我也遇到一种矢量化方法，我会发布。
我首先要做的是在 Windows 上使用 Cython 进行编译，因为我的构建环境已经设置好，看看会带来什么样的速度提升。我们还将 BP 的所有内容迁移到 Linux Amazon 实例，因此如果看起来不错，我将尝试设置 Linux GCC 构建，但这可能需要一些时间。至少您会对 c++ 编译器及其继承的循环优化可以提高哪些速度有所了解。

标签： python matrix indexing vectorization dimensionality-reduction

【解决方案1】：

这比您的“快速”答案快 3 倍。

import random
import time
import numpy as np
vec1=[]
for x in range (0, 1000):
    vec1.append(random.randint(0, 19))

vec1=np.array(vec1)
sizevec1=vec1.shape[0]
matshape=(sizevec1,sizevec1)
target_mat=np.zeros(matshape)

temp=np.random.random((20,20))
source_mat=temp*temp.T
###FasterMethod###

target_mat=np.zeros(matshape)

def matrixops(vec1, source_mat, target_mat):
    matrixtime = time.time()
    for row in range(0,source_mat.shape[0]):
        for column in range(0, source_mat.shape[1]):

            rowmatch = np.array(vec1==row)
            rowmatch = rowmatch*1

            colmatch = np.array(vec1==column)
            colmatch = colmatch*1

            match_matrix=rowmatch*colmatch.T
            target_mat=target_mat+(match_matrix*source_mat[row,column])

    print((time.time() - matrixtime))

if __name__ == "__main__":
    matrixops(vec1, source_mat, target_mat)

您的快速版本时间：4.246443033218384 本版本时间：1.4500105381011963

正如我的评论所说，Cython 版本一点也不快。使其更快的唯一方法是采用依赖于 Python GIL 的行并将其转换为 C++ 样式操作（就像我对 == 部分所做的那样，编写一个与 NumPy 执行相同操作的 C++ 类循环功能，但 MemoryViews 不支持。在这里发布以供参考，因为我花了很多时间：

cimport numpy
from numpy import array, multiply, asarray, ndarray, zeros, dtype, int
cimport cython
from cython cimport view
from cython.parallel cimport prange #this is your OpenMP portion
from openmp cimport omp_get_max_threads #only used for getting the max # of threads on the machine

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cpdef matrixops(int[::1] vec1, double[:,::1] source_mat, double[:,::1] target_mat):
    cdef int[::1] match_matrix =zeros(vec1.shape[0], dtype=int)
    cdef int[::1] rowmatch =zeros(vec1.shape[0], dtype=int)
    cdef int[::1] colmatch =zeros(vec1.shape[0], dtype=int)
    cdef int maxthreads = omp_get_max_threads()
    cdef int row, column, i
    # here's where you'd substitute 
    # for row in prange(source_mat.shape[0], nogil=True, num_threads=maxthreads, schedule='static'): # to use all cores
    for row in range(0,source_mat.shape[0]):
        for column in range(0, source_mat.shape[1]):
        #this is how to avoid the GIL
            for i in range(vec1.shape[0]):
                rowmatch[i]=(row==vec1[i])

            for i in range(vec1.shape[0]):
                colmatch[i]=(column==vec1[i])
            # this part has to be modified to not call Python GIL functions like was done above
            match_matrix=multiply(rowmatch,colmatch.T)
            target_mat=target_mat+(multiply(match_matrix,source_mat[row,column]))

上面就是你的 .PYX 文件。如果你运气好，你通常会在 4 核上看到 3 倍的加速。抱歉，我没有成功，但是使用直接 Python 库，比您的 100 倍快的解决方案快 3 倍仍然不错。

【讨论】：

@AkiNikolaidis 上面的答案是我能得到的最快的。我可以提供样板 Cython 代码，但速度完全相同，令人惊讶的是，这些库已经完全优化了！！对该代码唯一能做的就是将它放在 nogil=True prange 语句下并对其进行多线程处理，但要避免依赖 Python GIL 的部分，需要进行相当多的重新排列。
非常感谢@Matt！我会试一试 - 以前从未与 Cython 合作过，但可能应该更熟悉 Cython 来解决这些问题
@AkiNikolaidis 3 倍速度提升是否足够好（与您的答案相比）？要真正让 Cython 代码正常工作，您必须重写 NumPy 乘法和转置函数，以便基本上不使用 NumPy。
我在上面发布了另一个解决方案:)

【解决方案2】：

以下是对代码的两个单独更新，这些更新导致了显着的加速。

首先 - 找出矢量化解决方案，因此现在一步完成计算。即使在第二次更改之后，这也是最快的方法-

第二次将所有 pandas 数据帧更改为 numpy 数组。这种变化对 for 循环代码的影响最大——现在运行速度快了几个数量级。

下面的代码计算了所有 3 个方法，'slow'、'fast' 和 'Xu Mackenzie'，以提出矢量化解决方案的朋友命名；-P

#初始化变量

import time
import random
import pandas as pd
import numpy as np

n=13000
k=2000
i=0
vec1=[]
for x in range(0, n):
   vec1.append(random.randint(0, k-1))

temp=np.random.random((k,k))
#vec1=pd.DataFrame(vec1)
vec1=np.array(vec1)
#vec=pd.DataFrame(np.arange(0,300))
#vec2=pd.concat([vec,vec1], axis=1)
#sizevec1=vec1.shape[0]
sizevec1=len(vec1)
matshape=(sizevec1,sizevec1)
target_mat=np.zeros(matshape)
#target_mat=pd.DataFrame(target_mat)


source_mat=temp*temp.T
transform_mat=np.zeros((len(source_mat),len(target_mat)))

慢解

matrixtime = time.time()
for row in range(0,target_mat.shape[0]):
    #print 'row is ', row
    for column in range(0,target_mat.shape[1]):

        #print 'column is', column
        target_mat[row,column] = source_mat.item(int(vec1[row]), int(vec1[column]))
print((time.time() - matrixtime))
target_mat_slow=target_mat
target_mat=np.zeros(matshape)

徐麦肯齐解决方案

matrixtime = time.time()

for i in range(0,len(target_mat)):
  transform_mat[vec1[i],i]=1

temp=np.dot(source_mat,transform_mat)
target_mat=np.dot(temp.T,transform_mat)
target_mat_XM=target_mat
target_mat=np.zeros(matshape)
XM_time= time.time() - matrixtime
print((time.time() - matrixtime))

以前的“快速”解决方案

matrixtime = time.time()
for row in range(0,source_mat.shape[0]):
    print 'row is ', row
    #for column in range(0, source_mat.shape[1]):
    for column in range(0, row):   
        rowmatch = np.array([vec1==row])
        rowmatch = rowmatch*1

        colmatch = np.array([vec1==column])
        colmatch = colmatch*1

        match_matrix=rowmatch*colmatch.T
        target_mat=target_mat+(match_matrix*source_mat[row,column])

print((time.time() - matrixtime))
target_mat_fast=target_mat
target_mat=np.zeros(matshape)

等效性测试

target_mat_slow==target_mat_fast
target_mat_fast==target_mat_XM

【讨论】：

喜欢，放弃，走矢量化路线！

【解决方案3】：

我设法提出了一个解决方案，该解决方案提供了非常好的加速，尤其是对于较大的矩阵。这依赖于循环遍历较小的矩阵并用匹配的元素填充大矩阵。

我用 vec1 作为包含 1000 个元素的向量尝试了这个解决方案，发现比之前的方法有 100 倍的加速。

    import random
    import time
    import pandas as pd
    import numpy as np
    vec1=[]
    for x in range (0, 1000):
        vec1.append(random.randint(0, 19))

    vec1=pd.DataFrame(vec1)
    sizevec1=vec1.shape[0]
    matshape=(sizevec1,sizevec1)
    target_mat=np.zeros(matshape)
    target_mat=pd.DataFrame(target_mat)

    temp=np.random.random((20,20))
    source_mat=temp*temp.T

    ###Slow Method###

    matrixtime = time.time()
    for row in range(0,target_mat.shape[0]):
        for column in range(0,target_mat.shape[1]):
            #print 'row is ', row
            #print 'column is', column
            target_mat.iloc[row,column] = source_mat.item(int(vec1.iloc[row]), int(vec1.iloc[column]))
    print((time.time() - matrixtime))
    target_mat_slow=target_mat



    ###FasterMethod###

    target_mat=np.zeros(matshape)
    target_mat=pd.DataFrame(target_mat)

    matrixtime = time.time()
    for row in range(0,source_mat.shape[0]):
         for column in range(0, source_mat.shape[1]):

            rowmatch = np.array(vec1==row)
            rowmatch = rowmatch*1

            colmatch = np.array(vec1==column)
            colmatch = colmatch*1

            match_matrix=rowmatch*colmatch.T
            target_mat=target_mat+(match_matrix*source_mat[row,column])

    print((time.time() - matrixtime))
    target_mat_fast=target_mat

    #Test Equivalence
    target_mat_slow==target_mat_fast

【讨论】：