Scipy.sparse CSC matrix performance (answer)

[Question title]: Scipy.sparse CSC-matrix performance
[Posted]: 2023-04-09 00:28:01
[Question]:

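I am currently trying to solve a set of equations with the implicit Euler method. Since I have grown tired of Fortran, I thought it would be a good idea to try it in Python and see how close I can get, performance-wise, to the existing Fortran program. For my problem I want to use sparse matrices. I found that the current bottleneck in my program is initializing the sparse matrix and modifying its diagonal.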
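For context, a linearly implicit Euler step (backward Euler with a single Newton iteration) boils down to one sparse linear solve per time step with the matrix I/dt - J. The sketch below assumes a right-hand side `f(y)` and a sparse Jacobian `jac(y)`; `implicit_euler_step`, `f` and `jac` are hypothetical names, not taken from the poster's Fortran code. Sign conventions vary: the question's code adds 1/dt to the diagonal of its J, which matches this form if its J stores -df/dy.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve


def implicit_euler_step(f, jac, y, dt):
    """One linearly implicit Euler step for y' = f(y) (hypothetical helper).

    Linearising backward Euler, y1 = y0 + dt*f(y1), around y0 gives
    (I/dt - J) dy = f(y0), with J = df/dy evaluated at y0.
    """
    n = y.shape[0]
    # Form I/dt - J: structurally a shift of J's diagonal
    A = sp.identity(n, format="csc") / dt - sp.csc_matrix(jac(y))
    dy = spsolve(A, f(y))
    return y + dy


# Usage: linear decay y' = A y, for which this step equals backward Euler exactly
A_lin = sp.csc_matrix(np.diag([-1.0, -2.0]))
y1 = implicit_euler_step(lambda y: A_lin @ y, lambda y: A_lin,
                         np.array([1.0, 1.0]), 0.1)
```

For this linear test problem the result is y0 / (1 + dt*k) per component, i.e. [1/1.1, 1/1.2].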

The following minimal example demonstrates this:

import numpy                as np
from scipy.sparse           import csc_matrix
from scipy.sparse.linalg    import spsolve
from timeit                 import default_timer

# Example data for Sparse Matrix in CSC format
data = np.array([ -6.07315337e+07,  -1.08191534e+06,  -5.85677031e+07, \
               5.96496184e+07,   1.99723260e+07,  -3.99136095e+07, \
              -3.10384281e+04,   1.99412852e+07,   3.10384281e+04, \
               4.14012789e+04,  -4.13845644e+04,  -4.14179805e+04, \
               4.13845708e+04,   1.67016486e+01,   6.40664368e+03, \
              -1.21556953e+02,   6.28508672e+03,  -6.40664368e+03, \
               1.21556953e+02,   1.87698938e-03,   1.87698938e-03, \
              -1.87698938e-03,   6.17782975e-05,   6.17782975e-05, \
              -6.17782975e-05,   3.23024684e+00,   3.23024684e+00, \
              -3.23024684e+00,   1.59838512e+00,   1.59838512e+00, \
              -1.59838512e+00,   1.96353333e-02,   1.96353333e-02, \
              -1.96353333e-02,   4.25269958e+01,   4.25269958e+01, \
              -4.25269958e+01,   4.84489810e-06,   4.84489810e-06, \
              -4.84489810e-06,   2.54951658e-07,   2.54951658e-07, \
              -2.54951658e-07,   6.42250438e-08,   6.42250438e-08, \
              -6.42250438e-08])

indices = np.array([ 0,  1,  2,  3,  0,  1,  2,  3,  4,  0,\
                     1,  2,  4,  5,  0,  1,  2,  3,  5,  0,\
                     3,  4,  0,  4,  5,  0,  5,  6,  0,  6,\
                     7,  0,  7,  8,  0,  8,  9,  0,  9, 10,\
                     0, 10, 11,  0, 11, 12], dtype=np.int32)

indptr = np.array([ 0,  4,  9, 14, 19, 22, 25, 28, 31, 34,\
                   37, 40, 43, 46], dtype=np.int32)

# Stop the time to initialize the Sparse matrix in CSC-format
start = default_timer()
for i in range(10000):
    J = csc_matrix((data, indices, indptr), shape=(13, 13))
stop = default_timer()
print('Initialize:'.ljust(15), stop - start)

# Set the diagonal of the matrix. The diagonal is in principle known.
start = default_timer()
for i in range(10000):
    J.setdiag(1./1e-10 + J.diagonal())
stop = default_timer()
print('Set diagonal:'.ljust(15), stop - start)

# Set an array to solve something
b = np.array([ -4.16737068e+05, 8.32180182e+05, 1.29378997e+03,\
               -4.15443441e+05,-1.29326784e+03,-2.60963259e-01,\
                0.00000000e+00, 0.00000000e+00, 0.00000000e+00,\
                0.00000000e+00, 0.00000000e+00, 0.00000000e+00,\
                0.00000000e+00])

# Stop the time to solve the system
start = default_timer()
for i in range(10000):
    x = spsolve(J,b)
stop = default_timer()
print('Solve:'.ljust(15), stop - start)
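In the Solve loop above, the same J and b are passed to spsolve 10000 times, so every call repeats the LU factorisation. When the matrix does not change between solves, `scipy.sparse.linalg.factorized` can factor it once and reuse the factors. A sketch on a small made-up matrix (not the question's data); in an actual implicit Euler run the diagonal shift changes J, so the factorisation would have to be redone whenever J changes:

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import factorized, spsolve

# Small illustrative system, stored in CSC as factorized() expects
J = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 0.0],
                         [0.0, 0.0, 2.0]]))
b = np.array([1.0, 2.0, 3.0])

# Factor once; the returned callable reuses the LU factors
solve = factorized(J)

for _ in range(1000):
    x = solve(b)  # each call only does the triangular solves

# Reference: a fresh spsolve call gives the same solution
x_ref = spsolve(J, b)
```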

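I know that changing the sparsity structure of a matrix is usually expensive. In principle I know the indices of the diagonal entries, but once the data is stored in a scipy csc_matrix, I don't know how to change it. And is initializing the matrix really almost as expensive as solving the system? For me, the output of the example program is: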

Initialize:     0.516402959824
Set diagonal:   1.67107796669
Solve:          0.845117807388

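Is there a way to bypass scipy sparse matrices, or to speed this up? I have thought about calling Pardiso directly, but that looks rather complicated to me.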
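On the question of changing values once they are stored in a csc_matrix: the attributes `data`, `indices` and `indptr` are ordinary NumPy arrays, and `data` can be written in place as long as the sparsity pattern stays the same. In CSC, column j's row indices are `indices[indptr[j]:indptr[j+1]]` and its values are the same slice of `data`. A minimal sketch; `csc_entry_position` is a hypothetical helper, not part of SciPy, and it assumes canonical format (each entry stored at most once):

```python
import numpy as np
from scipy.sparse import csc_matrix


def csc_entry_position(J, i, j):
    """Index into J.data of stored entry (i, j), or None if it is not stored."""
    start, end = J.indptr[j], J.indptr[j + 1]
    # Row indices of column j live in this slice of J.indices
    hits = np.flatnonzero(J.indices[start:end] == i)
    return start + hits[0] if hits.size else None


J = csc_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 0.0, 5.0]]))

pos = csc_entry_position(J, 2, 0)  # entry (2, 0) holds 4.0
J.data[pos] = 40.0                 # in-place write, pattern unchanged
```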

[Question comments]:

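I've found in other contexts, such as matrix multiplication, that a sparse matrix needs a density below roughly 0.1 (fewer than about 10% nonzeros) to give a speed advantage. You are already creating the csc matrix in the fastest way. I don't know much about the solver, but it isn't much slower than the creation step. I suspect the setdiag step can be sped up.
If you also store zeros on the diagonal, the sparsity structure doesn't need to change. But I think a 13 x 13 array is too small to perform well: a lot of Python code has to run before the fast C routines are reached.
The 13 x 13 matrix given here is only an example. Typically the matrix is ~4000 x 4000 with a sparsity of ~90%. But why does creating the scipy csc matrix take so long? In principle I am already supplying the format scipy expects.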
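The comments above hinge on two structural properties: the density of the matrix, and whether every diagonal entry is already stored (explicit zeros included), in which case changing the diagonal does not alter the sparsity pattern. A sketch of two hypothetical helpers that check both, assuming canonical CSC format:

```python
import numpy as np
from scipy.sparse import csc_matrix


def density(J):
    """Fraction of stored entries: nnz / (rows * cols)."""
    return J.nnz / (J.shape[0] * J.shape[1])


def diagonal_fully_stored(J):
    """True if every diagonal entry (j, j) is structurally present in J."""
    for j in range(min(J.shape)):
        rows = J.indices[J.indptr[j]:J.indptr[j + 1]]
        if not np.any(rows == j):
            return False
    return True


J_full = csc_matrix(np.array([[2.0, 1.0], [0.0, 3.0]]))  # diagonal stored
J_gap = csc_matrix(np.array([[2.0, 1.0], [1.0, 0.0]]))   # (1, 1) missing
```

For the question's 13 x 13 example, nnz is 46, so the density is 46/169, about 0.27, and every column does store its diagonal entry. A 4000 x 4000 matrix that is 90% sparse has a density of 0.1, i.e. about 1.6 million stored entries.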

Tags: python numpy scipy sparse-matrix

[Solution 1]:

The computation time can be reduced considerably by initializing the scipy.sparse.csc_matrix instance only once. Instead of writing

J = csc_matrix((data, indices, indptr), shape=(13, 13))

in every iteration, it is better to write:

J.data = data
J.indices = indices
J.indptr = indptr
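A minimal sketch of this pattern, assuming the sparsity pattern (`indices`, `indptr`) is fixed and only the values change between iterations; the arrays and values are made up for illustration. It copies into the existing `J.data` buffer with slice assignment instead of rebinding the attribute; either way, no new matrix object is constructed inside the loop:

```python
import numpy as np
from scipy.sparse import csc_matrix

# Fixed sparsity pattern (small illustrative example, not the question's data)
indices = np.array([0, 1, 0, 1], dtype=np.int32)
indptr = np.array([0, 2, 4], dtype=np.int32)

# Build the matrix object once, outside the time-stepping loop
J = csc_matrix((np.zeros(4), indices, indptr), shape=(2, 2))

for step in range(3):
    # Hypothetical new values for this iteration, in CSC storage order
    new_data = np.array([1.0, 2.0, 3.0, 4.0]) * (step + 1)
    # Overwrite the values in place; indices and indptr stay untouched
    J.data[:] = new_data
```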

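If the sparsity pattern does not change and the positions of the diagonal entries are known, you can also modify the diagonal values in data directly instead of using scipy's setdiag and diagonal methods.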
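A sketch of that idea: locate the diagonal entries in `J.data` once, then update them with a single vectorised operation each step. `diagonal_positions` is a hypothetical helper, not a SciPy function; it assumes canonical CSC format and that every diagonal entry is already stored (as it is in the question's example), and raises otherwise:

```python
import numpy as np
from scipy.sparse import csc_matrix


def diagonal_positions(J):
    """Positions in J.data of the diagonal entries (j, j), computed once."""
    n = min(J.shape)
    pos = np.empty(n, dtype=np.intp)
    for j in range(n):
        start, end = J.indptr[j], J.indptr[j + 1]
        hits = np.flatnonzero(J.indices[start:end] == j)
        if hits.size == 0:
            raise ValueError(f"diagonal entry ({j}, {j}) is not stored")
        pos[j] = start + hits[0]
    return pos


J = csc_matrix(np.array([[1.0, 2.0, 0.0],
                         [0.0, 3.0, 4.0],
                         [5.0, 0.0, 6.0]]))
diag_pos = diagonal_positions(J)  # done once, outside the time loop

dt = 1e-10
# Same values as J.setdiag(1./dt + J.diagonal()), as one in-place update
J.data[diag_pos] += 1.0 / dt
```

The positions stay valid for as long as `indices` and `indptr` are unchanged, so they can be reused across all time steps.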

[Comments]: