并行矩阵乘法答案

【问题标题】：Parallelized Matrix Multiplication并行矩阵乘法
【发布时间】：2021-03-08 10:52:15
【问题描述】：

我正在尝试并行化两个矩阵A,B 的乘法。

不幸的是，串行实现仍然比并行实现快，或者加速太低。（矩阵维度 = 512 的加速比就像 1.3）。可能有些事情根本上是错误的。有人可以给我小费吗？

double[][] matParallel2(final double[][] matrixA,
                        final double[][] matrixB,
                        final boolean parallel) {
    int rows = matrixA.length;
    int columnsA = matrixA[0].length;
    int columnsB = matrixB[0].length;

    Runnable task;
    List<Thread> pool = new ArrayList<>();

    double[][] returnMatrix = new double[rows][columnsB];
    for (int i = 0; i < rows; i++) {
        int finalI = i;
        task = () -> {
            for (int j = 0; j < columnsB; j++) {
                //  returnMatrix[finalI][j] = 0;
                for (int k = 0; k < columnsA; k++) {
                    returnMatrix[finalI][j] +=
                            matrixA[finalI][k] * matrixB[k][j];
                }
            }
        };
        pool.add(new Thread(task));
    }
    if (parallel) {
        for (Thread trd : pool) {
            trd.start();
        }
    } else {
        for (Thread trd : pool) {
            trd.run();
        }
    }
    try {
        for (Thread trd : pool) {
            trd.join();
        }
    } catch (
            Exception e) {
        e.printStackTrace();
    }
    return returnMatrix;
}

【问题讨论】：

标签： java multithreading matrix matrix-multiplication java-threads

【解决方案1】：

您可以使用一个parallel 流 来减少计算时间（可能是两倍或更多）。不要使用嵌套并行，因为这会产生相反的效果！

/**
 * Parallel Matrix multiplication
 *
 * @param m rows of 'a' matrix
 * @param n columns of 'a' matrix
 *          and rows of 'b' matrix
 * @param p columns of 'b' matrix
 * @param a first matrix 'm×n'
 * @param b second matrix 'n×p'
 * @return result matrix 'm×p'
 */
static double[][] parallelMatrixMultiplication(
        int m, int n, int p, double[][] a, double[][] b) {
    return IntStream.range(0, m)
            .parallel() // comment this line to check the sequential stream
            .mapToObj(i -> IntStream.range(0, p)
                    .mapToDouble(j -> IntStream.range(0, n)
                            .mapToDouble(k -> a[i][k] * b[k][j])
                            .sum())
                    .toArray())
            .toArray(double[][]::new);
}

// test
public static void main(String[] args) {
    // dimensions
    int m = 512;
    int n = 1024;
    int p = 512;

    // matrices
    double[][] a = randomMatrix(m, n);
    double[][] b = randomMatrix(n, p);

    long time = System.currentTimeMillis();

    // multiplication
    double[][] c = parallelMatrixMultiplication(m, n, p, a, b);

    System.out.println(System.currentTimeMillis() - time);
    // with    .parallel() the time is - 1495
    // without .parallel() the time is - 5823
}

static double[][] randomMatrix(int d1, int d2) {
    return IntStream.range(0, d1)
            .mapToObj(i -> IntStream.range(0, d2)
                    .mapToDouble(j -> Math.random() * 10)
                    .toArray())
            .toArray(double[][]::new);
}

【讨论】：

【解决方案2】：

根本没有什么问题。

与几次乘法相比，创建线程意味着巨大的开销。目前，对于 512*512 矩阵，您创建 512 个线程。你的 CPU 肯定有少于 512 个内核，所以只有例如其中 8 或 16 个将真正在不同的内核上并行运行，但另外约 500 个也消耗了创建开销，而不会增加并行执行。

尝试将线程数限制为更接近 CPU 内核数，可以使用您自己的逻辑，也可以使用框架，例如java.util.concurrent 包。

【讨论】：