Answers to: Segmentation fault after scatterv

【Question title】: Segmentation fault after scatterv
【Posted】: 2012-05-27 07:39:18
【Question description】:

     /**
     * BLOCK_LOW
     * Returns the offset of a local array
     * with regards to block decomposition
     * of a global array.
     *
     * @param  (int) process rank
     * @param  (int) total number of processes
     * @param  (int) size of global array
     * @return (int) offset of local array in global array
     */
    #define BLOCK_LOW(id, p, n) ((id)*(n)/(p))

    /**
     * BLOCK_HIGH
     * Returns the index immediately after the
     * end of a local array with regards to
     * block decomposition of a global array.
     *
     * @param  (int) process rank
     * @param  (int) total number of processes
     * @param  (int) size of global array
     * @return (int) offset after end of local array
     */
    #define BLOCK_HIGH(id, p, n) (BLOCK_LOW((id)+1, (p), (n)))

    /**
     * BLOCK_SIZE
     * Returns the size of a local array
     * with regards to block decomposition
     * of a global array.
     *
     * @param  (int) process rank
     * @param  (int) total number of processes
     * @param  (int) size of global array
     * @return (int) size of local array
     */
    #define BLOCK_SIZE(id, p, n) ((BLOCK_HIGH((id), (p), (n))) - (BLOCK_LOW((id), (p), (n))))

    /**
     * BLOCK_OWNER
     * Returns the rank of the process that
     * handles a certain local array with
     * regards to block decomposition of a
     * global array.
     *
     * @param  (int) index in global array
     * @param  (int) total number of processes
     * @param  (int) size of global array
     * @return (int) rank of process that handles index
     */
    #define BLOCK_OWNER(i, p, n) (((p)*((i)+1)-1)/(n))



    /* Matrix file names:
      small matrix A.bin of dimension 100 × 50
      small matrix B.bin of dimension 50 × 100
      large matrix A.bin of dimension 1000 × 500
      large matrix B.bin of dimension 500 × 1000

    An MPI program should be implemented such that it can
    • accept two file names at run-time,
    • let process 0 read the A and B matrices from the two data files,
    • let process 0 distribute the pieces of A and B to all the other processes,
    • involve all the processes to carry out the chosen parallel algorithm
    for matrix multiplication C = A * B ,
    • let process 0 gather, from all the other processes, the different pieces
    of C ,
    • let process 0 write out the entire C matrix to a data file.
    */


    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #include "mpi-utils.c"
    void read_matrix_binaryformat (char*, double***, int*, int*);
    void write_matrix_binaryformat (char*, double**, int, int);
    void create_matrix (double***,int,int);
    void matrix_multiplication (double ***, double ***, double ***,int,int, int);

    int main(int argc, char *argv[]) {
        int id,p; // Process rank and total amount of processes
        int rowsA, colsA, rowsB, colsB; // Matrix dimensions
        double **A; // Matrix A
        double **B; // Matrix B
        double **C; // Result matrix C : AB
        int local_rows; // Local row dimension of the matrix A
        double **local_A; // The local A matrix
        double **local_C;  // The local C matrix

        MPI_Init (&argc, &argv);
        MPI_Comm_rank (MPI_COMM_WORLD, &id);
        MPI_Comm_size (MPI_COMM_WORLD, &p);

        if(argc != 3) {
            if(id == 0) {
                printf("Usage:\n>> %s matrix_A matrix_B\n",argv[0]);
            }       
            MPI_Finalize();
            exit(1);
        }

        if (id == 0) {
            read_matrix_binaryformat (argv[1], &A, &rowsA, &colsA);
            read_matrix_binaryformat (argv[2], &B, &rowsB, &colsB);
        }

        if (p == 1) {
            create_matrix(&C,rowsA,colsB);
            matrix_multiplication (&A,&B,&C,rowsA,colsB,colsA);

            char* filename = "matrix_C.bin";
            write_matrix_binaryformat (filename, C, rowsA, colsB);
            free(A);
            free(B);
            free(C);
            MPI_Finalize();
            return 0;
        }


        // For this assignment we have chosen to bcast the whole matrix B:
        MPI_Bcast (&B, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD); 
        MPI_Bcast (&colsA, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast (&colsB, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast (&rowsA, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast (&rowsB, 1, MPI_INT, 0, MPI_COMM_WORLD);

        local_rows = BLOCK_SIZE(id, p, rowsA);


        /*    SCATTER VALUES    */

        int *proc_elements = (int*)malloc(p*sizeof(int)); // amount of elements for each processor
        int *displace = (int*)malloc(p*sizeof(int)); // displacement of elements for each processor
        int i;
        for (i = 0; i<p; i++) {
            proc_elements[i] = BLOCK_SIZE(i, p, rowsA)*colsA;
            displace[i] = BLOCK_LOW(i, p, rowsA)*colsA;
        }

        create_matrix(&local_A,local_rows,colsA);

        MPI_Scatterv(&A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,&local_A[0],
                     local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);

        /*    END  SCATTER  VALUES  */  

        create_matrix (&local_C,local_rows,colsB);
        matrix_multiplication (&local_A,&B,&local_C,local_rows,colsB,colsA);

        /*    GATHER VALUES    */

        MPI_Gatherv(&local_C[0], rowsA*colsB, MPI_DOUBLE,&C[0],
              &proc_elements[0],&displace[0],MPI_DOUBLE,0, MPI_COMM_WORLD);

        /*    END  GATHER VALUES  */

        char* filename = "matrix_C.bin";
        write_matrix_binaryformat (filename, C, rowsA, colsB);  

        free (proc_elements);
        free (displace);    
        free (local_A);
        free (local_C);
        free (A);
        free (B);
        free (C);   
        MPI_Finalize ();
        return 0;
    }

    void create_matrix (double ***C,int rows,int cols) {
        *C = (double**)malloc(rows*sizeof(double*));
        (*C)[0] = (double*)malloc(rows*cols*sizeof(double));
        int i;
        for (i=1; i<rows; i++)
            (*C)[i] = (*C)[i-1] + cols;
    }

    void matrix_multiplication (double ***A, double ***B, double ***C, int rowsC,int colsC,int colsA) {
        double sum;
        int i,j,k;
        for (i = 0; i < rowsC; i++) {
            for (j = 0; j < colsC; j++) {
                sum = 0.0;
                for (k = 0; k < colsA; k++) {
                    sum = sum + (*A)[i][k]*(*B)[k][j];
                }
                (*C)[i][j] = sum;
            }
        }
    }

    /* Reads a 2D array from a binary file*/ 
    void read_matrix_binaryformat (char* filename, double*** matrix, int* num_rows, int* num_cols) {
        int i;
        FILE* fp = fopen (filename,"rb");
        fread (num_rows, sizeof(int), 1, fp);
        fread (num_cols, sizeof(int), 1, fp);
        /* storage allocation of the matrix */
        *matrix = (double**)malloc((*num_rows)*sizeof(double*));
        (*matrix)[0] = (double*)malloc((*num_rows)*(*num_cols)*sizeof(double));
        for (i=1; i<(*num_rows); i++)
            (*matrix)[i] = (*matrix)[i-1]+(*num_cols);
        /* read in the entire matrix */
        fread ((*matrix)[0], sizeof(double), (*num_rows)*(*num_cols), fp);
        fclose (fp);
    }

    /* Writes a 2D array in a binary file */
    void write_matrix_binaryformat (char* filename, double** matrix, int num_rows, int num_cols) {
      FILE *fp = fopen (filename,"wb");
      fwrite (&num_rows, sizeof(int), 1, fp);
      fwrite (&num_cols, sizeof(int), 1, fp);
      fwrite (matrix[0], sizeof(double), num_rows*num_cols, fp);
      fclose (fp);
    }

My task is to perform a parallel matrix multiplication of matrices A and B and to collect the result in matrix C.

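I do this by splitting matrix A into blocks of rows; each process multiplies its block by matrix B and gets back its own block of the product. I then gather the blocks from all processes and assemble them into matrix C.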
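The row-block split described above can be checked on paper before any MPI call is made. The following is a minimal sketch, not the asker's code: it reuses the BLOCK_LOW and BLOCK_SIZE formulas from the listing, while the helper name `block_layout` and its parameters are assumptions made for illustration. It computes, for each rank, how many elements (not rows) it receives and where its block starts in a row-major matrix.

```c
#include <assert.h>

/* Same formulas as the BLOCK_LOW / BLOCK_SIZE macros in the question. */
#define BLOCK_LOW(id, p, n)  ((id) * (n) / (p))
#define BLOCK_SIZE(id, p, n) \
    (BLOCK_LOW((id) + 1, (p), (n)) - BLOCK_LOW((id), (p), (n)))

/* Hypothetical helper: fill counts[] and displs[] (both measured in
 * elements, not rows) for distributing a rows x cols row-major matrix
 * over p ranks in blocks of whole rows. */
void block_layout(int p, int rows, int cols, int *counts, int *displs)
{
    assert(p > 0 && rows >= 0 && cols >= 0);
    for (int i = 0; i < p; i++) {
        counts[i] = BLOCK_SIZE(i, p, rows) * cols;
        displs[i] = BLOCK_LOW(i, p, rows) * cols;
    }
}
```

For 10 rows of 4 columns on 3 ranks, the ranks get 3, 3 and 4 rows, so the blocks cover all 40 elements with no gaps or overlap.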

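I have already posted a similar question. The code has been improved since then and I have made progress, but I still get a segmentation fault after the scatterv call.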

【Comments】:

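Please use a debugger to narrow down exactly where the fault occurs. SO is not your personal debugger.
I changed the code that reads the matrices from files so that it only allocates space for them, leaving random values. Running the program... Program exited normally. (gdb) quit ... we can't even reproduce the error... As for that code, are you sure your data is decoded correctly? (The code ignores the endianness of the data, and even the data sizes need not match; for example, sizeof(int) need not be the same as on the machine that generated the matrix binary files...)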

Tags: c segmentation-fault mpi matrix-multiplication scatter

【Solution 1】:

I can spot a few problems right away:

    MPI_Bcast (&B, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

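Here you are passing not a pointer to a double but a pointer to a pointer to a pointer to a double (B is defined as double **B), and you are telling MPI to follow that pointer and send 1 double from there. That is not going to work.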

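You might think that what you are accomplishing here is sending a pointer to the matrix, from which all tasks could then read the array. That does not work. The processes do not share a common memory space (which is why MPI is called distributed-memory programming), so the pointer does not refer to anything meaningful on the other processes. You actually have to send the contents of the matrix,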

    MPI_Bcast (&(B[0][0]), rowsB*colsB, MPI_DOUBLE, 0, MPI_COMM_WORLD);

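and you have to make sure that the other processes have correctly allocated memory for the B matrix ahead of time.

There are similar pointer problems elsewhere: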

    MPI_Scatterv(&A[0], ..., &local_A[0]

Again, A is a pointer to pointers to doubles (double **A), as is local_A, and you need to give MPI a pointer to doubles for this to work, something like

    MPI_Scatterv(&(A[0][0]), ..., &(local_A[0][0])

This error appears to be present in all of the communication routines.

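Remember that in MPI, anything that looks like (buffer, count, TYPE) means that the MPI routine follows the pointer buffer and sends the next count pieces of data of type TYPE found there. MPI cannot follow pointers inside the buffer you send, because in general it does not know they are there. It just takes the next (count * sizeof(TYPE)) bytes starting at the pointer buffer and communicates them as appropriate. So you have to pass it a pointer to a stream of data of type TYPE.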
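The (buffer, count, TYPE) rule can be seen without MPI at all. In the sketch below, `memcpy` is only a stand-in for a transfer, and the names `alloc_matrix` and `fake_send` are hypothetical; the matrix layout is assumed to match create_matrix from the question (a row-pointer array plus one contiguous data block).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocate a rows x cols matrix as a row-pointer array plus one contiguous
 * block of doubles, the same layout create_matrix builds in the question.
 * Returns NULL on allocation failure. */
double **alloc_matrix(int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    double **m = malloc((size_t)rows * sizeof *m);
    if (m == NULL)
        return NULL;
    m[0] = malloc((size_t)rows * (size_t)cols * sizeof **m);
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (int i = 1; i < rows; i++)
        m[i] = m[i - 1] + cols;
    return m;
}

/* Stand-in for a send (NOT an MPI call): read count * sizeof(double) bytes
 * starting at buffer, without looking at what those bytes mean. */
void fake_send(const void *buffer, int count, double *dest)
{
    memcpy(dest, buffer, (size_t)count * sizeof(double));
}
```

Calling `fake_send(&A[0][0], rows * cols, out)` copies the matrix values, because `&A[0][0]` (equivalently `A[0]`) is where the doubles live. Passing `&A[0]` instead would copy the bytes of the row-pointer array, and with a count of rows * cols it would read past the end of that array, which is the kind of access that ends in a segmentation fault.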

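Having said all that, it would be much easier to work with you on this if you narrowed things down a bit. Right now the program you posted contains a lot of irrelevant I/O code, which means nobody can run it to see what happens without first figuring out the matrix format and then generating two matrices themselves. When posting a question about source code, you really want to post (a) a small piece of source code that (b) reproduces the problem and (c) is completely self-contained.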

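【Discussion】:

【Solution 2】:

Please consider this an extended comment, since Jonathan Dursi has already given a fairly thorough answer. Your matrices are indeed represented in a strange way, but at least you followed the advice given on your other question and allocated them as one contiguous block rather than allocating each row separately.

Given that, you should replace: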

    MPI_Scatterv(&A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,&local_A[0],
                 local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);

with

    MPI_Scatterv(A[0],&proc_elements[0],&displace[0],MPI_DOUBLE,local_A[0],
                 local_rows*colsA,MPI_DOUBLE,0,MPI_COMM_WORLD);

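A[0] already points to the beginning of the matrix data, so there is no need to take its address. The same goes for local_A[0] and for the arguments of the MPI_Gatherv() call.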

It has been said many times: MPI does not do pointer chasing and only works with flat buffers.
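To make the "flat buffer" point concrete, here is a single-process emulation of the data movement a scatter with per-rank counts and displacements performs for one receiving rank. It is only a sketch under stated assumptions: the function name `emulate_scatterv_rank` is hypothetical, it is not an MPI routine, and it ignores everything about real communication.

```c
#include <assert.h>
#include <string.h>

/* Emulation only (NOT an MPI call): copy the block destined for one rank out
 * of a flat send buffer. counts[] and displs[] are measured in doubles, as
 * they would be for a scatter of MPI_DOUBLE data. */
void emulate_scatterv_rank(const double *sendbuf, const int *counts,
                           const int *displs, int rank, double *recvbuf)
{
    assert(sendbuf != NULL && counts != NULL && displs != NULL);
    assert(recvbuf != NULL && rank >= 0);
    memcpy(recvbuf, sendbuf + displs[rank],
           (size_t)counts[rank] * sizeof(double));
}
```

The pointer arithmetic `sendbuf + displs[rank]` only makes sense if sendbuf addresses the doubles themselves, which is why A[0] is the right argument for a matrix laid out as one contiguous block.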

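I also noticed another bug in your code: the memory of the matrices is not freed correctly. You are only freeing the array of pointers, not the matrix data itself: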

    free(A);

should really become

    free(A[0]); free(A);
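One way to keep the two frees together is to pair the allocator with a matching release function. This is a sketch rather than the asker's code: `create_matrix` keeps the layout from the question but is changed here to return a status, and `destroy_matrix` is a hypothetical counterpart.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as create_matrix in the question: a row-pointer array plus one
 * contiguous data block. Unlike the original, this version returns 0 on
 * success and -1 on allocation failure. */
int create_matrix(double ***m, int rows, int cols)
{
    assert(m != NULL && rows > 0 && cols > 0);
    *m = malloc((size_t)rows * sizeof **m);
    if (*m == NULL)
        return -1;
    (*m)[0] = malloc((size_t)rows * (size_t)cols * sizeof ***m);
    if ((*m)[0] == NULL) {
        free(*m);
        *m = NULL;
        return -1;
    }
    for (int i = 1; i < rows; i++)
        (*m)[i] = (*m)[i - 1] + cols;
    return 0;
}

/* Hypothetical counterpart: free the data block first, then the row-pointer
 * array, and reset the caller's pointer so a second call is harmless. */
void destroy_matrix(double ***m)
{
    if (m == NULL || *m == NULL)
        return;
    free((*m)[0]);
    free(*m);
    *m = NULL;
}
```

The order matters: the data block has to be released through (*m)[0] before the row-pointer array that holds that address is freed.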

【Discussion】: