LinuxQuestions.org


ejspeiro 02-21-2013 07:52 PM

ScaLAPACK and solving banded matrices in parallel in C
 
Hello folks, in a previous thread located here, I addressed some of the algorithmic issues I ran into when trying to solve a banded system of equations from C using the sequential banded solvers in LAPACK.

My issue has now "scaled" :D I am interested in achieving a scalable solution of a large banded system using the parallel solvers in ScaLAPACK.

My first two questions, which will hopefully serve to get this thread started, are:
  1. Has anybody tried to solve a system using ScaLAPACK from C code?
  2. ScaLAPACK comes with a driver example written in Fortran, which DOES NOT scale per se... what important factor could be behind this performance issue? What info could I provide that would help clarify this question?
Thanks in advance, guys!

PS: As is hopefully assumed, I have done extensive research in the user manuals, on the LAPACK/ScaLAPACK user forums, and on other websites... BUT it baffles me that I can't seem to find any working C driver for this problem :(

\m/

firstfire 02-25-2013 09:12 AM

Hi.

ScaLAPACK may be used with two libraries for distributed computations -- MPI and PVM. Which one do you prefer? I'm currently trying to figure out how to use MPI.

Please post your best attempt here, even if it's not working, so that we have something to start with.
BTW, which example written in Fortran did you mean?
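
If you settle on MPI, here is the sort of minimal C skeleton I have been playing with just to bring a BLACS process grid up and tear it down again. I am not sure the Cblacs prototypes below match every installation, so treat it as a sketch rather than a reference:

Code:

#include <stdio.h>
#include <stdlib.h>

/* The C BLACS usually ship without a header, so declare what we use: */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int ctxt, int what, int *val);
extern void Cblacs_gridinit(int *ctxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int ctxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int ctxt);
extern void Cblacs_exit(int notdone);

int main(void) {
  int my_id, np, ctxt;
  int nprow = 1, npcol, myrow, mycol;

  Cblacs_pinfo(&my_id, &np);   /* On the MPI BLACS this should init MPI too. */
  Cblacs_get(-1, 0, &ctxt);    /* Grab the default system context. */
  npcol = np;                  /* The band routines apparently expect a 1 x P grid. */
  Cblacs_gridinit(&ctxt, "Row", nprow, npcol);
  Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

  printf("Process %d of %d sits at (%d, %d) of a %d x %d grid.\n",
         my_id, np, myrow, mycol, nprow, npcol);

  Cblacs_gridexit(ctxt);
  Cblacs_exit(0);              /* Also shuts the message-passing layer down. */
  return EXIT_SUCCESS;
}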

ejspeiro 02-25-2013 09:06 PM

Hello firstfire! Thank you very much for the motivating tone of your reply! I agree: I should have uploaded something even if it is not working, so I will do it right away!

I want to use the MPI-based API. But first: here's the Fortran driver I talked about:

http://www.netlib.org/scalapack/expl...8f_source.html

When you download ScaLAPACK, the file is included in the TESTING/LIN folder.

Here's my REALLY FLAWED partial attempt at the code:

Code:

/*!
Convention for codes where parallel problem solving is performed by several
working entities.

This code, like all of my codes where a task is solved in parallel by several
working entities such as threads or processes, follows this convention for
naming the variables:

I imagine that these working entities live in a world, which represents the
collection of parameters for the communication among them. Therefore,
variables can be of three types:
  1. Variables that describe a PROPERTY "GLOBAL" TO ALL OF THE PROCESSES begin
    with "the_". In this way, it should be clear that all of the entities
    share such a parameter. For example, when solving a linear system of
    equations, the entities refer to THE rank of the matrix, which is common
    to all of them.
  2. Variables that deal with THE CONFIGURATION OF THE WORLD, i.e. the
    configuration of the communication space that guides the entities, begin
    with "our_". In this manner, the entities think about the size of THEIR
    WORLD (the communicator's size, for example), the number of processes, or
    the order of THEIR block distribution.
  3. Variables that describe a property PARTICULAR TO THE ENTITY, say its
    rank, begin with "my_"; this is straightforward to understand.

Since we are all the centers of our own universes (or at least should assume
so, in MY opinion), the variables are ordered by this sense of belonging
first, and then by their actual datatype, as I usually do.
   
Example:

int my_id;        // My process id.
int our_npcol;    // Cols in the process grid.
int the_rank;    // THE size of the problem.

One of the main reasons why I do this is that I like to reserve the qualifiers
"local" and "global" for the scope of operations/methods, as well as "private"
and "public" for access control in object-oriented programs.

Conventions for MPI_Abort error codes (particular to each problem):

  1: memory allocation failure.
*/
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

#include "mpi.h"
#include "cmtk.h"

/*! For the following, see:
  http://www.netlib.org/scalapack/slug/node73.html#SECTION04420000000000000000
*/
#define DTYPE_NARROW_BAND_MATRIX  501
#define DTYPE_NARROW_BAND_RHS     502

#define COEFF_PER_PROC            64
#define RHS_ELEM_PER_PROC         8

#define DLENGTH                   7

/*! External procedures prototypes from CBLACS: */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int ConTxt, int what, int *val);
extern void Cblacs_gridinit(int *ConTxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int ConTxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int ConTxt);
extern void Cblacs_exit(int NotDone);

/*! External procedures prototypes from ScaLAPACK: */
extern int numroc_();
extern void pdgbtrf_();
extern void pdgbtrs_();
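
/* Note on mixing C and Fortran (my understanding; double-check against your
   compilers): ScaLAPACK and numroc are Fortran routines, so with the usual
   gfortran-style name mangling they are reached from C through lowercase names
   with a trailing underscore, and every argument -- scalars included -- must
   be passed by address. That is why the calls below take pointers everywhere. */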

int main(void) {

  /*! Variables. */
  int my_id;            /* My process id. */
  int my_prow;          /* My row position in the process grid. */
  int my_pcol;          /* My column position in the process grid. */
 
  char our_order;       /* Row-major or column-major order. */
  int our_np;           /* Total number of processes. */
  int our_context;      /* BLACS context for the process grid of A. */
  int our_rhs_context;  /* BLACS context for the transposed grid of the RHS. */
  int our_value;        /* Context argument handed to Cblacs_get. */
  /* For the following, see:
  http://techpubs.sgi.com/library/tpl/cgi-bin/
  getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man3/cblacs_get.3s.html
  */
  const int our_default_system_context = 0;
  int our_nprow;    /* Rows in the process grid. */
  int our_npcol;    /* Cols in the process grid. */
  int our_nb;       /* Column block size of the distribution. */
  int our_mb;       /* Row block size of the distribution. */

//  double the_core[COEFF_PER_PROC];    /* THE coefficient matrix. */
  double my_rhs[RHS_ELEM_PER_PROC];  /* MY piece of the right-hand side. */
  double *the_af;                    /* Auxiliary Fillin Space. */
  double *the_work;                  /* DOUBLE PRECISION temp workspace. */
//  char the_trans = 'N';              /* Should WE transpose when solving? */
  int the_desca[DLENGTH];            /* Descriptor array coefficient matrix. */
  int the_descb[DLENGTH];            /* Descriptor array for rhs. */
  int the_rank;                      /* THE size of the problem. */
  int the_kl;                        /* THE number of lower diagonals. */
  int the_ku;                        /* THE number of upper diagonals. */
  int the_lda;                        /* THE leading dimension of the matrix. */
  int the_ldb;                        /* THE leading dimension of B. */
  int the_nrhs;                      /* THE number of right-hand sides. */
  int the_ja;                        /* THE offset for A. */
  int the_ib;                        /* THE offset for B. */
  int the_laf;                        /* Length of aux fill space array. */
  int the_lwork;                      /* Length of the temp workspace array. */
//  int the_info;                      /* Evaluation variable for solver. */
  int the_ii;                        /* Iterator. */
  int the_izero = 0;                 /* Zero constant, passed by address to Fortran routines. */

  /*! Begin execution. */
  /*! Start BLACS: */
  /* We do not need to call for MPI inits: */
  Cblacs_pinfo(&my_id, &our_np);

  /* Define process grid: */
  our_value = -1;
  Cblacs_get(our_value, our_default_system_context, &our_context);
  /* Implement 1-d block-column distribution: */
  our_order = 'r';
  our_nprow = 1;
  /* Should I make the number of processes equal to the columns in the pg? */
  our_npcol = our_np;
  Cblacs_gridinit(&our_context, &our_order, our_nprow, our_npcol);

  /* Define transpose process grid for the RHS: */
  Cblacs_get(our_value, our_default_system_context, &our_rhs_context);
  our_order = 'c';
  Cblacs_gridinit(&our_rhs_context, &our_order, our_npcol, our_nprow);

  Cblacs_gridinfo(our_context, &our_nprow, &our_npcol, &my_prow, &my_pcol);

  if (my_id == 0) {
    fprintf(stdout, "I am p %d of %d in (%d, %d)\n", my_id, our_np,
            my_prow, my_pcol);
    fprintf(stdout, "Disposition of processes: %d rows - %d cols.\n",
            our_nprow, our_npcol);
  }

  /*! Set parameters: */
  /* Parameters of the problem: */
  the_rank = 8;
  the_kl = 2;   
  the_ku = 1;   
  the_lda = 2*the_kl + 1 + 2*the_ku;
  the_ldb = the_lda;
  the_nrhs = 1;
  the_ja = 1;
  the_ib = the_ja;

//  our_nb = the_rank/our_np;
//  our_mb = the_rank/our_np;
  our_nb = numroc_(&the_lda, &the_lda,
                   &my_prow, &the_izero, &our_nprow);
//  our_mb = numroc_(&the_rank, &our_nb, &my_pcol, &the_izero, &our_npcol);
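  /* As far as I understand it, numroc_(N, NB, IPROC, ISRCPROC, NPROCS) returns
     how many of the N rows/columns, dealt out in blocks of NB starting at
     process ISRCPROC, end up on process IPROC; all five arguments are passed
     by address.  I am still not sure I am feeding it the right arguments here. */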

  if (my_id == 0) {
    fprintf(stdout, "Matrix split in %d blocks, and rhs in %d.\n", our_nb, the_kl);
  }

  /* OPTION 1: NO TRANSPOSITION OF THE CORE: */
//  double my_core[COEFF_PER_PROC] = {
//    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,
//    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,
//    0.00,    0.00,  41.11,  41.11,  41.11,  41.11,  41.11,  41.11,
//    1.00,  -88.89,  -88.89,  -88.89,  -88.89,  -88.89,  -88.89,  -2.70,
//    47.78,  47.78,  47.78,  47.78,  47.78,  47.78,    4.00,    0.00,
//    0.00,    0.00,    0.00,    0.00,    0.00,  -1.00,    0.00,    0.00
//  };

//  /* OPTION 2: TRANSPOSITION OF THE CORE:*/
//  double my_core[COEFF_PER_PROC] = {
//    0.00,    0.00,    0.00,    1.00,  47.78,    0.00,
//    0.00,    0.00,    0.00,  -88.89,  47.78,    0.00,
//    0.00,    0.00,  41.11,  -88.89,  47.78,    0.00,
//    0.00,    0.00,  41.11,  -88.89,  47.78,    0.00,
//    0.00,    0.00,  41.11,  -88.89,  47.78,    0.00,
//    0.00,    0.00,  41.11,  -88.89,  47.78,  -1.00,
//    0.00,    0.00,  41.11,  -88.89,    4.00,    0.00,
//    0.00,    0.00,  41.11,  -2.70,    0.00,    0.00
//  };
 
  /* 1-d block-row distribution of the rhs: */

  my_rhs[0] = 1.0;
  my_rhs[1] = 1.0;
  my_rhs[2] = 1.0;
  my_rhs[3] = 1.0;
  my_rhs[4] = 1.0;
  my_rhs[5] = 1.0;
  my_rhs[6] = 1.0;
  my_rhs[7] = 0.0;

  /*! Array descriptor for A: */
  /* Please see:
  http://www.netlib.org/scalapack/slug/node86.html#SECTION04445000000000000000
  */
  the_desca[0] = DTYPE_NARROW_BAND_MATRIX;  /* The descriptor type. */
  the_desca[1] = our_context;              /* The BLACS context handle. */
  the_desca[2] = the_rank;                  /* The numcols in global matrix. */
  the_desca[3] = our_nb;                    /* The column block size. */
  the_desca[4] = 0;                        /* Proc col owning the 1st col of the matrix. */
  the_desca[5] = the_lda;                  /* Local leading dim of matrix. */
  the_desca[6] = 0;                        /* Unused, reserved. */
 
  /*! Array descriptor for B: */
  /* Please see:
  http://www.netlib.org/scalapack/slug/node87.html#SECTION04446000000000000000
  */
  the_descb[0] = DTYPE_NARROW_BAND_RHS; /* The descriptor type. */
  the_descb[1] = our_context;          /* The BLACS context handle. */
  the_descb[2] = the_rank;              /* The numrows in the global vector. */
  our_mb = our_nb;                      /* Row blocking of B should match NB of A. */
  the_descb[3] = our_mb;                /* The row block size. */
  the_descb[4] = 0;                    /* Proc row for first row of vector. */
  the_descb[5] = the_ldb;              /* Local leading dimension of vector. */
  the_descb[6] = 0;                    /* Unused, reserved. */

  /*! Creation of auxiliary arrays for ScaLAPACK: */
  the_laf = 12*our_npcol + 3*our_nb;
  the_af = (double *) malloc(the_laf*sizeof(double));
  if (the_af == NULL) {
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
 
  the_lwork = 10*our_npcol + 4*the_nrhs;
  the_work = (double *) malloc(the_lwork*sizeof(double));
  if (the_work == NULL) {
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
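  /* NOTE to self: I have not verified that the_laf and the_lwork above satisfy
     the minimum sizes documented in the pdgbtrf_/pdgbtrs_ headers (they are
     given there as formulas in NB, bwl and bwu), so these values probably need
     to be revisited. */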

  fprintf(stdout, "Attained solution:\n");
  for (the_ii = 0; the_ii < the_rank; the_ii++) {
    fprintf(stdout, "\t%g\n", my_rhs[the_ii]);
  }
  fprintf(stdout, "\n");
 
//  /*! Factorization: */
//  /* PDGBTRF (N, BWL, BWU, A, JA, DESCA, IPIV, AF, LAF, WORK, LWORK, INFO) */
//  pdgbtrf_(&the_rank, &the_kl, &the_ku, my_core, &the_ja, the_desca,
//            the_af, &the_laf, the_work, &the_lwork, &the_info);
//  if (the_info != 0) {
//    if (my_id == 0) {
//      fprintf(stderr, ":( Factorization went wrong: info = %d\n", the_info);
//      fprintf(stderr, "Exiting...\n");
//    }
//    MPI_Abort(MPI_COMM_WORLD, 1);
//  } else if (my_id == 0) {
//    fprintf(stdout, ":) Factorization went 0k! info = %d\n", the_info);
//  }
//
//  /*! Solution: */
//  /* PDGBTRS (TRANS, N, BWL, BWU, NRHS, A, JA, DESCA, IPIV,
//      B, IB, DESCB, AF, LAF, WORK, LWORK, INFO)
//  */
//  pdgbtrs_(&the_trans, &the_rank, &the_kl, &the_ku, &the_nrhs,
//            my_core, &the_ja, the_desca,
//            my_rhs, &the_ib, the_descb, the_af, &the_laf,
//            the_work, &the_lwork, &the_info);
//  if (the_info != 0) {
//    if (my_id == 0) {
//      fprintf(stderr, ":( Solution went wrong: info = %d\n", the_info);
//      fprintf(stderr, "Exiting...\n");
//    }
//    MPI_Abort(MPI_COMM_WORLD, 1);
//  } else if (my_id == 0) {
//    fprintf(stdout, ":) Solution went 0k! info = %d\n", the_info);
//  }

 
  /*! Finalize BLACS: */
  Cblacs_gridexit(our_context);
  Cblacs_exit(EXIT_SUCCESS);
  return EXIT_SUCCESS;
}
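
In case it helps, this is roughly how I am building and running it; the file name is made up and the library list is just a guess at what my machine needs, so adjust it to your installation:

Code:

# Build and run sketch; library names/paths vary between distros and
# ScaLAPACK versions (older setups also need the separate BLACS libraries).
mpicc -o band_driver band_driver.c -lscalapack -llapack -lblas -lgfortran -lm
mpirun -np 4 ./band_driver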

Thanks in advance!

