LinuxQuestions.org
Old 02-21-2013, 07:52 PM   #1
ejspeiro
Member
 
Registered: Feb 2011
Location: San Diego, California (92115), U.S.A.
Distribution: Ubuntu 14.04 LTS (Trusty Tahr)
Posts: 201

Rep: Reputation: 26
ScaLAPACK and solving banded matrices in parallel in C


Hello folks, in a previous thread located here, I addressed some of the algorithmic issues I ran into when trying to solve a banded system of equations from C code, using the sequential banded solvers in LAPACK.

My issue has now "scaled": I am interested in achieving a scalable solution of a big banded system using the parallel solvers in ScaLAPACK.

My first two questions, which hopefully will serve the purpose of initiating this thread, are:
  1. Has anybody tried to solve a system using ScaLAPACK from C code?
  2. ScaLAPACK comes with a driver example written in Fortran, which DOES NOT scale per se... which important factor could be behind this performance issue? What info could I provide that would help clarify this question?
Thanks in advance, guys!

PS: As is hopefully assumed, I have done extensive research in the user manuals, on the LAPACK/ScaLAPACK user forums, and on other websites... BUT it baffles me that I can't seem to find any working C-code driver for this problem.

\m/
 
Old 02-25-2013, 09:12 AM   #2
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 638

Rep: Reputation: 373
Hi.

ScaLAPACK may be used with two libraries for distributed computations -- MPI and PVM. Which one do you prefer? I'm currently trying to figure out how to use MPI.

Please post your best attempt here, even if it's not working, so that we have something to start with.
BTW, which example written in Fortran did you mean?
 
Old 02-25-2013, 09:06 PM   #3
ejspeiro
Member
 
Registered: Feb 2011
Location: San Diego, California (92115), U.S.A.
Distribution: Ubuntu 14.04 LTS (Trusty Tahr)
Posts: 201

Original Poster
Rep: Reputation: 26
Hello firstfire! Thank you very much for the motivating tone of your reply! I agree: I should have uploaded something even if it is not working, so I will do it right away!

I want to use the MPI-based API. But first: here's the Fortran driver I talked about:

http://www.netlib.org/scalapack/expl...8f_source.html

When you download ScaLAPACK, the file is included in the TESTING/LIN folder.

Here's my REALLY FLAWED partial solution to the code:

Code:
/*!
Naming convention for codes in which parallel problem solving is performed by
several working entities.

This code, like all of my codes in which a task is solved in parallel by
several working entities (such as threads or processes), follows this
convention for naming variables:

I imagine that these working entities are living entities in a world, which
represents the collection of parameters for the communication among these
entities. Therefore, variables can be of three types:
  1. Variables that have to do with a PROPERTY "GLOBAL" TO ALL OF THE PROCESSES
     begin with the prefix "the_". In this way, it should be clear that all of
     the entities share such a parameter. For example, when solving a linear
     system of equations, the entities refer to THE rank of the matrix, which
     is common to all of them.
  2. Variables that deal with THE CONFIGURATION OF THE WORLD, which encompasses
     the configuration of the communication space that guides the entities,
     begin with "our_". In this manner, these variables describe the size of
     THEIR WORLD (the communicator's size, for example), the number of
     processes, or the order of THEIR block distribution.
  3. Variables that have to do with a property which is PARTICULAR TO THE
     ENTITY, say its rank, begin with "my_"; this is straightforward to
     understand.

Since we all are the centers of our universes (or at least should be assumed so,
in MY opinion), we will order the variables based on this sense of belonging,
and then based on the actual datatype, as I usually do.
     
Example:

int my_id;        // My process id.
int our_npcol;    // Cols in the process grid.
int the_rank;     // THE size of the problem.

One of the main reasons why I do this is that I like to reserve the qualifiers
"local" and "global" for the scope of operations/methods, as well as
"private" and "public" for access control in object-oriented programs.

Conventions for MPI_Abort codes (particular to each problem):


*/
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

#include "mpi.h"
#include "cmtk.h"

/*! For the following, see:
  http://www.netlib.org/scalapack/slug/node73.html#SECTION04420000000000000000
*/
#define DTYPE_NARROW_BAND_MATRIX  501
#define DTYPE_NARROW_BAND_RHS     502

#define COEFF_PER_PROC            64
#define RHS_ELEM_PER_PROC         8

#define DLENGTH                   7

/*! External procedures prototypes from CBLACS: */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int ConTxt, int what, int *val);
extern void Cblacs_gridinit(int *ConTxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int ConTxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int ConTxt);
extern void Cblacs_exit(int NotDone);

/*! External procedures prototypes from ScaLAPACK: */
extern int numroc_();
extern void pdgbtrf_();
extern void pdgbtrs_();

int main(void) {

  /*! Variables. */
  int my_id;            /* My process id. */
  int my_prow;          /* My row position in the process grid. */
  int my_pcol;          /* My column position in the process grid. */
  
  char our_order;       /* Row-major or column-major order. */
  int our_np;           /* Total number of processes. */
  int our_context;      /* BLACS context. */
  int our_rhs_context;  /* BLACS context. */
  int our_value;        /* Used by BLACS when getting the context. */
  /* For the following, see:
   http://techpubs.sgi.com/library/tpl/cgi-bin/
   getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man3/cblacs_get.3s.html
  */
  const int our_default_system_context = 0;
  int our_nprow;    /* Rows in the process grid. */
  int our_npcol;    /* Cols in the process grid. */
  int our_nb;       /* Num of blocks per row. */
  int our_mb;       /* Num of blocks per column. */

//   double the_core[COEFF_PER_PROC];    /* THE coefficient matrix. */
  double my_rhs[RHS_ELEM_PER_PROC];   /* THE right-hand side. */
  double *the_af;                     /* Auxiliary Fillin Space. */
  double *the_work;                   /* DOUBLE PRECISION temp workspace. */
//   char the_trans = 'N';               /* Should WE transpose when solving? */
  int the_desca[DLENGTH];             /* Descriptor array coefficient matrix. */
  int the_descb[DLENGTH];             /* Descriptor array for rhs. */
  int the_rank;                       /* THE size of the problem. */
  int the_kl;                         /* THE number of lower-diagonals. */
  int the_ku;                         /* THE number of upper-diagonals. */
  int the_lda;                        /* THE leading dimension of the matrix. */
  int the_ldb;                        /* THE leading dimension of B. */
  int the_nrhs;                       /* THE number of right-hand sides. */
  int the_ja;                         /* THE offset for A. */
  int the_ib;                         /* THE offset for B. */
  int the_laf;                        /* Length of aux fill space array. */
  int the_lwork;                      /* Length of the temp workspace array. */
//   int the_info;                       /* Evaluation variable for solver. */
//   int *the_ipiv;                      /* Pivot indices (needed by PDGBTRF). */
  int the_ii;                         /* Iterator. */

  /*! Begin execution. */
  /*! Start BLACS: */
  /* We do not need to call for MPI inits: */
  Cblacs_pinfo(&my_id, &our_np);

  /* Define process grid: */
  our_value = -1;
  Cblacs_get(our_value, our_default_system_context, &our_context);
  /* Implement 1-d block-column distribution: */
  our_order = 'r';
  our_nprow = 1;
  /* Should I make the number of processes equal to the columns in the pg? */
  our_npcol = our_np;
  Cblacs_gridinit(&our_context, &our_order, our_nprow, our_npcol);

  /* Define transpose process grid for the RHS: */
  Cblacs_get(our_value, our_default_system_context, &our_rhs_context);
  our_order = 'c';
  Cblacs_gridinit(&our_rhs_context, &our_order, our_npcol, our_nprow);

  Cblacs_gridinfo(our_context, &our_nprow, &our_npcol, &my_prow, &my_pcol);

  if (my_id == 0) {
    fprintf(stdout, "I am p %d of %d in (%d, %d)\n", my_id, our_np,
            my_prow, my_pcol);
    fprintf(stdout, "Disposition of processes: %d rows - %d cols.\n",
            our_nprow, our_npcol);
  }

  /*! Set parameters: */
  /* Parameters of the problem: */
  the_rank = 8;
  the_kl = 2;     
  the_ku = 1;     
  the_lda = 2*the_kl + 1 + 2*the_ku;
  the_ldb = the_lda;
  the_nrhs = 1;
  the_ja = 1;
  the_ib = the_ja;

//   our_nb = the_rank/our_np;
//   our_mb = the_rank/our_np;
  int the_izero = 0;  /* Fortran routines take every argument by reference. */
  our_nb = numroc_(&the_lda, &the_lda,
                   &my_prow, &the_izero, &our_nprow);
//   our_mb = numroc_(&the_rank, &our_nb, &my_pcol, &the_izero, &our_npcol);

  if (my_id == 0) {
    fprintf(stdout, "Matrix split into column blocks of size %d.\n", our_nb);
  }

  /* OPTION 1: NO TRANSPOSITION OF THE CORE: */
//   double my_core[COEFF_PER_PROC] = {
//     0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,
//     0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,    0.00,
//     0.00,    0.00,   41.11,   41.11,   41.11,   41.11,   41.11,   41.11,
//     1.00,  -88.89,  -88.89,  -88.89,  -88.89,  -88.89,  -88.89,   -2.70,
//     47.78,   47.78,   47.78,   47.78,   47.78,   47.78,    4.00,    0.00,
//     0.00,    0.00,    0.00,    0.00,    0.00,   -1.00,    0.00,    0.00
//   };

//   /* OPTION 2: TRANSPOSITION OF THE CORE:*/
//   double my_core[COEFF_PER_PROC] = {
//     0.00,    0.00,    0.00,    1.00,   47.78,    0.00,
//     0.00,    0.00,    0.00,  -88.89,   47.78,    0.00,
//     0.00,    0.00,   41.11,  -88.89,   47.78,    0.00,
//     0.00,    0.00,   41.11,  -88.89,   47.78,    0.00,
//     0.00,    0.00,   41.11,  -88.89,   47.78,    0.00,
//     0.00,    0.00,   41.11,  -88.89,   47.78,   -1.00,
//     0.00,    0.00,   41.11,  -88.89,    4.00,    0.00,
//     0.00,    0.00,   41.11,   -2.70,    0.00,    0.00
//   };
  
  /* 1-d block-row distribution of the rhs: */

  my_rhs[0] = 1.0;
  my_rhs[1] = 1.0;
  my_rhs[2] = 1.0;
  my_rhs[3] = 1.0;
  my_rhs[4] = 1.0;
  my_rhs[5] = 1.0;
  my_rhs[6] = 1.0;
  my_rhs[7] = 0.0;

  /*! Array descriptor for A: */
  /* Please see:
  http://www.netlib.org/scalapack/slug/node86.html#SECTION04445000000000000000
  */
  the_desca[0] = DTYPE_NARROW_BAND_MATRIX;  /* The descriptor type. */
  the_desca[1] = our_context;               /* The BLACS context handle. */
  the_desca[2] = the_rank;                  /* The numcols in global matrix. */
  the_desca[3] = our_nb;                    /* The column block size. */
  the_desca[4] = 0;                         /* Proc col for 1st col of mtrix. */
  the_desca[5] = the_lda;                   /* Local leading dim of matrix. */
  the_desca[6] = 0;                         /* Unused, reserved. */
  
  /*! Array descriptor for B: */
  /* Please see:
  http://www.netlib.org/scalapack/slug/node87.html#SECTION04446000000000000000
  */
  the_descb[0] = DTYPE_NARROW_BAND_RHS; /* The descriptor type. */
  the_descb[1] = our_context;           /* The BLACS context handle. */
  the_descb[2] = the_rank;              /* The numrows in the global vector. */
  our_mb = 0;
  the_descb[3] = our_mb;                /* The row block size. */
  the_descb[4] = 0;                     /* Proc row for first row of vector. */
  the_descb[5] = the_ldb;               /* Local leading dimension of vector. */
  the_descb[6] = 0;                     /* Unused, reserved. */

  /*! Creation of auxiliary arrays for ScaLAPACK: */
  the_laf = 12*our_npcol + 3*our_nb;
  the_af = (double *) malloc(the_laf*sizeof(double));
  if (the_af == NULL) {
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
  
  the_lwork = 10*our_npcol + 4*the_nrhs;
  the_work = (double *) malloc(the_lwork*sizeof(double));
  if (the_work == NULL) {
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  fprintf(stdout, "Attained solution:\n");
  for (the_ii = 0; the_ii < the_rank; the_ii++) {
    fprintf(stdout, "\t%g\n", my_rhs[the_ii]);
  }
  fprintf(stdout, "\n");
  
//   /*! Factorization: */
//   /* PDGBTRF (N, BWL, BWU, A, JA, DESCA, IPIV, AF, LAF, WORK, LWORK, INFO) */
//   pdgbtrf_(&the_rank, &the_kl, &the_ku, my_core, &the_ja, the_desca,
//            the_ipiv, the_af, &the_laf, the_work, &the_lwork, &the_info);
//   if (the_info != 0) {
//     if (my_id == 0) {
//       fprintf(stderr, ":( Factorization went wrong: info = %d\n", the_info);
//       fprintf(stderr, "Exiting...\n");
//     }
//     MPI_Abort(MPI_COMM_WORLD, 1);
//   } else if (my_id == 0) {
//     fprintf(stdout, ":) Factorization went 0k! info = %d\n", the_info);
//   }
// 
//   /*! Solution: */
//   /* PDGBTRS (TRANS, N, BWL, BWU, NRHS, A, JA, DESCA, IPIV,
//      B, IB, DESCB, AF, LAF, WORK, LWORK, INFO)
//   */
//   pdgbtrs_(&the_trans, &the_rank, &the_kl, &the_ku, &the_nrhs,
//            my_core, &the_ja, the_desca, the_ipiv,
//            my_rhs, &the_ib, the_descb, the_af, &the_laf,
//            the_work, &the_lwork, &the_info);
//   if (the_info != 0) {
//     if (my_id == 0) {
//       fprintf(stderr, ":( Factorization went wrong: info = %d\n", the_info);
//       fprintf(stderr, "Exiting...\n");
//     }
//     MPI_Abort(MPI_COMM_WORLD, 1);
//   } else if (my_id == 0) {
//     fprintf(stdout, ":) Solution went 0k! info = %d\n", the_info);
//   }

  
  /*! Finalize BLACS: */
  Cblacs_gridexit(our_context);
  Cblacs_exit(EXIT_SUCCESS);
  return EXIT_SUCCESS;
}
Thanks in advance!
 
  

