ScaLAPACK and solving banded matrices in parallel in C

ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices

Welcome to LinuxQuestions.org, a friendly and active Linux Community.

You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!

Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.

If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.

Having a problem logging in? Please visit this page to clear all LQ-related cookies.

Introduction to Linux - A Hands on Guide

This guide was created as an overview of the Linux Operating System, geared toward new users as an exploration tour and getting started guide, with exercises at the end of each chapter.
For more advanced trainees it can be a desktop reference, and a collection of the base knowledge needed to proceed with system and network administration. This book contains many real life examples derived from the author's experience as a Linux system and network administrator, trainer and consultant. They hope these examples will help you to get a better understanding of the Linux system and that you feel encouraged to try out things on your own.

Click Here to receive this Complete Guide absolutely free.

ScaLAPACK and solving banded matrices in parallel in C

Hello folks, in a previous thread located here, I addressed some of the algorithmic issues I ran into when trying to solve a banded systems of equations using the sequential banded solvers in LAPACK, from a C code.

My issue now has "scaled" I am interested in achieving an scalable solution of a big banded systems using the parallel solver in ScaLAPACK.

My first 2 questions, which hopefully will serve the purpose of initiating this thread are:

Has anybody tried to solve a system using ScaLAPACK from a C code?

ScaLAPACK comes with a driver example written in Fortran, which DOES NOT scale per se... which important factor could be interplaying with this performance issue? Which info could I provide that would help clarifying this question?

Thanks in advanced guys!

PS: As it is hopefully assumed, I have made extensive research on both the user manuals, the LAPACK/ScaLAPACK user forums, and in other websites... BUT it baffles me how I can't seem to find any working C-code driver for this problem

ScaLAPACK may be used with two libraries for distributed computations -- MPI and PVM. Which one do you prefer? I'm currently trying to figure out how to use MPI.

Please post your best attempt here, even if it's not working, so that we have something to start with.
BTW, which example written in fortran did you mean?

Hello firstfire! Thank you very much for the motivating tone of your reply! I agree: I should have uploaded something even if it is not working, so I will do it right away!

I want to use the MPI-based API. But first: here's the Fortran driver I talked about:

When downloading Scalapack, the file is included in the /TESTING/LIN folder.

Here's my REALLY FLAWED partial solution to the code:

Code:

/*!
Convention for codes in where parallel problem solving is performed by several
working entities.
This code, as all of my codes in where a solution to a task is performed in
parallel by several working entities, such as threads or processes, is
follows the following convention for naming the variables:
I imagine that these working entities are living entities in a world, which
represents the collection of parameters for the communication among these
entities. Therefore, variables can be of three types:
1. Variables that have to do with a PROPERTY "GLOBAL" TO ALL OF THE PROCESSES
begin with the word "the_". In this way, it should be clear that all of the
entities share such a parameter. For example, when solving a linear system
of equations, the entities refer to THE rank of the matrix, which is common
to all of then.
2. Variables that deal with THE CONFIGURATION OF THE WORLD which englobes
the configuration of the communication space which the entities are guided
by. These variables begin with "our_". In this manner, variables think
about the size of THEIR WORLD (communicator's size, for_example), the
number of processes, or the order of THEIR block distribution.
3. Variables that have to do with a property which is PARTICULAR TO THE
ENTITY, say its rank, for example, shall begin its name with "my_", this is
straightforward to understand.
Since we all are the centers of our universes (or at least should be assumed so,
in MY opinion), we will order the variables based on this sense of belonging,
and then based on the actual datatype, as I usually do.
Example:
int my_id; // My process id.
int our_npcol; // Cols in the process grid.
int the_rank; // THE size of the problem.
One of the main reasons of why I do this, is because I like to reserve the
qualifiers "local" and "global" for the scope of operations/methods, as well as
"private" and "public" for access control in object-oriented programs.
Conventions for MPI_Abort codes (particular the each problem):
*/
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include "mpi.h"
#include "cmtk.h"
/*! For the following, see:
http://www.netlib.org/scalapack/slug/node73.html#SECTION04420000000000000000
*/
#define DTYPE_NARROW_BAND_MATRIX 501
#define DTYPE_NARROW_BAND_RHS 502
#define COEFF_PER_PROC 64
#define RHS_ELEM_PER_PROC 8
#define DLENGTH 7
/*! External procedures prototypes from CBLACS: */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int ConTxt, int what, int *val);
extern void Cblacs_gridinit(int *ConTxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int ConTxt, int *nprow, int *npcol,
int *myrow, int *mycol);
extern void Cblacs_gridexit(int ConTxt);
extern void Cblacs_exit(int NotDone);
/*! External procedures prototypes from ScaLAPACK: */
extern int numroc_();
extern void pdgbtrf_();
extern void pdgbtrs_();
int main(void) {
/*! Variables. */
int my_id; /* My process id. */
int my_prow; /* My row position in the process grid. */
int my_pcol; /* My column position in the process grid. */
char our_order; /* Row-major or column-major order. */
int our_np; /* Total number of processes. */
int our_context; /* BLACS context. */
int our_rhs_context; /* BLACS context. */
int our_value; /* Used by BLACS when getting the context. */
/* For the following, see:
http://techpubs.sgi.com/library/tpl/cgi-bin/
getdoc.cgi?coll=linux&db=man&fname=/usr/share/catman/man3/cblacs_get.3s.html
*/
const int our_default_system_context = 0;
int our_nprow; /* Rows in the process grid. */
int our_npcol; /* Cols in the process grid. */
int our_nb; /* Num of blocks per row. */
int our_mb; /* Num of blocks per column. */
// double the_core[COEFF_PER_PROC]; /* THE coefficient matrix. */
double my_rhs[RHS_ELEM_PER_PROC]; /* THE right-hand side. */
double *the_af; /* Auxiliary Fillin Space. */
double *the_work; /* DOUBLE PRECISION temp workspace. */
// char the_trans = 'N'; /* Should WE transpose when solving? */
int the_desca[DLENGTH]; /* Descriptor array coefficient matrix. */
int the_descb[DLENGTH]; /* Descriptor array for rhs. */
int the_rank; /* THE size of the problem. */
int the_kl; /* THE number of lower-diagonals. */
int the_ku; /* THE number of lower-diagonals. */
int the_lda; /* THE leading dimension of the matrix. */
int the_ldb; /* THE leading dimension of B. */
int the_nrhs; /* THE number of right-hand sides. */
int the_ja; /* THE offset for A. */
int the_ib; /* THE offset for B. */
int the_laf; /* Length of aux fill space array. */
int the_lwork; /* Length of the temp workspace array. */
// int the_info; /* Evaluation variable for solver. */
int the_ii; /* Iterator. */
/*! Begin execution. */
/*! Start BLACS: */
/* We do not need to call for MPI inits: */
Cblacs_pinfo(&my_id, &our_np);
/* Define process grid: */
our_value = -1;
Cblacs_get(our_value, our_default_system_context, &our_context);
/* Implement 1-d block-column distribution: */
our_order = 'r';
our_nprow = 1;
/* Should I make the number of processes equal to the columns in the pg? */
our_npcol = our_np;
Cblacs_gridinit(&our_context, &our_order, our_nprow, our_npcol);
/* Define transpose process grid for the RHS: */
Cblacs_get(our_value, our_default_system_context, &our_rhs_context);
our_order = 'c';
Cblacs_gridinit(&our_rhs_context, &our_order, our_npcol, our_nprow);
Cblacs_gridinfo(our_context, &our_nprow, &our_npcol, &my_prow, &my_pcol);
if (my_id == 0) {
fprintf(stdout, "I am p %d of %d in (%d, %d)\n", my_id, our_np,
my_prow, my_pcol);
fprintf(stdout, "Disposition of processes: %d rows - %d cols.\n",
our_nprow, our_npcol);
}
/*! Set parameters: */
/* Parameters of the problem: */
the_rank = 8;
the_kl = 2;
the_ku = 1;
the_lda = 2*the_kl + 1 + 2*the_ku;
the_ldb = the_lda;
the_nrhs = 1;
the_ja = 1;
the_ib = the_ja;
// our_nb = the_rank/our_np;
// our_mb = the_rank/our_np;
our_nb = numroc_(&the_lda, &the_lda,
&my_prow, 0, &our_nprow);
// our_mb = numroc_(&the_rank, &our_nb, &my_pcol, 0, &our_npcol);
if (my_id == 0) {
fprintf(stdout, "Matrix split in %d blocks, and rhs in %d.\n", our_nb, the_kl);
}
/* OPTION 1: NO TRANSPOSITION OF THE CORE: */
// double my_core[COEFF_PER_PROC] = {
// 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
// 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
// 0.00, 0.00, 41.11, 41.11, 41.11, 41.11, 41.11, 41.11,
// 1.00, -88.89, -88.89, -88.89, -88.89, -88.89, -88.89, -2.70,
// 47.78, 47.78, 47.78, 47.78, 47.78, 47.78, 4.00, 0.00,
// 0.00, 0.00, 0.00, 0.00, 0.00, -1.00, 0.00, 0.00
// };
// /* OPTION 2: TRANSPOSITION OF THE CORE:*/
// double my_core[COEFF_PER_PROC] = {
// 0.00, 0.00, 0.00, 1.00, 47.78, 0.00,
// 0.00, 0.00, 0.00, -88.89, 47.78, 0.00,
// 0.00, 0.00, 41.11, -88.89, 47.78, 0.00,
// 0.00, 0.00, 41.11, -88.89, 47.78, 0.00,
// 0.00, 0.00, 41.11, -88.89, 47.78, 0.00,
// 0.00, 0.00, 41.11, -88.89, 47.78, -1.00,
// 0.00, 0.00, 41.11, -88.89, 4.00, 0.00,
// 0.00, 0.00, 41.11, -2.70, 0.00, 0.00
// };
/* 1-d block-row distribution of the rhs: */
my_rhs[0] = 1.0;
my_rhs[1] = 1.0;
my_rhs[2] = 1.0;
my_rhs[3] = 1.0;
my_rhs[4] = 1.0;
my_rhs[5] = 1.0;
my_rhs[6] = 1.0;
my_rhs[7] = 0.0;
/*! Array descriptor for A: */
/* Please see:
http://www.netlib.org/scalapack/slug/node86.html#SECTION04445000000000000000
*/
the_desca[0] = DTYPE_NARROW_BAND_MATRIX; /* The descriptor type. */
the_desca[1] = our_context; /* The BLACS context handle. */
the_desca[2] = the_rank; /* The numcols in global matrix. */
the_desca[3] = our_nb; /* The column block size. */
the_desca[4] = 0; /* Proc col for 1st col of mtrix. */
the_desca[5] = the_lda; /* Local leading dim of matrix. */
the_desca[6] = 0; /* Unused, reserved. */
/*! Array descriptor for B: */
/* Please see:
http://www.netlib.org/scalapack/slug/node87.html#SECTION04446000000000000000
*/
the_descb[0] = DTYPE_NARROW_BAND_RHS; /* The descriptor type. */
the_descb[1] = our_context; /* The BLACS context handle. */
the_descb[2] = the_rank; /* The numrows in the global vector. */
our_mb = 0;
the_descb[3] = our_mb; /* The row block size. */
the_descb[4] = 0; /* Proc row for first row of vector. */
the_descb[5] = the_ldb; /* Local leading dimension of vector. */
the_descb[6] = 0; /* Unused, reserved. */
/*! Creation of auxiliary arrays for ScaLAPACK: */
the_laf = 12*our_npcol + 3*our_nb;
the_af = (double *) malloc(the_laf*sizeof(double));
if (the_af == NULL) {
MPI_Abort(MPI_COMM_WORLD, 1);
}
the_lwork = 10*our_npcol + 4*the_nrhs;
the_work = (double *) malloc(the_lwork*sizeof(double));
if (the_work == NULL) {
MPI_Abort(MPI_COMM_WORLD, 1);
}
fprintf(stdout, "Attained solution:\n");
for (the_ii = 0; the_ii < the_rank; the_ii++) {
fprintf(stdout, "\t%g\n", my_rhs[the_ii]);
}
fprintf(stdout, "\n");
// /*! Factorization: */
// /* PDGBTRF (N, BWL, BWU, A, JA, DESCA, IPIV, AF, LAF, WORK, LWORK, INFO) */
// pdgbtrf_(&the_rank, &the_kl, &the_ku, my_core, &the_ja, the_desca,
// the_af, &the_laf, the_work, &the_lwork, &the_info);
// if (the_info != 0) {
// if (my_id == 0) {
// fprintf(stderr, ":( Factorization went wrong: info = %d\n", the_info);
// fprintf(stderr, "Exiting...\n");
// }
// MPI_Abort(MPI_COMM_WORLD, 1);
// } else if (my_id == 0) {
// fprintf(stdout, ":) Factorization went 0k! info = %d\n", the_info);
// }
//
// /*! Solution: */
// /* PDGBTRS (TRANS, N, BWL, BWU, NRHS, A, JA, DESCA, IPIV,
// B, IB, DESCB, AF, LAF, WORK, LWORK, INFO)
// */
// pdgbtrs_(&the_trans, &the_rank, &the_kl, &the_ku, &the_nrhs,
// my_core, &the_ja, the_desca,
// my_rhs, &the_ib, the_descb, the_af, &the_laf,
// the_work, &the_lwork, &the_info);
// if (the_info != 0) {
// if (my_id == 0) {
// fprintf(stderr, ":( Factorization went wrong: info = %d\n", the_info);
// fprintf(stderr, "Exiting...\n");
// }
// MPI_Abort(MPI_COMM_WORLD, 1);
// } else if (my_id == 0) {
// fprintf(stdout, ":) Solution went 0k! info = %d\n", the_info);
// }
/*! Finalize BLACS: */
Cblacs_gridexit(our_context);
Cblacs_exit(EXIT_SUCCESS);
return EXIT_SUCCESS;
}

LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.