[SOLVED] Solving a banded system of equations from C using LAPACK's DGBSV

ejspeiro · 02-14-2013, 11:50 AM

I have tried to search for a similar thread related to this topic, but nobody seems to care for banded systems (...)

I am interested in solving a banded matrix using LAPACK/ScaLAPACK from a C code. First, I want to achieve a sequential solution with LAPACK, before attempting anything with ScaLAPACK.

Problem: The row-major/col-major difference between both languages seems to be affecting my solution process. Here's the system I intend to solve:

Code:

    [    1,     0,     0,     0,    0,    0]     [1]
    [27.50,   -50,  22.5,     0,    0,    0]     [1]
    [    0, 27.50,   -50,  22.5,    0,    0] x = [1]
    [    0,     0, 27.50,   -50, 22.5,    0]     [1]
    [    0,     0,     0, 27.50,  -50, 22.5]     [1]
    [    0,     0,     0,    -1,    4,-2.60]     [0]

The following code, translates that matrix into LAPACK's banded data structure, specified in here.

Code:

      int rr = 6;     // Rank.
      int kl = 2;     // Number of lower diagonals.
      int ku = 1;     // Number of upper diagonals.
      int nrhs = 1;   // Number of RHS.
      double vals[36] = {0.0,   0.0,   0.0,   0.0,   0.0,   0.0,  // Req. ex. space.
                         0.0,   0.0,   0.0,   0.0,   0.0,   0.0,  // Req. ex. space.
                       666.0,   0.0,   0.0,   0.0,   0.0,  22.5,  // First up diag.
                         1.0, -50.0, -50.0, -50.0, -50.0,  -2.6,  // Main diagonal.
                        27.5,  27.5,  27.5,  27.5,   4.0, 666.0,  // First low diag.
                         0.0,   0.0,   0.0,  -1.0, 666.0, 666.0}; // 2nd low diag.
                       
      int lda = rr;   // Leading dimension of the matrix.
      int ipiv[6];    // Information on pivoting array.
      double rhs[] = {1.0, 1.0, 1.0, 1.0, 1.0, 0.0};  // RHS.
      int ldb = lda;  // Leading dimension of the RHS.
      int info = 0;   // Evaluation variable for solution process.
      int ii;         // Iterator.
      int jj;         // Iterator.
      
      dgbsv_(&rr, &kl, &ku, &nrhs, vals, &lda, ipiv, rhs, &ldb, &info);
    
      printf("info = %d\n", info);
      for (ii = 0; ii < ldb; ii++) {
        printf("%f\n", rhs[ii]);
      }
      putchar('\n');

As I said, I am worried that the way I am translating my matrix, is incorrect given the col-major nature, as well as the indexing nature of Fortran, since my solution yields:

Code:

    [ejspeiro@node01 lapack-ex02]$ make runs
    `pwd`/blogs < blogs.in
    info = 1
    1.000000
    1.000000
    1.000000
    1.000000
    1.000000
    0.000000

info = 1, implies that factorization was completed, but U(i,i) = U(1,1) was 0.

Any help is more than welcome.

Thanks in advanced!

firstfire · 02-14-2013, 12:30 PM

Hi.

I solved your system of equations using Maxima, and it gives:

Code:

matrix([1.0],[0.99999999999998],[1.000000000000018],[1.0],[0.99999999999999],[0.0])

(ignore Maxima formatting -- it is a column vector)

It is quite close to what you have. I believe you actually have exactly the same solution, which is rounded by printf. Try something like

Code:

printf("%.16f\n", rhs[ii]);

ejspeiro · 02-14-2013, 01:15 PM

Dear firstfire:

I must say, I am surprised... I have solved the system many times, in both MATLAB and Maple. They both agree on this:

Code:

[1.0000000, 1.027346753, 1.105215007, 1.244831762, 1.459918908, 1.767247642]

Furthermore, since the solution corresponds to a steady-state diffusive - advective process with constant reactive term, the actual solution looks like this:

https://dl.dropbox.com/u/5432016/misc/plot.png

Which, agrees with the actual analytical solution of the Partial Differential Equation.

Furthermore, the value info being equal to 1, implies that the factorization was achieved, but a solution couldn't have been performed.

Thank you very much for your reply.

firstfire · 02-15-2013, 12:57 AM

Quote:

Originally Posted by ejspeiro

Dear firstfire:

I must say, I am surprised... I have solved the system many times, in both MATLAB and Maple. They both agree on this:

Code:

[1.0000000, 1.027346753, 1.105215007, 1.244831762, 1.459918908, 1.767247642]

Yes, you are right.. Yesterday I looked at the wrong variable when composing the message above (I accidentally posted the product of the matrix and the solution I found, i.e. r.h.s vector). Maybe the only excuse for me is that I'm ill now and sit at home with high temperature.

I'm sorry.

firstfire · 02-15-2013, 02:13 AM

Hi.

I figured out what is wrong with your code. First, your upper diagonal is entered incorrectly ([0,0,0,0,22.5] instead of [0, 22.5, 22.5, 22.5, 22.5]). Second, you must transpose the matrix before passing it to fortran wrapper routine. Here is the code with my modifications:

Code:

#include <stdio.h>
main()
{
     int rr = 6;     // Rank.
      int kl = 2;     // Number of lower diagonals.
      int ku = 1;     // Number of upper diagonals.
      int nrhs = 1;   // Number of RHS.
      /*
      double vals[36] = {0.0,   0.0,   0.0,   0.0,   0.0,   0.0,  // Req. ex. space.
                         0.0,   0.0,   0.0,   0.0,   0.0,   0.0,  // Req. ex. space.
                       666.0,   0.0,   0.0,   0.0,   0.0,  22.5,  // First up diag.
                         1.0, -50.0, -50.0, -50.0, -50.0,  -2.6,  // Main diagonal.
                        27.5,  27.5,  27.5,  27.5,   4.0, 666.0,  // First low diag.
                         0.0,   0.0,   0.0,  -1.0, 666.0, 666.0}; // 2nd low diag.
      */
                       
      double vals[36] = {0.0,   0.0,   0.0,   0.0,   0.0,   0.0,  // Req. ex. space.
                         0.0,   0.0,   0.0,   0.0,   0.0,   0.0,  // Req. ex. space.
                       666.0,   0.0,   22.5,  22.5,  22.5,  22.5,  // First up diag.
                         1.0, -50.0, -50.0, -50.0, -50.0,  -2.6,  // Main diagonal.
                        27.5,  27.5,  27.5,  27.5,   4.0, 666.0,  // First low diag.
                         0.0,   0.0,   0.0,  -1.0, 666.0, 666.0}; // 2nd low diag.
      int lda = rr;   // Leading dimension of the matrix.
      int ipiv[6];    // Information on pivoting array.
      double rhs[] = {1.0, 1.0, 1.0, 1.0, 1.0, 0.0};  // RHS.
      int ldb = lda;  // Leading dimension of the RHS.
      int info = 0;   // Evaluation variable for solution process.
      int ii;         // Iterator.
      int jj;         // Iterator.
      
      double AB[36];
      // transpose `vals' matrix
      for(ii=0; ii<lda; ii++)
	for(jj=0; jj<lda; jj++)
	  AB[jj+lda*ii] = vals[ii+lda*jj];

      dgbsv_(&rr, &kl, &ku, &nrhs, AB, &lda, ipiv, rhs, &ldb, &info);
    
      printf("info = %d\n", info);
      for (ii = 0; ii < ldb; ii++) {
        printf("%.9f\n", rhs[ii]);
      }
      putchar('\n');
}

Here is the test run:

Code:

$ gcc test.c -llapack && ./a.out 
info = 0
1.000000000
1.027346755
1.105215011
1.244831768
1.459918916
1.767247652

ejspeiro · 02-18-2013, 09:15 PM

Excellent! Yes! I had come into that solution of transposing the "stack of arrays". However, I wonder, how would the transposition occur if the resulting CORE matrix were not to be squared. Take for example:

Code:

Core of the matrix:
   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00 
   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00 
   0.00    0.00   95.00   95.00   95.00   95.00   95.00   95.00   95.00   95.00   95.00 
   1.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00 -200.00   -2.80 
 105.00  105.00  105.00  105.00  105.00  105.00  105.00  105.00  105.00    4.00    0.00 
   0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   -1.00    0.00    0.00

I do not think that using the leading dimension would be helpful, I think it should be something like this:

Code:

  for (ii = 0; ii < in->co_num_rows; ii++) {
    for (jj = 0; jj < in->co_num_cols; jj++) {
      out[jj + ii*in->co_num_rows] = in->co_values[ii][jj];
    }
  }

But such a transposition code yields an error

Thanks!

firstfire · 02-18-2013, 10:47 PM

Hi.

Can you provide complete example which yields an error and the error message? Your code looks good to me and should work equally well on non-square matrix, provided that arrays in and out are of correct dimensions.

ejspeiro · 02-18-2013, 11:32 PM

Sight u.u' you are right... it would be helpful... let us see... I have created an API for this data type, so I apologize for any incorrect programming practice... although, hopefully, I will improve my code writing by means of your suggestions!

My project code is comprise of 4 files total. I am still working on these files! Feel free to highlight any error, but hopefully, I would have also spotted it

File 1: https://dl.dropbox.com/u/5432016/banded/cmtk.h
File 2: https://dl.dropbox.com/u/5432016/banded/blogs.c
File 3: https://dl.dropbox.com/u/5432016/banded/makefile (Optional if library paths are known?)
File 4: https://dl.dropbox.com/u/5432016/banded/blogs.in

Once everything is in one folder, this should build and execute:

Code:

$ make
$ make runs

I will understand if the amount of code makes this thread a pain in the @$$

In that case, feel free to let me know

Thanks!

firstfire · 02-19-2013, 03:51 PM

Hi.

Here are my observations:

1) To fix segfault in Transpose() you should swap dimensions of the transposed matrix (but see item 4 below):

Code:

  out->co_num_rows = out->fa_rank; //out->fa_bandwidth;
  out->co_num_cols = out->fa_bandwidth; //out->fa_rank;

and fix transpose loop a bit:

Code:

  for (ii = 0; ii < in->co_num_rows; ii++) {
    for (jj = 0; jj < in->co_num_cols; jj++) {
      out->co_values[jj][ii] = in->co_values[ii][jj];
    }
  }

so ii here is the index of the row of in (not of out as before), jj -- column.

2) This line

Code:

  rhs = (CMTK_Number *) calloc(bl->fa_rank, sizeof(CMTK_Number *));

happened to work correctly on a 64 bit system, but it will cause mysterious errors on a 32 bit system, where sizeof(double) != sizeof(double*). The same error repeated when you allocate out->so_memory_holder in Transpose(). You should write sizeof(CMTK_Number) in these places.

3) In my (not very big) experience 2D arrays (arrays of arrays) are not very convenient for representing two dimensional (and more) matrices in C. They are PITA to allocate, deallocate and carry around. I would use linear representation instead. Moreover, LAPACK functions also seem to prefer linear representation.

4) If Transpose() operates on general band matrices, then it may do two a little bit different things:
A) Transpose a band matrix (basically exchange its diagonals)
B) Transpose the core of a band matrix (i.e. transpose a compact representation of a band matrix) to be able pass such matrix to LAPACK routine.

It seems that in its current state your code does not make distinction between these two cases and even mix them a little bit.

5) It is a matter of taste, I know, but why not omit these redundant curly braces?

ejspeiro · 02-19-2013, 09:39 PM

Dear firstfire: Thanks! This has been an incredibly helpful post!

1) Done! Gratefully, I had already spotted that!
2) You are right! In fact, I have made the adequate changes throughout the code!
3) Again, you are right! I have changed the entire implementation to rely on a 1-d array!

Actually I have updated the entire code:

https://dl.dropbox.com/u/5432016/banded/cmtk.h
https://dl.dropbox.com/u/5432016/banded/blogs.c

4) ...
4) ...
4) Sight! u.u' This one is still giving me problems

I am not able to get a solution for cases in where I get a non-squared CORE matrix /* Notice that the FACADE matrices WILL always be squared, since we are solving for a PDE, but the cores can be non-squared. */

Say 1 PDE, step size of 0.1, as in

https://dl.dropbox.com/u/5432016/banded/blogs.in

I have grown aware of the differences between transposing the banded matrix and actually transposing its core, (thanks for your help with that), and I have developed these two methods (under debugging):

Code:

/*!
Monday, February 18, 2013

This methods transposes a given matrix.
*/
CMTK_Boolean Transpose(CMTK_BandedMatrix *in, CMTK_BandedMatrix *out) {

  CMTK_Boolean cool;  /*!< Validation variable. */
  int ii;             /*!< Iterator. */
  int jj;             /*!< Iterator. */
  int acum_01;        /*!< Accumulator. */
  
  /*
  This conditional is written for safety reasons, in case the user forgets to
  initialize the matrix to be constructed before calling THIS constructor. If
  she initializes the pointer, then this call will NOT do it, thus skyping
  over the conditional. If she does NOT initialize the pointer, this constructor
  will do it, thus ensuring the code will not break, BUT it will yield the
  following error from Valgrind:

  ==22540== HEAP SUMMARY:
  ==22540==     in use at exit: 0 bytes in 0 blocks
  ==22540==   total heap usage: 3 allocs, 3 frees, 272 bytes allocated
  ==22540==
  ==22540== All heap blocks were freed -- no leaks are possible
  ==22540==
  ==22540== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
  ==22540==
  ==22540== 1 errors in context 1 of 1:
  ==22540== Conditional jump or move depends on uninitialised value(s)
  ==22540==    at 0x400778: ConstructFromBandedMatrixInfo
  ==22540==    (in /home/ejspeiro/lapack-ex02/blogs)
  ==22540==    by 0x400E94: main (in /home/ejspeiro/lapack-ex02/blogs)
  ==22540==  Uninitialised value was created by a stack allocation
  ==22540==    at 0x400E52: main (in /home/ejspeiro/lapack-ex02/blogs)
  */
  if (out == NULL) {
    out = (CMTK_BandedMatrix *) malloc(sizeof(CMTK_BandedMatrix));
    if (out == NULL) {
      return CMTK_FALSE;
    }
  }
  
  //! Properties of the matrix's CORE:
  out->co_expanded = in->co_expanded;
  out->co_num_rows = in->co_num_cols;
  out->co_num_cols = in->co_num_rows;
  out->co_kl = in->co_ku;
  out->co_ku = in->co_kl;
  
  //! Build the matrix's CORE:
  // TODO: (0x02).
  out->co_values = (CMTK_Number *)
    calloc(out->co_num_rows*out->co_num_cols, sizeof(CMTK_Number));
  if (out->co_values == NULL) {
    free(out);
    return CMTK_FALSE;
  }
    
  out->co_leading_dim = out->co_ku + 2*out->co_kl + 1;

  //! Properties of the matrix's FACADE:
  out->fa_bcs = in->fa_bcs;
  out->fa_num_rows = in->fa_num_cols;
  out->fa_num_cols = in->fa_num_rows;
  out->fa_rank = in->fa_rank;
  out->fa_num_values = in->fa_num_values;
  out->fa_num_non_zero_values = in->fa_num_non_zero_values;
  
  // Compute fa_num_significant_values:
  acum_01 = 0;
  for (ii = 1; ii <= out->co_kl; ii++) {
    acum_01 = acum_01 + (out->fa_rank - ii);
  }
  acum_01 = 2*acum_01 + out->fa_rank;
  out->fa_num_significant_values = acum_01;

  out->fa_bandwidth = in->fa_bandwidth;
  
  // Computing the densities and the sparsities:
  // The ABSOLUTE DENSITY is computed based on the absolute number of non-zero
  // values, which we are no interested in computing in advanced. We will update
  // those as we add elements, therefore, we will also update this property, as
  // we add the values:
  out->fa_abs_density = 0.0;
  out->fa_rel_density =
  ((CMTK_Number) out->fa_num_significant_values)/out->fa_num_values;
  out->fa_abs_sparsity = 1.0 - out->fa_abs_density;
  out->fa_rel_sparsity = 1.0 - out->fa_rel_density;
  
  // Perform the actual transposition:
  for (ii = 0; ii < in->co_num_rows; ii++) {
    for (jj = 0; jj < in->co_num_cols; jj++) {
      cool = Set(out, jj, ii, GetOrDie(in, ii, jj));
      if (!cool) {
        fprintf(stderr, "COULD NOT add value at %d %d! Out of band?\n", ii, jj);
        return CMTK_FALSE;
      }
    }
  }
  
  return CMTK_TRUE;
}

And:

Code:

/*!
Tuesday, February 19, 2013

This methods gets the core array from a matrix.
*/
CMTK_Number* GetCoreValues(CMTK_BandedMatrix *in, char trans) {

  CMTK_Number *out;
  int ii;
  int jj;

  out = (CMTK_Number*)
    calloc(in->co_num_rows*in->co_num_cols, sizeof(CMTK_Number));
  if (out == NULL) {
    return NULL;    
  }

  switch (trans) {
    case 'T': {
      for (ii = 0; ii < in->co_num_rows; ii++) {
        for (jj = 0; jj < in->co_num_cols; jj++) {
          out[ii + jj*in->co_num_rows] = in->co_values[ii*in->co_num_cols + jj];
        }
      }
      break;
    }
    case 'N': {
      for (ii = 0; ii < in->co_num_rows; ii++) {
        for (jj = 0; jj < in->co_num_cols; jj++) {
          out[ii*in->co_num_cols + jj] = in->co_values[ii*in->co_num_cols + jj];
        }
      }
      break;
    }
    default: {
      return NULL;
      break;
    }
  }

  return out;
}

I also don't really get the grasp of the concept of a leading dimensions :$ Since I think I might be getting some sort of error when transposing.

Thanks in advanced!

ejspeiro · 02-19-2013, 09:44 PM

I believe the issue arises when accessing the values in the soon-to-be transposed matrices, since I can only do it within the scope of the band (because the core is entirely pre-allocated), and for some reason, there are cases in where I add in locations outside of the scope of the band.

May there be something funky with the way I define the limits of my cores?

\m/

ejspeiro · 02-20-2013, 05:35 PM

All right! I think I have solved all of my issues with this code by now!

Thank you firstfire! I will call this thread as solved, post the final files:

https://dl.dropbox.com/u/5432016/banded/cmtk.h
https://dl.dropbox.com/u/5432016/banded/blogs.c

And if anybody has any extra suggestions as far as the implementation goes, I will gladly welcome them!

My next step is to distribute the core matrices, in order to solve with ScaLAPACK.

Thanks in advanced!

\m/

firstfire · 02-21-2013, 01:59 AM

Hi.

I'm glad to see you solved the problem! New questions are welcome, we'll try to help.