LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

05-21-2013, 05:31 PM   #1
rlsmithga
LQ Newbie
 
Registered: May 2013
Location: savannah, GA
Posts: 4

Rep: Reputation: Disabled
Regression testing FORTRAN code: SuSE vs RH 6.4, getting very different answers


I have some legacy code I need to port to RH6.4 from SuSE 10 Patch 3.
I have three machines: Suse AMD (SAMD), RH6.4 Intel (RHI) and RH6.4 AMD (RHA). The two AMD machines use the same chipset.

On SAMD I get an answer of -0.108E-00.
On RHI and RHA I get an answer of -0.106E+02, roughly two orders of magnitude off!

The SAMD answer is the correct one. I presume libm accounts for the difference between SuSE and RH 6.4. The code was compiled on RHI and executed on all three machines. I get the same results with gfortran, Intel 13.1.146 and PGI 13.4: the results on SuSE differ from those generated on RH. I compiled with -O0 and the difference remains. Is there a bug in libm on RH? ldd does not provide much useful information in this case. I'm looking for suggestions on how to determine why I'm getting such a huge difference and how I can resolve it.
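A regression comparison like this is usually done with a tolerance rather than exact equality, because legitimate compiler or libm differences typically change only the last few digits, while a bug changes far more. A minimal Python sketch, assuming each run's results have already been read into a list of floats (the function name and tolerance defaults are illustrative assumptions, not part of the original code):

```python
import math


def compare_runs(reference, candidate, rel_tol=1e-6, abs_tol=1e-12):
    """Return (index, reference, candidate) for each value outside tolerance.

    `reference` and `candidate` are sequences of floats from two runs
    (e.g. SuSE vs RH). The tolerance defaults are illustrative only.
    """
    if len(reference) != len(candidate):
        raise ValueError("runs produced different numbers of values")
    mismatches = []
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, ref, cand))
    return mismatches


# The two answers reported above, written as Python float literals.
suse_run = [-0.108e0]
rhel_run = [-0.106e2]
print(compare_runs(suse_run, rhel_run))  # [(0, -0.108, -10.6)]
```

A last-digit difference passes this check; a factor of roughly 100, as reported here, does not.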
 
05-21-2013, 06:16 PM   #2
rigor
Member
 
Registered: Sep 2003
Location: 19th moon ................. ................Planet Covid ................Another Galaxy;............. ................Not Yours
Posts: 705

Rep: Reputation: Disabled
Some additional information might be helpful, to help us to help you.

If the Fortran code is not proprietary and not extensive, please post it.

If you can't post it, please categorize what it does.

These are all x86 machines, yes?

The Fortran code is merely making arithmetic calculations, yes?

If yes, are there a lot of calculations involved, or just a few?
 
1 member found this post helpful.
05-21-2013, 06:47 PM   #3
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,624

Rep: Reputation: 2651
Offhand, I would guess that it is the difference between the rather old gcc on SLED/SLES 10 and the current gcc 4.6 on RHEL 6.4.

By "legacy code", is this F77 or F95?
 
05-23-2013, 05:56 PM   #4
rlsmithga
LQ Newbie
 
Registered: May 2013
Location: savannah, GA
Posts: 4

Original Poster
Rep: Reputation: Disabled
Thank you for the reply. I am unable to post the code due to its proprietary nature. I can tell you the source is written in FORTRAN 90. I get the same results with gcc, Intel 13.1.146 and PGI 13.4. The only difference I'm able to detect is the OS: RH on AMD and RH on Intel yield the same answers, but they do not match the SuSE answers on AMD. I have no SuSE on Intel, however. I believe I've isolated the difference to a call to matmul, but because I only have an hour or so per day to work on the problem, I cannot say for certain that matmul is the culprit. Stay tuned and I'll post my progress.
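Cross-checking a suspect matmul call against an independent implementation on the same inputs is one way to confirm or rule it out. A minimal Python sketch with hypothetical helper names; in the real Fortran code the equivalent would be an explicit DO loop (or a BLAS GEMM call) compared against the intrinsic matmul:

```python
def matmul_reference(a, b):
    """Naive triple-loop product of an m x k and a k x n matrix (lists of rows)."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def max_abs_diff(x, y):
    """Largest elementwise absolute difference between two equal-shaped matrices."""
    return max(abs(u - v) for row_x, row_y in zip(x, y) for u, v in zip(row_x, row_y))


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
reference = matmul_reference(a, b)
# `suspect` stands in for the product dumped from the run under test.
suspect = [[19.0, 22.0], [43.0, 50.0]]
print(max_abs_diff(reference, suspect))  # 0.0
```

If the two products agree to within rounding but the final answer is still wrong, the inputs passed to matmul are the more likely suspect, since a different summation order usually moves only the trailing digits.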
 
05-23-2013, 06:10 PM   #5
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142
How big is the code? For issues like this I often dive in and print out a few key variables in specific locations in the code, then use those to trace down where the two machines begin to differ in their calculations.
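That checkpoint approach can be partly automated: print the same labelled values on both machines, then scan the two logs for the first label whose values disagree. A minimal Python sketch, assuming a hypothetical log format of one "label value" pair per line (the labels and values below are made up for illustration):

```python
import math


def parse_trace(text):
    """Parse lines like 'checkpoint_1 1.5' into (label, float) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            label, value = line.split()
            pairs.append((label, float(value)))
    return pairs


def first_divergence(trace_a, trace_b, rel_tol=1e-9, abs_tol=1e-12):
    """Return (label, value_a, value_b) at the first disagreeing checkpoint, or None."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces have different numbers of checkpoints")
    for (label_a, val_a), (label_b, val_b) in zip(trace_a, trace_b):
        if label_a != label_b:
            raise ValueError(f"traces out of step: {label_a!r} vs {label_b!r}")
        if not math.isclose(val_a, val_b, rel_tol=rel_tol, abs_tol=abs_tol):
            return label_a, val_a, val_b
    return None


machine_a = parse_trace("checkpoint_1 1.5\ncheckpoint_2 2.25\ncheckpoint_3 4.0")
machine_b = parse_trace("checkpoint_1 1.5\ncheckpoint_2 2.75\ncheckpoint_3 9.0")
print(first_divergence(machine_a, machine_b))  # ('checkpoint_2', 2.25, 2.75)
```

Later checkpoints usually differ too once an early one has diverged, so only the first mismatch points at the code worth inspecting.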

You may find that the problem is due to a bug in the code, perhaps using a 32-bit float where more precision is required, and you're getting a roundoff or floating point approximation error that's then propagating into the final result.
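The effect of a 32-bit accumulator is easy to demonstrate. This Python sketch emulates a single-precision running sum (as a default REAL variable would typically behave) by rounding after every addition through the standard struct module, and compares it with a double-precision sum of the same values. The increment and count are arbitrary illustrative choices:

```python
import struct

_F32 = struct.Struct("<f")  # IEEE 754 single precision


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest single-precision value."""
    return _F32.unpack(_F32.pack(x))[0]


def accumulate(increment, count, single=False):
    """Add `increment` to a running total `count` times.

    With single=True every intermediate result is rounded to single precision,
    emulating a 32-bit accumulator; otherwise Python's double is used.
    """
    if single:
        increment = to_float32(increment)
    total = 0.0
    for _ in range(count):
        total += increment
        if single:
            total = to_float32(total)
    return total


exact = 100000.0
for single in (False, True):
    result = accumulate(0.1, 1_000_000, single)
    kind = "single" if single else "double"
    print(f"{kind}: {result!r}  relative error {abs(result - exact) / exact:.2e}")
```

In single precision each addition is rounded to the spacing of representable numbers near the running total, so once the total is large the per-step error stops being negligible and it accumulates over many steps.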

I've also run into issues where poor programming practices cause different compilers or machines to treat the code differently, yielding drastically different results.

Unfortunately, being in a science-dominated field, I run into this MUCH more often than I should when dealing with code from 3rd parties. They're often written by physicists with little or no programming background who are just hacking their way through a language they barely understand until they get an answer that looks right-ish. The end result is a code that barely works on their machine, and then explodes as soon as you change the compiler or architecture.

Last edited by suicidaleggroll; 05-23-2013 at 06:13 PM.
 
1 member found this post helpful.
06-08-2013, 05:05 PM   #6
rlsmithga
LQ Newbie
 
Registered: May 2013
Location: savannah, GA
Posts: 4

Original Poster
Rep: Reputation: Disabled
suicidaleggroll, thank you for the tip about the poor programming skills of certain professionals. After nearly 30 years of working at Sandia National Labs and Los Alamos, poor programming should have been my first thought. Also, I thought I had array bounds checking enabled in my makefile, but it was not. Long story short, after writing out the contents of several variables, I confirmed the difference in values was the result of a call to matmul. I replaced that call with the BLAS routine GEMM, and after this change the code generated a segfault. I reviewed the makefile, discovered I had not enabled '-check all' (Intel v13 compiler), and after enabling it found a line where the code was trying to read the 0th element of an array. The index was generated via a convoluted sequence of math, so the code has been returned to the developer for correction.
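The failure described here, an index that arithmetic drives to 0 in a 1-based array, is what runtime bounds checking is meant to catch (gfortran -fcheck=bounds, Intel -check bounds; -fcheck=all and -check all enable it along with other checks). A Python sketch of the idea, using a hypothetical CheckedArray class and a made-up index formula. Note that an unchecked Python list, like unchecked Fortran, does not stop the bad access. It just reads something else:

```python
class CheckedArray:
    """1-based array that refuses out-of-range indices (hypothetical helper).

    Mimics what a Fortran array declared A(N) does when the code is compiled
    with runtime bounds checking enabled.
    """

    def __init__(self, values, lower=1):
        self._data = list(values)
        self.lower = lower
        self.upper = lower + len(self._data) - 1

    def __getitem__(self, i):
        if not self.lower <= i <= self.upper:
            raise IndexError(f"index {i} outside bounds [{self.lower}, {self.upper}]")
        return self._data[i - self.lower]


def computed_index(n, k):
    """Made-up index arithmetic that returns 0 whenever k is a multiple of n."""
    return k % n


values = [10.0, 20.0, 30.0]
i = computed_index(3, 6)  # 0: not a valid index for a 1-based array

# Unchecked access: Python maps position i - 1 == -1 to the LAST element,
# so the bad index silently returns 30.0 instead of failing.
print("unchecked:", values[i - 1])

# Checked access raises immediately and names the offending index.
try:
    CheckedArray(values)[i]
except IndexError as err:
    print("bounds check:", err)
```

Like the compiler flags, the checked version costs a comparison on every access, which is why such checks are usually enabled for debugging and testing builds rather than production runs.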

Again, thank you for the tip.
 

All times are GMT -5.
