LinuxQuestions.org
Old 06-06-2004, 04:21 PM   #1
manojg
Member
 
Registered: May 2004
Posts: 78

Rep: Reputation: 15
why program terminates?


Hi,

I was running a Fortran program on a Red Hat Linux system. It was supposed to take about 6.5 hours, but it terminated early and produced an incomplete set of data.

I thought it was due to a CPU time limit, so I checked with the command ulimit -t. The time limit is "unlimited" because it is my own computer.
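The same check can be made from inside a program. The following is a minimal sketch in Python (not the poster's Fortran), assuming a Unix-like system where the standard `resource` module is available; the function name `cpu_limit_description` is made up for illustration. It reads the soft CPU-time limit, which is the value `ulimit -t` reports by default.

```python
import resource


def cpu_limit_description():
    """Return the soft CPU-time limit as text, mirroring `ulimit -t`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CPU)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{soft} seconds"


print(cpu_limit_description())
```

If this prints "unlimited", a CPU-time limit is unlikely to be what stopped the run.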

Then I ran another Fortran program. It was supposed to take about 7.0 hours. It ran for the full time and produced complete data.

So I am puzzled. I could not fix this problem. Could you please help me fix it? I am using a Pentium IV.

Thank you very much.

Manoj Gupta
 
Old 06-06-2004, 04:47 PM   #2
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,337

Rep: Reputation: 548
"Could you please help me to fix this."

I don't know why your program terminated early.

You should consider putting checkpoints in your program, say a checkpoint every half hour where you write your intermediate results to a disk file. Then add the ability for the program to resume running from any checkpoint. That way, if the program fails during a long run, you can resume from the last checkpoint and lose at most half an hour of work (about 15 minutes on average).
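A checkpoint-and-resume scheme of this kind could look roughly like the sketch below. It is written in Python rather than Fortran, and every name in it (`save_checkpoint`, `load_checkpoint`, `run`, the JSON file layout) is an assumption made for illustration. It checkpoints every fixed number of steps instead of every half hour, and the "work" is just a running sum standing in for the real computation.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic: a crash never leaves a half-written file


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, checkpoint_every, stop_after=None):
    """Sum 0..n_steps-1 with periodic checkpoints.

    stop_after simulates a failure just before that step is processed.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step == stop_after:
            return None  # work since the last checkpoint is lost
        state["total"] += step  # stand-in for the real computation
        state["next_step"] = step + 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

Calling `run` again with the same path after a failure picks up from the last saved step rather than from the beginning.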

___________________________________
Be prepared. Create a LifeBoat CD.
http://users.rcn.com/srstites/LifeBo...home.page.html

Steve Stites
 
Old 06-07-2004, 12:52 PM   #3
manojg
Member
 
Registered: May 2004
Posts: 78

Original Poster
Rep: Reputation: 15
Hi Steve ,

Thank you for your suggestion. It will help save time.

Actually, I am curious to know why the program terminates, so here is a little more about it.

The program has two nested do loops, like this:

do 10 i = 1.1, 2.0, 0.1
do 20 j = 1, 200000

So it should produce 200000 points for each value of i (1.1, 1.2, ..., 2.0). But it produced points for i = 1.1, ..., 1.9 only, so there were 1800000 points in total instead of 2000000.

I ran the same program on three different computers. On all of them it produced the same number of points (1800000).

In the same program, I then changed only the range of the do loop:

do 10 i = 0.0, 1.0, 0.1
do 20 j = 1, 200000

In this case it produced all the points (2200000), although this took more time.
So I am puzzled why the same program behaves differently.

I appreciate your help.

Manoj
 
Old 06-07-2004, 02:14 PM   #4
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,337

Rep: Reputation: 548
"I am puzzled why the same program behaves differently."

I do not think that you have enough information to solve the problem. You need to run some tests to find the error. Obviously you do not want to run for hours just for a test. So I suggest that you try this test:

do 10 i = 1.9, 2.0, 0.1
do 20 j = 1, 200000

Also you need to get some significant error messages. You can do this by placing some diagnostic messages in the program and recompiling it. Give some thought about where to place the diagnostic messages so that you are not flooded with millions of messages.
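One way to keep diagnostics from flooding the output is to print once per pass of the outer loop rather than once per point. The sketch below is in Python rather than Fortran; the function name `run_with_diagnostics` and the message format are assumptions for illustration, and the inner loop body is only a counter standing in for the real computation.

```python
import sys


def run_with_diagnostics(outer_values, inner_count, log=sys.stderr):
    """Run the nested loops, logging once per outer pass, not per point."""
    points = 0
    for index, i in enumerate(outer_values, start=1):
        for _j in range(1, inner_count + 1):
            points += 1  # stand-in for computing one data point
        print(f"outer pass {index}: i={i!r}, points so far={points}",
              file=log, flush=True)
    return points
```

For the short test suggested above, `run_with_diagnostics([1.9, 2.0], 200000)` writes two lines, one per value of i, so a missing final pass is visible immediately.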

---------------------
Steve Stites
 
Old 06-07-2004, 02:37 PM   #5
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
Quote:
Originally posted by manojg
<snip>
In the program there are two do loops. like:

do 10 i = 1.1, 2.0, 0.1
do 20 j = 1, 200000

So, it should produce 200000 points ...
</snip>

<snip>
In the same program, I just change the range of the do loop like:

do 10 i = 0.0, 1.0, 0.1
do 20 j = 1, 200000

In this case, it produced all points ...
</snip>
I'm no expert in FORTRAN (I used it many, many moons ago), but my instinct says that these two loops are not equivalent.

I'm assuming that do loops are of the form: variable = start, stop, increment

If that is the case, your start and stop conditions are not identical. In the first example (the one that produces 1800000 points), your start and stop are 1.1 and 2.0 respectively (meaning a difference of 2.0 - 1.1 = 0.9; an increment of 0.1 means the nested do-loop will be executed 0.9 / 0.1 = 9 times).

For your second example, start and stop are 0.0 and 1.0 respectively (meaning a difference of 1.0 - 0.0 = 1.0; an increment of 0.1 means the nested do-loop would be executed 1.0 / 0.1 = 10 times).

So, having analyzed that, I assume your nested do-loop generates the "data points" (one data point for each time through the loop). So the first example would generate 200000 * 9 = 1800000 data points. The second example would generate 200000 * 10 = 2000000 data points.

Again, my understanding might be slightly skewed from not using FORTRAN in a while, but my gut tells me your problem lies in having different do-loop ranges.

Last edited by Dark_Helmet; 06-07-2004 at 02:44 PM.
 
Old 06-07-2004, 03:36 PM   #6
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,337

Rep: Reputation: 548
Dark_Helmet

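A Fortran do loop executes both the start and stop points when the stop point is reached exactly. The number of loop iterations is MAX(INT((stop - start + increment)/increment), 0). For a loop that runs at least once this equals INT((stop - start)/increment) + 1, whether or not (stop - start) is evenly divisible by increment; when it is not evenly divisible, the fractional part of the quotient is simply truncated.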
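The Fortran standard defines the iteration count of a counted do loop as MAX(INT((stop - start + increment) / increment), 0). The sketch below evaluates that expression in Python, whose floats are IEEE double precision; the function name `trip_count` is made up for illustration. With real-valued bounds, rounding in the division can push the quotient just below a whole number, and the truncation then drops one iteration. This is only one possible explanation for the counts reported in this thread: a Fortran compiler working in single or extended precision may round differently.

```python
def trip_count(start, stop, step):
    """Iteration count of a counted DO loop, as defined by the standard."""
    return max(int((stop - start + step) / step), 0)


print(trip_count(1, 10, 3))        # → 4  (i = 1, 4, 7, 10)
print(trip_count(1, 10, 4))        # → 3  (i = 1, 5, 9)
print(trip_count(1.1, 2.0, 0.1))   # → 9  (quotient rounds to 9.999999999999998)
print(trip_count(0.0, 1.0, 0.1))   # → 11
```

In double precision the two real-valued cases give 9 and 11 outer passes, which at 200000 points each is 1800000 and 2200000 points.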

------------------------------------------
Steve Stites

Last edited by jailbait; 06-07-2004 at 03:40 PM.
 
Old 06-07-2004, 04:08 PM   #7
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374Reputation: 374Reputation: 374Reputation: 374
My apologies. I should think twice before getting into something I'm not familiar with.

Admittedly, I skimmed the post and it seemed to me the perceived problem was the number of data points compared to the do-loop ranges. When I saw the ranges were different, I thought that was the obvious core of the problem (just a simple coding mistake that I fall prey to sometimes).

That being the case, my only suggestion would be to look through the block of code and double-check your references to the "i" variable. Assuming that both examples used the exact same code and the first fails while the second works... that tells me the do-loop range for "i" (the only thing that changed) is causing your problem (or perhaps another variable that takes its value from i).
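If the range of "i" is indeed the culprit, and if the cause is rounding in the real-valued loop bounds (an assumption; the thread does not establish it), one common workaround is to drive the loop with an integer counter and derive the real value from it. The sketch below is in Python rather than Fortran, and the name `outer_values` is made up for illustration.

```python
def outer_values(start, step, count):
    """Derive each real-valued i from an integer counter k.

    The number of passes is fixed by the integer count, so rounding in
    start or step cannot add or drop an iteration.
    """
    return [start + k * step for k in range(count)]


values = outer_values(1.1, 0.1, 10)
print(len(values))  # → 10
```

The values of i are still only approximately 1.1, 1.2, ..., 2.0, but the loop always runs exactly the requested number of times.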

Other than that, I'll quietly excuse myself from the discussion.
 
  

