LinuxQuestions.org
05-29-2014, 10:50 PM   #1
915086731
Member
Strange value of the double type variable: -nan(0x8000000000000)


I am confused by the value of "currdisk->currangle" after an addition. Initially the value of "currdisk->currangle" is 0.77500000000000013, but after the addition it changes to "-nan(0x8000000000000)". Can anyone explain? Thanks! The following is from my gdb debugging session.

Code:
3338          currdisk->currangle += (simtime - seg->time) / currdisk->rotatetime;
(gdb) p currdisk->currangle
$28 = 0.77500000000000013
(gdb) p (simtime - seg->time) / currdisk->rotatetime
$29 = 0.00833333333333325
(gdb) p (simtime - seg->time) 
$30 = 0.092592592592591672
(gdb) p currdisk->rotatetime
$31 = 11.111111111111111
(gdb) n

(gdb) p currdisk->currangle 
$32 = -nan(0x8000000000000)
(gdb) p/x (char[8])currdisk->currangle 
$52 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf8, 0xff}
(gdb)
Then I changed
Code:
currdisk->currangle += (simtime - seg->time) / currdisk->rotatetime;
to
Code:
double tmp1 = (simtime - seg->time) / currdisk->rotatetime;
currdisk->currangle += tmp1;
With this change, the value of currdisk->currangle is normal. Can anyone explain this confusing behavior?

Last edited by 915086731; 05-29-2014 at 10:53 PM.
 
05-30-2014, 01:29 AM   #2
pan64
LQ Addict
Can you tell us the types of these variables (int, float, double, or something else)?
 
05-30-2014, 02:21 AM   #3
915086731
Member
Original Poster
All of them are of type double.
Here is the assembly code:
Code:
       tmp1 = (simtime - seg->time) / currdisk->rotatetime;
0x0808fdf4  <disk_buffer_sector_done+559>:  fldl   0x80b2208
0x0808fdfa  <disk_buffer_sector_done+565>:  mov    -0x38(%ebp),%eax
0x0808fdfd  <disk_buffer_sector_done+568>:  fldl   (%eax)
0x0808fdff  <disk_buffer_sector_done+570>:  fsubrp %st,%st(1)
0x0808fe01  <disk_buffer_sector_done+572>:  mov    0x8(%ebp),%eax
0x0808fe04  <disk_buffer_sector_done+575>:  fldl   0xc4(%eax)
0x0808fe0a  <disk_buffer_sector_done+581>:  fdivrp %st,%st(1)
0x0808fe0c  <disk_buffer_sector_done+583>:  fstpl  -0x28(%ebp)
      currdisk->currangle += tmp1;
0x0808fe0f  <disk_buffer_sector_done+586>:  mov    0x8(%ebp),%eax
0x0808fe12  <disk_buffer_sector_done+589>:  fldl   0x284(%eax)
0x0808fe18  <disk_buffer_sector_done+595>:  faddl  -0x28(%ebp)
0x0808fe1b  <disk_buffer_sector_done+598>:  mov    0x8(%ebp),%eax
0x0808fe1e  <disk_buffer_sector_done+601>:  fstpl  0x284(%eax)

Last edited by 915086731; 05-30-2014 at 07:58 AM.
 
05-30-2014, 11:20 AM   #4
metaschima
Senior Member
Try adding a '1.0 *' or '(double)' cast at the beginning of the calculation.
 
05-30-2014, 12:46 PM   #5
johnsfine
LQ Guru
I think the problem must lie outside the information you posted.

My best guess is that gdb is showing you incorrect values. The compiler stores information to tell the debugger where local variables are stored at various points in the code. The debugger often misunderstands that info and/or the compiler stored it wrong. So currdisk in the first example you posted may not be where the compiler thinks it is.

You also made this harder by showing the disassembly for the version which seems to work, rather than for the version which seems to fail. Also context is required for understanding the behavior: a moderate amount before the point of failure and a little after.

Quote:
Originally Posted by metaschima
Try adding a '1.0 *' or '(double)' cast at the beginning of the calculation.
Random changes around a confusing issue just create more confusion. In the unlikely event you have some real justification for that suggestion, please explain.

Last edited by johnsfine; 05-30-2014 at 12:50 PM.
 
05-30-2014, 01:38 PM   #6
metaschima
Senior Member
If the variables are actually not doubles but integers then it would all make sense. I don't see why these changes are random.
 
05-30-2014, 03:25 PM   #7
johnsfine
LQ Guru
Quote:
Originally Posted by metaschima
If the variables are actually not doubles
Meaning you didn't see the post 9 hours before your post or didn't believe it?

Quote:
but integers then it would all make sense.
No it wouldn't. If you imagine different values (not different types) for those variables, you can get to a NaN. But the values are in the post, so you can see that even if they were integers, you won't get a NaN, but you can more clearly see they aren't integers.

Quote:
I don't see why these changes are random.
The OP's original code change should not change the result, but he thinks it did. You propose a different code change that also should not change the result. That is not a reasonable step in any systematic search for the cause.

To understand the situation, we probably need to disbelieve something in the original post. Maybe the OP is not accurately telling us what happened. More likely gdb did not accurately tell the OP what happened. But what to disbelieve must be filtered through some common sense and experience:

Code:
3338          currdisk->currangle += (simtime - seg->time) / currdisk->rotatetime;
(gdb) p currdisk->currangle
$28 = 0.77500000000000013
(gdb) p (simtime - seg->time) / currdisk->rotatetime
$29 = 0.00833333333333325
(gdb) p (simtime - seg->time) 
$30 = 0.092592592592591672
(gdb) p currdisk->rotatetime
$31 = 11.111111111111111
None of that looks like what we should consider disbelieving. gdb output shows those variables are not ints (or at least enough of them are not ints that the suggested cast would make no difference). gdb output also shows the values are reasonable.

Code:
(gdb) n
There is something I would suspect (given the starting assumption that something must be distrusted). Did gdb really execute all of, and only, the line of code that the post implies was executed at that point? gdb isn't perfect at that. We don't know what mode things were in. Maybe gdb proceeded much further or (less likely) only part way.

Code:
(gdb) p currdisk->currangle 
$32 = -nan(0x8000000000000)
(gdb) p/x (char[8])currdisk->currangle 
$52 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf8, 0xff}
(gdb)
There is another thing I don't trust. Does gdb still know the location of currdisk? If gdb is wrong about the location of currdisk, it is wrong about the value of currdisk and is showing garbage for currdisk->currangle. I don't trust that we know the true value of currdisk->currangle there.

Last edited by johnsfine; 05-30-2014 at 03:41 PM.
 
05-30-2014, 03:42 PM   #8
metaschima
Senior Member
Then why not use printf instead of relying on gdb? That would rule out gdb as the cause.
 
05-30-2014, 10:59 PM   #9
915086731
Member
Original Poster
Quote:
Originally Posted by metaschima
Then why not use printf instead of relying on gdb, that should exclude gdb from being the cause.
printf also shows NaN.
 
05-30-2014, 11:29 PM   #10
915086731
Member
Original Poster
Code:
tmp1 = (simtime - seg->time) / currdisk->rotatetime;
currdisk->currangle += tmp1;
The above code can also produce NaN in later calls.

So I stepped into the assembly.
Code:
tmp1 = (simtime - seg->time) / currdisk->rotatetime;
0x0808fdf4  <disk_buffer_sector_done+559>:  fldl   0x80b2208  //address of simtime
0x0808fdfa  <disk_buffer_sector_done+565>:  mov    -0x38(%ebp),%eax
0x0808fdfd  <disk_buffer_sector_done+568>:  fldl   (%eax)
Here, %eax contains 0x80bdfeb, which is the address of seg->time.
After the "fldl (%eax)" above executes, register st0 contains 0x8000000000000000, which represents NaN. The NaN then propagates through the following instructions. This is the key issue.
Code:
0x0808fdff  <disk_buffer_sector_done+570>:  fsubrp %st,%st(1)
0x0808fe01  <disk_buffer_sector_done+572>:  mov    0x8(%ebp),%eax
0x0808fe04  <disk_buffer_sector_done+575>:  fldl   0xc4(%eax)
0x0808fe0a  <disk_buffer_sector_done+581>:  fdivrp %st,%st(1)
0x0808fe0c  <disk_buffer_sector_done+583>:  fstpl  -0x28(%ebp)

Last edited by 915086731; 05-30-2014 at 11:41 PM. Reason: More useful info
 
05-31-2014, 02:02 AM   #11
pan64
LQ Addict
It looks like "everything is OK and correct except the result", so I would like to see everything. I could not reproduce it; perhaps you could prepare a small but complete program so we can check it. (On the other hand, while preparing it you may find the cause yourself.)
 
05-31-2014, 05:13 AM   #12
johnsfine
LQ Guru
I would rerun with a data breakpoint at the address of seg->time so you see each time it changes and can see where it changes to NaN.

If I take into account the claim that the problem was temporarily fixed by a code change that should have had no effect, then it sounds like a memory clobber: an unrelated section of code storing something into the location of seg->time when it was supposed to be storing somewhere else.

But the info in the first post doesn't seem to be consistent with the info in post #10, so I still think there is a big gap either between reality and what is reported by gdb or between what is reported by gdb and what was copied to this thread.

Last edited by johnsfine; 05-31-2014 at 05:18 AM.
 
05-31-2014, 09:05 AM   #13
915086731
Member
Original Poster
Quote:
Originally Posted by johnsfine
But the info in the first post doesn't seem to be consistent with the info in post #10, so I still think there is a big gap either between reality and what is reported by gdb or between what is reported by gdb and what was copied to this thread.
The project reads requests and processes them.
In post #1, I introduced a temporary variable tmp1, which fixed the NaN value. But in post #10, I added more requests to the project, and the NaN occurred again. That means tmp1 does not fix the issue once the input requests change.
 
05-31-2014, 09:16 AM   #14
pan64
LQ Addict
I think (though I can't be sure) that you have mixed up your variables: you use the same name twice, or you use two different structs for the same thing, or there is a problem with their scope, maybe an alignment problem, or an out-of-bounds subscript in an array. You could try valgrind; it can find that kind of issue. Is this a multi-threaded app?
 
05-31-2014, 09:22 AM   #15
915086731
Member
Original Poster
It's a single-threaded project.
seg->time does not change at all, so I can't find any evidence that the memory of seg->time is being overwritten.
 
  


All times are GMT -5.