LinuxQuestions.org
Old 07-16-2012, 09:46 AM   #1
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Rep: Reputation: 345
nVidia proprietary driver segfaults unless run with gdb [C++]


Hiya,

I'm trying to track down a bug in one of my C++ programs. It has all the classic symptoms which I associate with memory errors (the key one is that moving my debug output code around changes whether or not it segfaults), but valgrind doesn't complain about my own code at all.

When run on its own we get:

Code:
$ make && ./duel
g++ -c duel.cpp -Wall -ggdb
g++ duel.o particles.o draw.o myutils.o bullet.o ship.o playership.o ghostship.o aiship.o -o duel -Wall -ggdb -lglut -lGL -lGLU 
Player ship is #0
AI ship at (-5,5,0) is #1
[1]    4227 segmentation fault  ./duel
With gdb:

Code:
$ gdb duel
<snip - the normal boomf at the start of gdb>
(gdb) r
Starting program: /home/joshua/scripts/c++/opengl/newton_duel/duel 
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
Player ship is #0
AI ship at (-5,5,0) is #1
Forwards: (0,0,0)       5       0
Forwards: (0.0005,0,0)  4.9995  0.00025
Forwards: (0.001,0,0)   4.9985  0.001
<snip - normal program execution>
With valgrind:

Code:
$ valgrind --tool=memcheck duel
==4246== Memcheck, a memory error detector
==4246== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==4246== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==4246== Command: duel
==4246== 
==4246== Conditional jump or move depends on uninitialised value(s)
==4246==    at 0x4017876: index (in /usr/lib/ld-2.16.so)
==4246==    by 0x4007902: expand_dynamic_string_token (in /usr/lib/ld-2.16.so)
==4246==    by 0x4008204: _dl_map_object (in /usr/lib/ld-2.16.so)
==4246==    by 0x400180D: map_doit (in /usr/lib/ld-2.16.so)
==4246==    by 0x400E785: _dl_catch_error (in /usr/lib/ld-2.16.so)
==4246==    by 0x40010DB: do_preload (in /usr/lib/ld-2.16.so)
==4246==    by 0x4004546: dl_main (in /usr/lib/ld-2.16.so)
==4246==    by 0x4014B5D: _dl_sysdep_start (in /usr/lib/ld-2.16.so)
==4246==    by 0x4004DFD: _dl_start (in /usr/lib/ld-2.16.so)
==4246==    by 0x4001627: ??? (in /usr/lib/ld-2.16.so)
==4246== 
==4246== Conditional jump or move depends on uninitialised value(s)
==4246==    at 0x401787B: index (in /usr/lib/ld-2.16.so)
==4246==    by 0x4007902: expand_dynamic_string_token (in /usr/lib/ld-2.16.so)
==4246==    by 0x4008204: _dl_map_object (in /usr/lib/ld-2.16.so)
==4246==    by 0x400180D: map_doit (in /usr/lib/ld-2.16.so)
==4246==    by 0x400E785: _dl_catch_error (in /usr/lib/ld-2.16.so)
==4246==    by 0x40010DB: do_preload (in /usr/lib/ld-2.16.so)
==4246==    by 0x4004546: dl_main (in /usr/lib/ld-2.16.so)
==4246==    by 0x4014B5D: _dl_sysdep_start (in /usr/lib/ld-2.16.so)
==4246==    by 0x4004DFD: _dl_start (in /usr/lib/ld-2.16.so)
==4246==    by 0x4001627: ??? (in /usr/lib/ld-2.16.so)
==4246== 
Player ship is #0
AI ship at (-5,5,0) is #1
Forwards: (0,0,0)       5       0
==4246== Invalid write of size 8
==4246==    at 0x7DE7E72: ??? (in /usr/lib/libnvidia-glcore.so.302.17)
==4246==  Address 0x61a000 is not stack'd, malloc'd or (recently) free'd
==4246== 
==4246== 
==4246== Process terminating with default action of signal 11 (SIGSEGV)
==4246==  Access not within mapped region at address 0x61A000
==4246==    at 0x7DE7E72: ??? (in /usr/lib/libnvidia-glcore.so.302.17)
==4246==  If you believe this happened as a result of a stack
==4246==  overflow in your program's main thread (unlikely but
==4246==  possible), you can try to increase the size of the
==4246==  main thread stack using the --main-stacksize= flag.
==4246==  The main thread stack size used in this run was 8388608.
==4246== 
==4246== HEAP SUMMARY:
==4246==     in use at exit: 7,874,762 bytes in 1,873 blocks
==4246==   total heap usage: 2,669 allocs, 796 frees, 13,445,190 bytes allocated
==4246== 
==4246== LEAK SUMMARY:
==4246==    definitely lost: 0 bytes in 0 blocks
==4246==    indirectly lost: 0 bytes in 0 blocks
==4246==      possibly lost: 2,383,893 bytes in 99 blocks
==4246==    still reachable: 5,490,869 bytes in 1,774 blocks
==4246==         suppressed: 0 bytes in 0 blocks
==4246== Rerun with --leak-check=full to see details of leaked memory
==4246== 
==4246== For counts of detected and suppressed errors, rerun with: -v
==4246== Use --track-origins=yes to see where uninitialised values come from
==4246== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
[1]    4246 segmentation fault  valgrind --tool=memcheck duel
Note that it's not my code that's segfaulting, but the nvidia driver.

Now, the last time I ran valgrind over my code (a week or so ago) I got nothing from it at all. From my pacman log, it appears I upgraded to glibc 2.16.0-2 and nvidia 302.17-2 yesterday (15th of July), so I'm guessing this is stemming from these updates.

I can't construct a simple, self-contained code snippet because the error only appears if certain output statements are there (just redirecting things to std::cout), and so I can't really track the error down to a particular line or code block - though I know which routine seems to be causing the problem.

Is there any way I can resolve this problem? Is it a bug which needs reporting, and if so, to Arch, nVidia (hah), or the maintainers of glibc? As always, if there's any information I can provide, just let me know.

Thanks!
 
Old 07-16-2012, 01:28 PM   #2
dmdeb
Member
 
Registered: Jul 2007
Location: Germany
Distribution: Debian
Posts: 45

Rep: Reputation: 6
Hey Snark,

Would you care to share your source code (plus Makefile or such)? Maybe you're just initializing the libraries (or related objects) in a way they don't expect, and so the error shows its puny head long after the actual cause has taken place?

Cheers,
dmdeb
 
Old 07-16-2012, 04:20 PM   #3
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Original Poster
Rep: Reputation: 345
Well, the source code's massive and split over several files. I don't think I'm initialising anything incorrectly because it all worked fine until recently, but it's entirely possible you're correct. I'll create a github repository for it tomorrow and link to that from here (I don't want to spam the thread with all the code...)

Cheers for the response
 
Old 07-17-2012, 05:55 AM   #4
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Original Poster
Rep: Reputation: 345
Right, I've created the GitHub repository: https://github.com/Snark1994/newton-duel/. The 'master' branch is fine; the problem's in the 'devel' branch. The problem emerged after writing the code for the AI ships (aiship.cpp, the function "void AIShip::update(void);"). If I remove all the 'cout <<' statements in the function, then it runs without a problem, and as stated previously it runs fine under gdb...

EDIT: if it makes any odds, it also runs fine under Windows with MinGW

Thanks,

Last edited by Snark1994; 07-17-2012 at 09:37 AM.
 
Old 07-17-2012, 09:54 AM   #5
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 620

Rep: Reputation: 362
Hi.

Cool! On my onboard Intel 4500MHD there are no segfaults. It would be great if you kept the GitHub repository up to date.
 
1 member found this post helpful.
Old 07-17-2012, 11:27 AM   #6
dmdeb
Member
 
Registered: Jul 2007
Location: Germany
Distribution: Debian
Posts: 45

Rep: Reputation: 6
Quote:
Originally Posted by Snark1994 View Post
Right, I've created the GitHub repository: https://github.com/Snark1994/newton-duel/. The 'master' branch is fine; the problem's in the 'devel' branch. The problem emerged after writing the code for the AI ships (aiship.cpp, the function "void AIShip::update(void);"). If I remove all the 'cout <<' statements in the function, then it runs without a problem, and as stated previously it runs fine under gdb...

EDIT: if it makes any odds, it also runs fine under Windows with MinGW

Thanks,
Hi Snark,

thanks for sharing. Unfortunately (in a way), both the master and the devel branch work perfectly fine on my system (nvidia x86_64-295.40). Valgrind did not report any errors either. So much for the quick shot... I'll try to have a deeper look now.

Best regards
dmdeb
 
1 member found this post helpful.
Old 07-17-2012, 01:37 PM   #7
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Original Poster
Rep: Reputation: 345
@firstfire: I will try my best to do so! And if you're interested in the project (especially the graphics side - I've never done any of that before) you're more than welcome to contribute.

@dmdeb: Thanks for trying it. What's your glibc version? If I can find or create a system with glibc 2.16.0-2 and an earlier version of nvidia, check that the program runs there, and then see it segfault after installing the newest nvidia version, then we've narrowed it down to an nvidia bug...
 
Old 07-17-2012, 03:08 PM   #8
dmdeb
Member
 
Registered: Jul 2007
Location: Germany
Distribution: Debian
Posts: 45

Rep: Reputation: 6
Quote:
Originally Posted by Snark1994 View Post
@firstfire: I will try my best to do so! And if you're interested in the project (especially the graphics side - I've never done any of that before) you're more than welcome to contribute.

@dmdeb: Thanks for trying it. What's your glibc version? If I can find or create a system with glibc 2.16.0-2 and an earlier version of nvidia, check that the program runs there, and then see it segfault after installing the newest nvidia version, then we've narrowed it down to an nvidia bug...
Hey there,

my libc says "GNU C Library (Debian EGLIBC 2.11.3-3) stable release version 2.11.3". Way old, as usual for a Debian stable system...

I sent an email with details to your gmail address in order not to stress this thread too much.

Cheers
dmdeb
 
Old 07-17-2012, 04:48 PM   #9
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Original Poster
Rep: Reputation: 345
Right, got it fixed with the help of dmdeb (wish I could give you more reputation for that, but LQ tells me to 'spread it around' and I feel marking your previous post as helpful would be slightly misleading...).

The suggestion which worked was the following:

Quote:
The updateUniverse() function loops over the objectList and,
where requested, erases items - or even adds new ones (Emitters).
I'd be ultra-careful about modifying containers while iterating
over them; the world is full of weird bugs caused by such attempts
(and I am a fan of defensive programming, too). You could postpone
the erasing to after the loop using remove_if() or so, and maintain
a temporary list of new Emitters to be added to the objectList in
another step.
My problem was that I was erasing/adding items to a vector over which I was iterating, which was the cause of the segfault.
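
For anyone who lands here with the same symptoms, here is a minimal sketch of the pattern dmdeb describes. It is not the actual newton-duel code: Object, its dead and spawnsEmitter flags, and the nextId counter are hypothetical stand-ins, and it assumes a C++11 compiler (g++ -std=c++0x on older GCC). The idea is to collect new objects in a temporary vector during the loop, erase afterwards with erase/remove_if, and only then append the new ones, so the vector never changes size while it is being iterated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for whatever the real objectList holds.
struct Object {
    int id;
    bool dead;           // object asked to be removed this frame
    bool spawnsEmitter;  // object wants a new object added this frame
};

// Update loop with all container modifications deferred until after
// the iteration. Returns the number of objects that were erased.
std::size_t updateUniverse(std::vector<Object>& objectList, int& nextId) {
    std::vector<Object> pending;  // new objects collected during the loop

    // Pass 1: only touch existing elements. objectList keeps its size,
    // so no iterator or reference into it is invalidated.
    for (Object& obj : objectList) {
        if (obj.spawnsEmitter) {
            pending.push_back(Object{nextId++, false, false});
            obj.spawnsEmitter = false;
        }
    }

    // Pass 2: erase everything flagged as dead in one go.
    // remove_if moves the survivors to the front; erase trims the tail.
    const std::size_t before = objectList.size();
    objectList.erase(
        std::remove_if(objectList.begin(), objectList.end(),
                       [](const Object& o) { return o.dead; }),
        objectList.end());
    assert(objectList.size() <= before);  // erasing never grows the vector
    const std::size_t removed = before - objectList.size();

    // Pass 3: append the objects created during pass 1.
    objectList.insert(objectList.end(), pending.begin(), pending.end());
    return removed;
}
```

Erasing from or inserting into a std::vector invalidates iterators at or after the modification point (and all of them if the vector reallocates), so using them afterwards is undefined behaviour. That would fit a crash that comes and goes with unrelated changes such as extra cout lines or running under gdb. Building with libstdc++'s -D_GLIBCXX_DEBUG may help catch this kind of mistake earlier.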

Thank you very much, dmdeb!
 
  

