LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 06-05-2024, 05:36 PM   #76
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,263

Rep: Reputation: 5339

Let me just reply to the top post without reading any of the replies. My apologies in advance if my point has already been made.

The member initialization list is supposed to be in the same order that the members are declared in the class, because members are always initialized in declaration order no matter how the list is written.

So obj2 would need to look like this:

Code:
class obj2 {
public:
    obj2(): ptr1(nullptr), ptr2(nullptr), bval(false) {}
private:
    int* ptr1;
    int* ptr2;
    bool bval;
};
Note that ptr1, ptr2, and bval are in the same order in both the class declaration and the constructor MIL.

Although with modern C++, it should really just look like this:

Code:
class obj2 {
private:
    int* ptr1{};
    int* ptr2{};
    bool bval{};
};
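
To see why the order matters, here is a minimal sketch. The names Tracer, init_log(), Example, and observed_order() are hypothetical and made up for illustration; they are not from the voxelands code. The initializer list is deliberately written backwards, yet the members are still constructed in declaration order. GCC flags this kind of mismatch with -Wreorder, which -Wall enables.

```cpp
#include <cassert>
#include <string>

// Shared log that records the order in which members get constructed.
std::string& init_log() {
    static std::string log;
    return log;
}

// Hypothetical helper: appends its name to the log when constructed.
struct Tracer {
    explicit Tracer(const char* name) {
        assert(name != nullptr);
        init_log() += name;
        init_log() += ' ';
    }
};

class Example {
public:
    // Written as c, b, a on purpose. The compiler ignores this order.
    Example() : c("c"), b("b"), a("a") {}
private:
    Tracer a;  // constructed first
    Tracer b;  // constructed second
    Tracer c;  // constructed third
};

// Builds one Example and returns the recorded construction order.
std::string observed_order() {
    init_log().clear();
    Example e;
    (void)e;
    return init_log();
}
```

Calling observed_order() yields the declaration order (a, then b, then c), so an initializer that depends on another member only works if that member is declared earlier in the class.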

Last edited by dugan; 06-06-2024 at 08:16 AM.
 
Old 06-05-2024, 11:40 PM   #77
EdGr
Senior Member
 
Registered: Dec 2010
Location: California, USA
Distribution: I run my own OS
Posts: 1,005

Rep: Reputation: 476
selfprogrammed - The symptoms you are describing are due to the race condition between the threads.

The lack of a mutex on refcount can lead to an erroneous deallocation, which then makes a use-after-free appear like a corrupt struct. The memory being overwritten is on the heap. After that, anything can happen.
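
As a rough sketch of what a race-free refcount can look like, assuming C++11 or later: the RefCounted struct and the stress_refcount() helper below are hypothetical and are not the voxelands class. An atomic counter is used here instead of a mutex; either approach ensures that exactly one caller sees the count reach zero.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical reference-counted object, for illustration only.
struct RefCounted {
    std::atomic<int> refcount{1};   // the creator holds the first reference
    std::atomic<int> destroyed{0};  // how often the "last reference" path ran

    void grab() { refcount.fetch_add(1, std::memory_order_relaxed); }

    // Returns true for exactly one caller: the one dropping the last reference.
    bool drop() {
        const int previous = refcount.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous >= 1 && "dropped more references than were taken");
        if (previous == 1) {
            destroyed.fetch_add(1, std::memory_order_relaxed);
            return true;  // a real object would be deleted here
        }
        return false;
    }
};

// Hammers grab()/drop() from several threads, then drops the creator's
// reference. Returns how many times the deallocation path was taken.
int stress_refcount(int threads, int iterations) {
    RefCounted obj;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&obj, iterations] {
            for (int i = 0; i < iterations; ++i) {
                obj.grab();
                obj.drop();
            }
        });
    }
    for (auto& th : pool) th.join();
    obj.drop();  // the creator's reference
    return obj.destroyed.load();
}
```

With a plain int and no lock, two threads can both read the same old count, and the object can then be freed while a reference is still live, which is the use-after-free described above.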

Race conditions can range from a localized bug to a program that was designed wrong. The code does not give me the impression that the developers knew what they were doing. That is all the value I can add without a lot of work.
Ed
 
1 members found this post helpful.
Old 06-06-2024, 01:20 AM   #78
henca
Senior Member
 
Registered: Aug 2007
Location: Linköping, Sweden
Distribution: Slackware
Posts: 1,024

Rep: Reputation: 689
Quote:
Originally Posted by selfprogrammed
I run voxelands using gdb to catch all faults, so I can examine them.
I have several times observed that the structure was entirely trashed (this).
It is now more than a month since your initial post about this problem. Did you try the gcc thread sanitizer to rule out race conditions? Have you tried to install rr to be able to step forwards and backwards in a recording of such a program crash?

https://www.youtube.com/watch?v=ZJKBwQ71LN4
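
For anyone who has not used the thread sanitizer, a small test case helps to check that the toolchain supports it. This is only a sketch: the file name tsan_demo.cpp and both functions are hypothetical. The -fsanitize=thread option is the GCC/Clang flag in question; whether a given distribution's GCC package ships the runtime library for it is something to check locally.

```cpp
// Build (file name is hypothetical):
//   g++ -g -O1 -fsanitize=thread tsan_demo.cpp -o tsan_demo
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Unsynchronized increments: a data race (undefined behavior) that the
// thread sanitizer is expected to report. Do not rely on the result.
int racy_total(int threads, int increments) {
    int total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&total, increments] {
            for (int i = 0; i < increments; ++i) ++total;
        });
    }
    for (auto& th : pool) th.join();
    return total;
}

// The same work with a mutex around the shared variable: no race.
int locked_total(int threads, int increments) {
    assert(threads >= 0 && increments >= 0);
    int total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&total, &total_mutex, increments] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<std::mutex> lock(total_mutex);
                ++total;
            }
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

Calling racy_total() from main() in a sanitized build should produce a data race report, while locked_total() should run clean.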

regards Henrik
 
Old 06-06-2024, 01:39 AM   #79
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,109

Rep: Reputation: 7367
Quote:
Originally Posted by selfprogrammed
I consider this to be a Heisenbug, because it moves or disappears when you try to look at it. It is probably the third Heisenbug that I have encountered.
No, it is not. CodeChecker identified 83 high-severity bugs, any of which may lead to segfaults, memory overwrites, or some other kind of fatal error. They are no secret and they are definitely there; you don't have to look for them. You just need to fix them, which is most probably not trivial.
https://clang.llvm.org/docs/analyzer/checkers.html
This link explains what all those detected issues really mean.

These errors are much easier to fix than errors related to threads (but obviously you need to identify those issues too, later).
 
Old 06-06-2024, 08:25 AM   #80
BrunoLafleur
Member
 
Registered: Apr 2020
Location: France
Distribution: Slackware
Posts: 428

Rep: Reputation: 388
There are only a few places in voxelands where it launches threads. One is server.cpp, in which just 2 places were missing a lock. m_env_mutex and m_con_mutex lock large areas and essentially protect the quasi-global m_env and m_con objects. All the item, list, and mesh objects are protected by m_env_mutex and m_con_mutex.

Another place is client.cpp, where there is also a thread (MeshUpdateThread) that touches m_env without protection, because the corresponding mutex doesn't exist. I will add it, and I will move the portions of game.cpp that use the Client class into methods of the Client class so that they can be protected with m_env_mutex.
m_con is not touched by the thread of the Client class.

Mesh objects are protected by m_env_mutex (to be added for client.cpp), so the local mutexes for meshes are not useful (the list of meshes has to be protected by a mutex anyway, and m_env_mutex does that job).

The m_env and m_con objects can't easily be protected at a lower level than the global m_env_mutex and m_con_mutex. They should be reworked with const methods and members as much as possible, and/or split into more manageable objects that could perhaps have more local mutexes.
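
A minimal sketch of that coarse-grained scheme, under loud assumptions: Environment, Game, env_mutex, and run_workers() are hypothetical stand-ins, and the real m_env and m_env_mutex in voxelands are far larger. The point is only that one mutex guards everything inside the big object, and every access path takes it.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a large shared object such as m_env.
struct Environment {
    std::vector<int> meshes;  // stands in for the lists of meshes and items
};

struct Game {
    Environment env;
    std::mutex env_mutex;  // one coarse mutex guards everything inside env

    void add_mesh(int id) {
        std::lock_guard<std::mutex> lock(env_mutex);
        env.meshes.push_back(id);
    }

    std::size_t mesh_count() {
        std::lock_guard<std::mutex> lock(env_mutex);
        return env.meshes.size();
    }
};

// Several worker threads append to the shared list through the locked API.
std::size_t run_workers(int workers, int per_worker) {
    assert(workers >= 0 && per_worker >= 0);
    Game game;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&game, w, per_worker] {
            for (int i = 0; i < per_worker; ++i) {
                game.add_mesh(w * per_worker + i);
            }
        });
    }
    for (auto& th : pool) th.join();
    return game.mesh_count();
}
```

The trade-off is contention: a coarse lock is simple to reason about, but threads spend more time waiting for each other than they would with smaller, more local locks.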

Those missing mutexes are enough to explain random crashes. I don't think the other static errors are really meaningful for voxelands, or else they have already been corrected by the cleanup done by the OP or me.

Last edited by BrunoLafleur; 06-06-2024 at 10:19 AM.
 
Old 06-06-2024, 08:43 AM   #81
BrunoLafleur
Member
 
Registered: Apr 2020
Location: France
Distribution: Slackware
Posts: 428

Rep: Reputation: 388
Quote:
One problem with a threading fault is that the corruption would not be severe enough.
That might mix writes from two threads, but that data would at least be otherwise valid ptrs.
The corruption can be very severe, and the result need not be a valid pointer at all. Contents get corrupted directly because they are written at the same time as they are read, so they end up in an indeterminate state, even for simple variables. Moreover, some objects are lists that are updated while we read them, so they end up in a really bad state under some playing circumstances.

Simple pointers are not atomic either, and can be corrupted.
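
A small sketch of the "written while being read" problem and its fix. Position and count_torn_reads() are hypothetical names for illustration. The writer keeps two fields equal; a reader that takes the same lock can never observe them half-updated.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared state with an invariant: x == y at all times.
struct Position {
    long x = 0;
    long y = 0;
};

// Runs one writer and one reader. Returns how many times the reader saw
// the invariant broken. With the mutex held on both sides this stays 0.
int count_torn_reads(int updates) {
    assert(updates >= 0);
    Position pos;
    std::mutex pos_mutex;
    std::atomic<bool> done{false};
    int torn = 0;  // written only by the reader, read after join()

    std::thread writer([&] {
        for (int i = 1; i <= updates; ++i) {
            std::lock_guard<std::mutex> lock(pos_mutex);
            pos.x = i;  // without the lock, a reader could run between
            pos.y = i;  // these two writes and see x != y
        }
        done.store(true);
    });

    std::thread reader([&] {
        while (!done.load()) {
            std::lock_guard<std::mutex> lock(pos_mutex);
            if (pos.x != pos.y) ++torn;
        }
    });

    writer.join();
    reader.join();
    return torn;
}
```

Removing either lock_guard turns this into a data race, and the reader may then see mismatched fields. For a single pointer or flag shared between threads, std::atomic is the usual alternative to a mutex.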

All those should be corrected after adding the missing m_env_mutex in client.cpp. server.cpp should now be thread safe.
 
Old 06-09-2024, 03:13 PM   #82
BrunoLafleur
Member
 
Registered: Apr 2020
Location: France
Distribution: Slackware
Posts: 428

Rep: Reputation: 388
I have made a new patch, which replaces the previous one at the same link as before.

I have added some missing mutexes on the client side. m_env_mutex is not good for the client, but some more local, internal mutexes were missing and were pointed out by valgrind. They lock lists (appending and deleting elements, which could leave a list in a really bad state without locking).

I have tested with valgrind --tool=drd --error-limit=no and -DENABLE_AUDIO=off in cmake (sound is way too long to test in valgrind).

For now it doesn't segfault after some hours of playing, but that is no proof that it will not segfault again.
 
Old 06-09-2024, 10:52 PM   #83
chris.willing
Member
 
Registered: Jun 2014
Location: Sydney, Australia
Distribution: Slackware,LFS
Posts: 922

Rep: Reputation: 623
Is there any possibility the thread starter could change the thread title please? Discussion of updating GCC (which is why I clicked on it) has long since passed, so the title is a bit misleading now.
 
Old Yesterday, 04:37 AM   #84
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37, 14.2, 15.0
Posts: 641

Original Poster
Rep: Reputation: 156
I also have gotten longer sessions without segfault. It still has bad_alloc.

I have been adding the cleaning-patches ver1 by hand, so I can see what they are and what might be affected. I have gotten 70% done with that.
Some of the patches really need some comments, as I cannot determine if there is something serious being fixed, and some of the patches modify the program behavior (drop package).
I still have not found a really clear bug fix in the patches. Perhaps in ver2 of the patches.
Such patches might be easier to deal with if they were issued separately instead of as one big patch.
To apply ver2, I am going to have to go through it by hand to find the new changes.
It would be different if I was just trying to patch the program so I can run it, but for the immediate effort I am trying to find the mystery bug, so I have to read it line by line.

The original post was about mystery bugs that defied understanding.
I still do not understand the bugs and how they could behave like that. I suspect there is something wrong in this program besides the thread locking.
My meaning for "understand" may be considerably different than that of other commenters.
I want more than "it could", and want to see the sequence of events.

Synopsis:
1. BrunoLafleur has run voxelands, and has observed the faults. He has also made patches that have made it much better.
He has also managed to run valgrind upon it.
2. I have compiled using GCC 12.3, which changed the exact symptoms, but not the nature of the symptoms. It is still buggy, in the same way.
3. Compiling with Clang did not work, due to something in the CMake and Slackbuild.
4. I have found things wrong with the voxelands code and have been patching.
5. I have not found the bug that is wild-writing. It may be involved with a thread locking issue, but even though others may believe that thread locking explains everything, I do not believe it is a sufficient answer. It may be that a thread locking error is provoking something else in the program to also fail, so it is entirely possible that fixing a thread locking may fix the wild-writes.
Such is the nature of latent faults, and benign code errors.
Also, I have been bitten too many times, and will remain cautious regarding the claimed scope and effectiveness of patches (especially when they are not commented as what they fixed).
6. The compiler may be at fault, in that the STL implementation is provoking faults. It may be that operations like memcpy were valid when voxelands was originally written, but that the STL implementation has invalidated that operation, and other fragility may have been introduced.
7. Overall, I find STL dangerous to use due to it hiding restricted operations and making it too easy for the user to get caught by a gotcha.
If possible I will want to avoid it.
8. That most of the faults in voxelands have repeatedly involved the STL vector, is suspicious in itself.
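
On the memcpy point in item 6, one way to check a given type is the standard std::is_trivially_copyable trait: a byte-wise copy with std::memcpy is only well defined for types that satisfy it, and a struct holding a std::vector does not. The sketch below assumes C++17; PlainBlock, BlockWithVector, safe_copy(), and copy_keeps_vectors_independent() are hypothetical names, not types from voxelands.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

struct PlainBlock { int x; int y; int z; };                 // plain data only
struct BlockWithVector { int id; std::vector<int> data; };  // owns heap memory

// Copies byte-wise only when the type allows it; otherwise falls back to
// the copy assignment operator, which duplicates owned memory properly.
template <typename T>
void safe_copy(T& dst, const T& src) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        std::memcpy(&dst, &src, sizeof(T));
    } else {
        dst = src;
    }
}

// After a proper copy, changing one vector must not affect the other.
bool copy_keeps_vectors_independent() {
    BlockWithVector a{1, {10, 20, 30}};
    BlockWithVector b{};
    safe_copy(b, a);
    assert(b.data.size() == a.data.size());
    b.data[0] = 99;
    return a.data[0] == 10 && b.data[0] == 99;
}
```

A raw memcpy of BlockWithVector would leave two vectors pointing at the same heap buffer, and both destructors would then try to free it. Adding static_assert(std::is_trivially_copyable_v<T>) next to an existing memcpy call is a cheap way to find such spots at compile time.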

For the purpose of the original post, this thread has probably served its purpose.
The patches discovered should be saved somewhere. Perhaps we would want to start a new thread - - voxelands bugs defy debugging.
I doubt that we can find a voxelands site that would accept the work.
I will not close the thread for now.

For the sake of other readers, I looked at changing the title, and could not find a way to do it.
I suspect it is there, but not where I can find it.

I now have two deadlines to worry about, and cannot tend to this issue adequately at the moment.
It may be a month.

Thank you for the work and comments.

---
I have looked for rr, and have not found it. Not in slackbuilds either.
Looked for gcc thread sanitizer, and cannot tell if it is in the standard Slackbuild. I have not found it.

Last edited by selfprogrammed; Yesterday at 04:52 AM.
 
Old Yesterday, 07:57 AM   #85
kgha
Senior Member
 
Registered: May 2018
Location: Sweden
Distribution: Slackware 64 -current multilib from AlienBob's LiveSlak MATE
Posts: 1,097

Rep: Reputation: 766
Quote:
Originally Posted by selfprogrammed
For the sake of other readers, I looked at changing the title, and could not find a way to do it.
I suspect it is there, but not where I can find it.
Go to first post, click "Edit" and then "Go advanced".
 
Old Yesterday, 09:52 AM   #86
BrunoLafleur
Member
 
Registered: Apr 2020
Location: France
Distribution: Slackware
Posts: 428

Rep: Reputation: 388
Just a note: by default a program is not thread safe (that is not the compiler's duty). It has to be made so. In particular, all shared variables should be protected by mutexes or an equivalent means of achieving synchronization. Otherwise those shared variables (which can be complex objects or simple variables) will end up in a very bad or incoherent state sooner or later (generally sooner). This is unavoidable. Such races don't mask other errors; they are errors in themselves, and a multithreaded program should be written entirely with the problem of shared variables in mind.

In Voxelands there are multiple threads running, which are launched by the main program: first in server.cpp (itself a thread of main, which launches sub-threads) and second in client.cpp.

The server and client don't share variables except for the log streams, which are not protected (I will fix that; valgrind reports them).

But the server shares some big objects (which contain meshes, lists, and other sub-objects) with its sub-threads. Those can be protected with mutexes at a high level (and now are; some were missing).

The client has only one thread, but it also shares its own copy of the same big objects with it. Those can only be protected at the lowest possible level, to avoid recursive locking or deadlocks. Some low-level mutexes were missing in some critical parts.
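
A sketch of what such a low-level lock can look like, with hypothetical names (SafeList and fill_and_trim() are not from the voxelands source). The mutex lives inside the container, each method holds it only for the duration of one operation, and no method calls out to other code while locked, so there is no opportunity for recursive locking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical list wrapper that owns its own mutex.
class SafeList {
public:
    void append(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(value);
    }

    // Removes the first matching element; returns false if none was found.
    bool remove(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = std::find(items_.begin(), items_.end(), value);
        if (it == items_.end()) return false;
        items_.erase(it);
        return true;
    }

    // Returns a copy, so callers can iterate without holding the lock.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<int> items_;
};

// Each thread appends per_thread values and then removes its first one.
std::size_t fill_and_trim(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 1);
    SafeList list;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&list, t, per_thread] {
            const int base = t * per_thread;
            for (int i = 0; i < per_thread; ++i) list.append(base + i);
            list.remove(base);
        });
    }
    for (auto& th : pool) th.join();
    return list.snapshot().size();
}
```

Returning a copy from snapshot() costs memory, but it lets a reader walk the elements while other threads keep appending and deleting.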
 
  



All times are GMT -5. The time now is 04:55 PM.
