glibc error

vkmgeek · 07-10-2007, 03:46 AM

Hi
I am running a large application that runs more than 2000 threads at user level.
When I run it on RHEL4u3, I get a segmentation fault in free(), which reports that the memory has already been freed.
Whereas if I run the same program on RHEL5 or FC5, I don't get any segmentation fault, even if I run it 4-5 times consecutively.
RHEL4u3 -- gcc version 4.1.0 20060304 (Red Hat 4.1.0-3) -- kernel 2.6.9-34
RHEL5 -- gcc version 4.1.1 20070105 (Red Hat 4.1.1-52) -- kernel 2.6.18-8
FC5 -- gcc version 4.1.0 20060304 (Red Hat 4.1.0-3) -- kernel 2.6.16-1.2080

Any help/pointers would be more than welcome.

One more thing: I am also using two other .so files that I built myself, and I want to make sure they are not the culprit. So I want to debug it with valgrind, but I don't know how to use valgrind with .so files...
Could someone please guide me?

nc3b · 07-10-2007, 06:35 AM

Can't help with valgrind. But here are some "pointers" (funny you should mention them; they're probably the culprit).

Compile your program for debugging (-g) and run it under gdb. When it crashes, the bt (backtrace) command may take you to the line where it happens. Usually it's a good idea to test that a pointer is not null before attempting to free it. Also, I usually set a pointer to null after freeing it (I don't know if it's good practice, but it works for me).
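As a minimal sketch of the "null it after freeing" habit, a macro can clear the very variable it frees (the name FREE_AND_NULL is hypothetical). Note that the C standard defines free(NULL) to do nothing, so the explicit null test before free() is harmless but not required.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: free the memory, then null the variable itself.
   The do { } while (0) wrapper lets it be used like a single statement. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Demo: returns 1 if the pointer ends up NULL after two calls. */
int free_and_null_demo(void)
{
    char *buf = malloc(32);
    FREE_AND_NULL(buf);  /* frees the block; buf is now NULL */
    FREE_AND_NULL(buf);  /* harmless: free(NULL) does nothing */
    return buf == NULL;
}
```

The macro only clears the variable you pass it; any other copy of the same pointer still dangles.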

Second, if you don't get to the root of the problem with gdb, or the line that gdb points to doesn't look descriptive, post the output here, and some parts of the code. Cheers.

wjevans_7d1@yahoo.co · 07-10-2007, 07:24 AM

Try electric fence also. It substitutes its own malloc() and free() for the standard ones, so your .so files will use it automatically.

I say "try", because, um, you're using POSIX threads? And you're running some 2000 of them concurrently?

One reason that I use threads as little as possible is that one thread can easily trash memory for the "benefit" of the thread that crashes. Makes debugging very difficult.

If threads don't communicate (or start and stop) very often, and by "often" I mean at least a few times a second, I usually try to stick with fork(). I say this with some hesitation, because I know that it would be quite time-consuming to re-factor your code.

If you do convert to separate processes, try to avoid System V style shared memory and semaphores. Instead, lock (portions of) a file for semaphores. For shared memory, your parent process should use mmap() on /dev/zero for as many bytes as it needs.

Avoiding System V style shared memory and semaphores ensures that when your whole program exits, you won't have those pesky shared memory blocks and semaphores leaking all over the place.
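A minimal sketch of the process-based layout described above, assuming a POSIX system: the parent maps /dev/zero with MAP_SHARED before forking, so every child shares the same zero-filled page, and byte 0 of a temporary file locked with fcntl(F_SETLKW) plays the role of the semaphore. The names set_lock and shared_counter_demo and the /tmp path are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Lock (F_WRLCK) or unlock (F_UNLCK) byte 0 of fd, waiting if needed.
   fcntl() record locks belong to the process, so each child competes
   for the lock separately even though they share the descriptor. */
static int set_lock(int fd, short type)
{
    struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1 };
    int rc;
    do {
        rc = fcntl(fd, F_SETLKW, &fl);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Fork nchildren processes that each add 1 to a shared counter iters
   times; returns the final count, or -1 on setup failure. */
long shared_counter_demo(int nchildren, int iters)
{
    /* Shared memory: a MAP_SHARED mapping of /dev/zero made before
       fork() is inherited by, and shared with, every child. */
    int zfd = open("/dev/zero", O_RDWR);
    if (zfd == -1)
        return -1;
    long *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                         MAP_SHARED, zfd, 0);
    close(zfd);  /* the mapping stays valid after closing */
    if (counter == MAP_FAILED)
        return -1;

    /* "Semaphore": byte 0 of an unlinked temporary file. */
    char path[] = "/tmp/lockdemoXXXXXX";
    int lfd = mkstemp(path);
    if (lfd == -1) {
        munmap(counter, sizeof *counter);
        return -1;
    }
    unlink(path);  /* nothing is left on disk once the descriptors close */

    int started = 0;
    for (int c = 0; c < nchildren; c++) {
        pid_t pid = fork();
        if (pid == -1)
            break;
        if (pid == 0) {  /* child */
            for (int i = 0; i < iters; i++) {
                if (set_lock(lfd, F_WRLCK) == -1)
                    _exit(1);
                ++*counter;  /* critical section */
                if (set_lock(lfd, F_UNLCK) == -1)
                    _exit(1);
            }
            _exit(0);
        }
        started++;
    }
    for (int k = 0; k < started; k++)
        while (wait(NULL) == -1 && errno == EINTR)
            ;
    long result = *counter;
    munmap(counter, sizeof *counter);
    close(lfd);
    return result;
}
```

Here shared_counter_demo(4, 1000) returns 4000; without the two set_lock() calls the children could lose updates. Because the lock file is unlinked immediately and the mapping disappears with the processes, nothing is left behind when the program exits.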

Good luck.

dmail · 07-10-2007, 08:10 AM

Quote:

Usually it's a good thing to test that a pointer is not null before attempting to free it. Also, I usually set a pointer to null after freeing it (don't know if it's a good thing, but it works for me).

Freeing a null pointer is a no-op, so that check isn't needed. Setting a pointer to null after freeing it only protects you when there is exactly one copy of the pointer and that copy isn't passed by value to a function that frees it, i.e.:

Code:

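#include <stdlib.h>

typedef struct foo { int x; } foo;

void some_func(foo *f)
{
    /* ... */
    free(f);
    f = NULL;  /* clears only this function's local copy of the pointer */
}

int main(void)
{
    foo *bar = malloc(sizeof(foo));
    some_func(bar);
    if (bar != NULL) {  /* this is true: bar still holds the freed address */
        /* calling free(bar) here would be a double free */
    }
    return 0;
}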

Code:

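#include <stdlib.h>

typedef struct foo { int x; } foo;

void some_func(foo **f)
{
    /* ... */
    free(*f);
    *f = NULL;  /* clears the caller's pointer itself */
}

int main(void)
{
    foo *bar = malloc(sizeof(foo));
    some_func(&bar);
    if (bar != NULL) {  /* this is false: bar is now NULL */
        /* not reached */
    }
    return 0;
}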

Quote:

Where as, if I run the same program on RHEL5 or FC5, I am not getting any segmentation fault.

Ah, the joys of making a multithreaded application do what you think it's doing.

vkmgeek · 07-10-2007, 09:38 AM

Quote:

Originally Posted by nc3b

Second, if you don't get to the root of the problem with gdb, or the line that gdb points to doesn't look descriptive, post the output here, and some parts of the code. Cheers.

Thanks for your input. I already follow the same practice. But as pointed out in another post, and as I suspect is happening here, assigning NULL to the pointer in one place won't save me elsewhere if I have passed the pointer to a function.

However, using gdb I did find that the segfault occurs inside the free() function.

nc3b · 07-10-2007, 12:44 PM

You're right, guys. This might help in future segfault cases.

vkmgeek · 07-11-2007, 03:22 AM

Quote:

Originally Posted by wjevans_7d1@yahoo.co

Try electric fence also. It substitutes its own malloc() and free() for the standard ones, so your .so files will use it automatically.

Good luck.

Hey dude,
Can you give me a URL where I can get efence? I tried searching the Net, but I couldn't find it specifically for RHEL4u3 64-bit.
So I ended up using DUMA.

wjevans_7d1@yahoo.co · 07-11-2007, 08:02 AM

I presume you tried to build this?

http://perens.com/FreeSoftware/Elect....13-0.1.tar.gz

What happened?

ta0kira · 07-11-2007, 08:15 AM

Double free happens when the same object is deleted twice. ALWAYS set pointers to NULL right after deleting them, even if it's in a destructor. You also need mutexes when doing something significant like deleting.
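A minimal C sketch of that rule, assuming POSIX threads (shared_obj, release_shared and run_release_demo are hypothetical names): several threads may try to release the same object, and because the NULL test, the free() and the NULL assignment all happen under one mutex, free() runs exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object shared by several threads, and the mutex that
   guards the pointer to it. */
static int *shared_obj;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static int frees_done;  /* demo bookkeeping: how many times free() ran */

/* Every thread may try to release the object; the mutex plus the
   NULL assignment ensure free() runs exactly once. */
static void *release_shared(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&shared_lock);
    if (shared_obj != NULL) {
        free(shared_obj);
        shared_obj = NULL;  /* set to NULL right after freeing, under the lock */
        frees_done++;
    }
    pthread_mutex_unlock(&shared_lock);
    return NULL;
}

/* Start up to 16 threads that all race to release the same object;
   returns how many of them actually called free(). */
int run_release_demo(int nthreads)
{
    pthread_t tids[16];
    int started = 0;
    if (nthreads > 16)
        nthreads = 16;
    shared_obj = malloc(sizeof *shared_obj);
    frees_done = 0;
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[started], NULL, release_shared, NULL) == 0)
            started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return frees_done;
}
```

run_release_demo(8) returns 1 however the threads are scheduled. Without the mutex, two threads could both see a non-NULL pointer and both call free(), which is exactly the double free glibc reports.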

I usually place my inter-thread dynamic objects in lists which are centrally-accessible. They are encapsulated so that a mutex must be set before adding, removing, or modifying elements.

When a function or object needs to access an object in the list, I pass a 'const void*' corresponding to that object which does not get cast. Instead, the function gains access to the master list in turn (setting the mutex) and searches the list for a matching pointer. If the object is there, the function uses it safely, and since the mutex is set it cannot be deleted while it's being used. If the object isn't there, there isn't an undefined behavior problem; the function just logs an error and exits.
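The lookup-by-handle scheme can be sketched in C roughly as follows. This is not ta0kira's code; the fixed-size array, the function names and the callback style are assumptions made for the illustration. The caller passes a const void* handle, the registry compares it against its entries under the mutex, and the object is only touched if it is still registered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define REGISTRY_MAX 64

/* Hypothetical central list of inter-thread objects, guarded by one mutex. */
static void *registry[REGISTRY_MAX];
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a heap object; returns 0 on success, -1 if the table is full. */
int registry_add(void *obj)
{
    int rc = -1;
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < REGISTRY_MAX; i++) {
        if (registry[i] == NULL) {
            registry[i] = obj;
            rc = 0;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

/* Find the slot whose pointer equals handle; caller must hold the lock. */
static int registry_find(const void *handle)
{
    for (int i = 0; i < REGISTRY_MAX; i++)
        if (registry[i] != NULL && registry[i] == handle)
            return i;
    return -1;
}

/* Call fn on the object only if it is still registered. The lock is held
   for the whole call, so no other thread can destroy it meanwhile.
   Returns -1 for an unknown handle (the caller can log an error). */
int registry_use(const void *handle, void (*fn)(void *obj))
{
    pthread_mutex_lock(&registry_lock);
    int i = registry_find(handle);
    if (i >= 0)
        fn(registry[i]);
    pthread_mutex_unlock(&registry_lock);
    return i < 0 ? -1 : 0;
}

/* Remove and free the object if registered; -1 if the handle is unknown. */
int registry_destroy(const void *handle)
{
    pthread_mutex_lock(&registry_lock);
    int i = registry_find(handle);
    if (i >= 0) {
        free(registry[i]);
        registry[i] = NULL;
    }
    pthread_mutex_unlock(&registry_lock);
    return i < 0 ? -1 : 0;
}

/* Example callback for the demo: increment an int object. */
void increment_int(void *obj)
{
    ++*(int *)obj;
}
```

One caveat with raw addresses as handles: once an object is freed, malloc() may hand the same address to a new object, so a stale handle can match an unrelated entry. Handing out never-reused integer IDs instead of addresses avoids that.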

The encapsulation I use (written by me, admittedly) provides unlimited read-only access, so the mutex doesn't need to be clear just to read an object. If a read-only operation is lengthy I will access in write mode to keep another thread from changing the list.
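POSIX provides a standard readers-writer lock, pthread_rwlock_t, built on the same many-readers/one-writer idea. In this hypothetical sketch (balance, writer, reader and run_rwlock_demo are made-up names) the two slots must always sum to 100; each reader holds the read lock across both loads, which keeps writers out for the whole multi-step read while still letting other readers in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared pair whose two slots must always sum to 100. */
static int balance[2] = {100, 0};
static pthread_rwlock_t balance_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Writer: move one unit between the slots; exclusive while it does so. */
static void *writer(void *unused)
{
    (void)unused;
    for (int i = 0; i < 20000; i++) {
        pthread_rwlock_wrlock(&balance_lock);
        int from = i % 2;
        balance[from] -= 1;
        balance[1 - from] += 1;
        pthread_rwlock_unlock(&balance_lock);
    }
    return NULL;
}

/* Reader: a two-step read done under a single read lock, so no write
   can land between the two loads. Counts any inconsistent sums. */
static void *reader(void *out)
{
    long *violations = out;
    for (int i = 0; i < 20000; i++) {
        pthread_rwlock_rdlock(&balance_lock);
        int first = balance[0];
        int second = balance[1];
        pthread_rwlock_unlock(&balance_lock);
        if (first + second != 100)
            (*violations)++;
    }
    return NULL;
}

/* Run one writer and two readers; returns the total violations seen. */
long run_rwlock_demo(void)
{
    pthread_t w, r[2];
    long v[2] = {0, 0};
    pthread_create(&w, NULL, writer, NULL);
    for (int i = 0; i < 2; i++)
        pthread_create(&r[i], NULL, reader, &v[i]);
    pthread_join(w, NULL);
    for (int i = 0; i < 2; i++)
        pthread_join(r[i], NULL);
    return v[0] + v[1];
}
```

run_rwlock_demo() returns 0. If a reader took and released the lock separately for each slot, a write could land between the two loads and the sum check could fail.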

You don't necessarily need to use the encapsulation I use (it does have its price), but I highly recommend tabulating, encapsulating, centralizing, and restricting modification access to inter-thread dynamic objects.
ta0kira

PS This applies to C++, but can be isolated to allow use with C. The major project I'm working on now uses C++ internally to help with this but the API is all in C.

wjevans_7d1@yahoo.co · 07-12-2007, 06:58 AM

Quoth ta0kira:

Quote:

The encapsulation I use (written by me, admittedly) provides unlimited read-only access, so the mutex doesn't need to be clear just to read an object. If a read-only operation is lengthy I will access in write mode to keep another thread from changing the list.

If you're using threads, then I hope that by "lengthy" you mean "two bytes or longer". Nothing more frustrating to find than a race condition bug that happens extremely rarely.

ta0kira · 07-12-2007, 11:03 AM

"Lengthy" in this case means "must perform multiple operations on the retrieved element," such as read/process/read/etc., as opposed to reading a single int. I'll probably add encapsulation of individual elements in addition to the list encapsulation, so the entire list doesn't have to be locked out to delete an element. The list (also written by me) uses stable pointers (as a selectable option), so even if an element is repositioned, its pointer remains valid for the threads using it. The encapsulation I use can indicate the current read-only access status and can also "kick off" modules currently accessing an object, so it wouldn't take much work to add that protection as well. That was part of the initial program design, but I hadn't gotten that far pending structural revisions and a working prototype. Element encapsulation is still subject to revision, but the fundamental concept of tabulating and encapsulating remains a valid strategy that has solved all of my thread-related problems so far.
ta0kira

wjevans_7d1@yahoo.co · 07-13-2007, 05:48 AM

Wait. I don't quite understand something. Please be patient with me. (grin)

Element X (to give it an arbitrary name) is 8 bytes long and contains "AAAAAAAA".

Thread fred wants to read it.

Thread barney wants to replace the value of that 8-byte element with "BBBBBBBB".

For simplicity, let's say we don't care whether fred retrieves the value "AAAAAAAA" or the value "BBBBBBBB".

There is nothing keeping fred from accessing that element any time he wants, right?

barney gets in there and starts to modify the element, and gets suspended in mid-modification.

fred starts running and retrieves "BBBBAAAA".

It's rare, but it can happen, no?
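The scenario above becomes safe if both fred and barney take the same mutex around the whole 8-byte access. A minimal sketch, assuming POSIX threads (element_x, x_lock and run_fred_barney are hypothetical names):

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Element X from the example: 8 bytes, guarded by a mutex. */
static char element_x[8] = {'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'};
static pthread_mutex_t x_lock = PTHREAD_MUTEX_INITIALIZER;

/* barney: overwrite all 8 bytes with 'A' or 'B' while holding the lock. */
static void *barney(void *unused)
{
    (void)unused;
    for (int i = 0; i < 20000; i++) {
        pthread_mutex_lock(&x_lock);
        memset(element_x, (i % 2) ? 'B' : 'A', sizeof element_x);
        pthread_mutex_unlock(&x_lock);
    }
    return NULL;
}

/* fred: copy the element under the same lock, then count mixed
   values such as "BBBBAAAA". */
static void *fred(void *out)
{
    long *torn = out;
    char copy[8];
    for (int i = 0; i < 20000; i++) {
        pthread_mutex_lock(&x_lock);
        memcpy(copy, element_x, sizeof copy);
        pthread_mutex_unlock(&x_lock);
        for (size_t k = 1; k < sizeof copy; k++) {
            if (copy[k] != copy[0]) {
                (*torn)++;
                break;
            }
        }
    }
    return NULL;
}

/* Run both threads; returns how many torn reads fred saw. */
long run_fred_barney(void)
{
    pthread_t tb, tf;
    long torn = 0;
    pthread_create(&tb, NULL, barney, NULL);
    pthread_create(&tf, NULL, fred, &torn);
    pthread_join(tb, NULL);
    pthread_join(tf, NULL);
    return torn;
}
```

With the lock on both sides, run_fred_barney() returns 0: fred only ever sees all-'A' or all-'B'. Dropping the lock from either side reintroduces the possibility of a mixed value like "BBBBAAAA".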

ta0kira · 07-13-2007, 09:43 AM

Yes, that is possible, but the program doesn't deal with indefinite sizes or byte arrays in the context under speculation. Nothing retrieved is larger than a register, and I highly doubt any recent kernel will ever write a byte while the word itself is being read, especially since the word will always be dealt with as a word in the program.

You do bring up good points, but as I stated before it's already a consideration, and the only place I read where write access isn't locked out reads a register-sized int which is in no danger of being popped or deleted mid-read.
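For register-sized values, C11 (which postdates this thread) makes the guarantee explicit instead of leaving it to the hardware: a plain int read while another thread writes it is a data race under the C11 memory model, whereas an atomic_int load always returns a value that some store actually wrote. A minimal sketch, assuming a C11 compiler with <stdatomic.h> and POSIX threads (status, updater and run_atomic_demo are hypothetical names):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical register-sized status word shared between threads. */
static atomic_int status;

/* Updater: publishes 1, 2, ..., 100000; each store is indivisible. */
static void *updater(void *unused)
{
    (void)unused;
    for (int i = 1; i <= 100000; i++)
        atomic_store(&status, i);
    return NULL;
}

/* Reader: counts loads that are out of range or that go backwards.
   Returns 0 when every load was a stored value and loads never decreased. */
long run_atomic_demo(void)
{
    pthread_t t;
    long bad = 0;
    int last = 0;
    atomic_store(&status, 0);
    if (pthread_create(&t, NULL, updater, NULL) != 0)
        return -1;
    for (int i = 0; i < 100000; i++) {
        int v = atomic_load(&status);
        if (v < last || v > 100000)
            bad++;
        last = v;
    }
    pthread_join(t, NULL);
    return bad;
}
```

run_atomic_demo() returns 0: every load is a value the updater stored, and because one thread's successive loads of the same atomic object cannot move backwards in its modification order, the values seen never decrease.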
ta0kira