Are increments and decrements (++ and --) atomic on SMP and multi-core?

jspenguin · 09-13-2006, 02:05 PM

I'm writing a simple reference count and garbage collector in C++. Every object has a field called refcount, which gets incremented every time the object is referenced.

My question is: what happens if two threads reference the object at the same time? All it does is call refcount++. Do I need to protect it with a mutex? Does declaring the refcount volatile help?

The deref function is already protected by a mutex, but I want to avoid an expensive lock if I don't need it.

Edit: I tested it, and it is atomic on single-core x86.

paulsm4 · 09-13-2006, 02:34 PM

Nope: the "++" and "--" are *not* guaranteed to be atomic; you *must* synchronize access whenever you increment or decrement any global object in a multithreaded environment. This is especially true in an SMP environment (either two CPUs, or a single dual-core CPU), but it is also true (in general) for uniprocessors.

tuxdev · 09-13-2006, 02:42 PM

Eh, aren't the ++ and -- operators themselves probably atomic(since the refcount is probably an int), but the function call isn't?

jspenguin · 09-13-2006, 03:42 PM

Code:

#include <stdlib.h>
#include <stdio.h>

#include <SDL.h>
#include <SDL_thread.h>

volatile int testval = 0;

int inc_thread(void* data) {
    printf("start\n");
    while (1) {
	++testval;
	--testval;
	++testval;
	--testval;
	++testval;
	--testval;
	++testval;
	--testval;
	++testval;
	--testval;
    }
}


int main(int argc, char** argv) {
    SDL_Thread* t[5];

    SDL_Init(0);
    
    for (i = 0; i < 5; i++) {
	t[i] = SDL_CreateThread(inc_thread, NULL);
    }

    while (1) {
	printf("testval = %d\n", testval);
	SDL_Delay(1000);
    }

    SDL_Quit();
    return 0;
}

The assember output of inc_thread is:

Code:

inc_thread:
	pushl	%ebp
	movl	%esp, %ebp
	.p2align 4,,15
.L5:
	movl	testval, %eax
	incl	%eax
	movl	%eax, testval
	movl	testval, %eax
	decl	%eax

        *snip*

	jmp	.L5

It seems possible that the thread could be interrupted in the middle of an increment, but on my single-core AMD Athlon, testval is never greater than 5 or less than 0.

I will definitely protect it with a mutex.

tuxdev · 09-13-2006, 04:27 PM

What if you use "register" and/or "inline"?

jspenguin · 09-13-2006, 07:09 PM

"Register" won't work. If I have 18,000 objects, their refcounts can't all be in registers. The ref() method is inline, but that doesn't stop two threads from running it at the same time.

I did find a solution which works for x86 (and can probably be adapted to work with x86-64):

Code:

#if defined(__i386__) && defined(__GNUC__)
#define atomic_increment(var) __asm__ volatile ("lock incl %0" : "+m" (var))
#define atomic_decrement(var) __asm__ volatile ("lock decl %0" : "+m" (var))

#else
#warning "no atomic increment/decrement available, using slower mutex version instead"

#define atomic_increment(var) do {		\
    GC::lock();					\
    ++(var);					\
    GC::unlock();				\
} while (0)
#define atomic_decrement(var) do {		\
    GC::lock();					\
    --(var);					\
    GC::unlock();				\
} while (0)
#endif

The semantics for the lock prefix specify that no other processor can access the memory location until the instruction has completed.

paulsm4 · 09-13-2006, 07:43 PM

Hi, again -

You're absolutely dead-on correct, jspenguin.

Every CPU architecture I'm aware of has some kind of atomic "test and set" assembly instruction, and that's absolutely the highest-performing solution.

Good job!

randyding · 09-13-2006, 09:32 PM

In general its safe to modify a global integer variable for example in one (and only one) thread, and let all the other threads read it (and only read it) whenever they want. Although this is not a recommended practice, and it doesn't fit all possible uses.
This is how its done in interrupt service routines on embedded microcontrollers without an OS, don't have a mutex, and its not needed as long as the only one writing the variable is the ISR. Its a similar situation to threads in an OS.

There's one other problem with SMP that I've heard about but I'm not an expert on this, is you can have a system where a variable is inc. via one processor and it doesn't make it to the memory space of the other processor until some kind of "memory boundary" condition is met. I believe locking/unlocking a mutex does this in some way. Does anyone know if this is true.

sundialsvcs · 09-13-2006, 09:43 PM

The final word is ... "no."

These operators are not atomic. There are no guaranteed-atomic operators.

Use the synchronization primitives provided by the operating system. That's what they're there for. Don't try to get "cute." (Voice of painful experience.)

paulsm4 · 09-13-2006, 10:38 PM

Again: kudos, jspenguin!

You came up with what appears to me to be an excellent solution. If I ever had to implement my own low-level mutex (and sadly, there are often very good reasons for doing so, even in sophisticated OS's like Linux or Windows), I would be proud to have reached a similar solution.

Here are a few references that might explain further:

http://www.answers.com/topic/x86-assembly-language
ftp://download.intel.com/design/Pent...s/25366620.pdf
http://www.amd.com/us-en/assets/cont...docs/24594.pdf

Your .. PSM

PS:
As sundialsvcs correctly pointed out, *no* C/C++ operators - even increment/decrement - are guaranteed to be atomic, even on uniprocessors.

randyding · 09-15-2006, 10:37 PM

jspenguin that's cool stuff.
If you know how to do this on the MIPS instruction set, I'll write out the check right now!

paulsm4 · 09-18-2006, 12:22 AM

Randyding -

The key thing you're looking for on most architectures is an atomic "Test-and-Set" operation.

On the MIPS, this happens to be LL/SC.

There's a good example in the Linux file "atomic.h":

http://www.gelato.unsw.edu.au/lxr/so...-mips/atomic.h

See also:
http://www.go-ecs.com/mips/miptek1.htm
... and ...
http://www.ibiblio.org/pub/linux/doc...ocessing-HOWTO

Your .. PSM