LinuxQuestions.org
Visit Jeremy's Blog.
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 03-13-2011, 07:09 PM   #1
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Reference counting questions


I was wondering which is the better way to do reference counting in C.

The first idea is to have a macro that adds reference count and destructor function fields to a struct, and have a base struct for everything reference-counted.

The second is to make all structs normally with no reference counting in mind, and just use a reference-counting wrapper struct.

What do you think is better? Or maybe you even have some totally different ideas?
 
Old 03-13-2011, 08:24 PM   #2
rigor
Member
 
Registered: Sep 2003
Location: 19th moon ................. ................Planet Covid ................Another Galaxy;............. ................Not Yours
Posts: 705

Rep: Reputation: Disabled
You mentioned base structs and wrappers. Although they aren't concepts that are unique to a language with more object oriented features built in, such as C++, C++ and the like is where I'd typically expect to hear terms such as those used. So just to be absolutely sure, you are talking about C, not C++, yes?

Also, do you expect the reference counting to be used in a multi-threaded environment?

Last edited by rigor; 03-14-2011 at 02:55 PM.
 
Old 03-13-2011, 08:30 PM   #3
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Quote:
Originally Posted by kakaka View Post
You mentioned base structs and wrappers. Although they aren't concepts that are unique to more language with more object oriented features built in, such as C++, C++ and the like is where I'd typically expect to hear terms such as those used. So just to be absolutely sure, you are talking about C, not C++, yes?
Yes, I am talking about C (not C++).

Quote:
Originally Posted by kakaka View Post
Also, do you expect the reference counting to be used in a multi-threaded environment?
Probably not.
 
Old 03-13-2011, 08:53 PM   #4
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
Quote:
Originally Posted by MTK358 View Post
The first idea is to have a macro that adds reference count and destructor function fields to a struct
I understand reference count field in a struct in C (used to do that a lot before C++). But destructor function field?

Quote:
The second is to make all structs normally with no reference counting in mind, and just use a reference-counting wrapper struct.
In C++, the wrapper is more common, more object oriented, generally syntactically cleaner, but less efficient and has some semantic limitations compared to making the class participate in their own reference counting.

But in C, I don't see how the wrapper might be even syntactically cleaner. I prefer the more efficient version.

Quote:
What do you think is better? Or maybe you even have some totally different ideas?
Use C++ instead.

Why do you want to use C?
 
Old 03-14-2011, 05:49 PM   #5
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948
Quote:
Originally Posted by MTK358 View Post
What do you think is better? Or maybe you even have some totally different ideas?
Don't use a macro, use a reference counting structure. To handle the reference counting itself, use helper functions which take a pointer to the reference counting structure. (The put function (decrease reference count) also needs a pointer to the actual object.)

I'd recommend embedding, since it is simpler.

I'd use GCC builtins for atomic access, so the helper functions are thread-safe without locking overhead.

Here is a simple C99 example with destructor function support, and a garbage collection chain. The total overhead is three words (a long and two pointers) per item. The destructor function gets two parameters: a pointer to the object, and a private per-item pointer.
Code:
struct garbage {
    struct garbage         *next;
    void                   *object;
};

struct destructor {
    int                   (*function)(void *, void *);
    void                   *payload;
};

struct reference {
    long                    count;
    union {
        struct garbage     *garbage;
        struct destructor  *destructor;
    };
};

/* Garbage chain */
struct garbage             *garbage = NULL;

/** get_reference() - Get a reference (increase reference count).
 * The function returns 0 if success, error code (errno.h) otherwise.
*/
int get_reference(struct reference *ref)
{
    if (!ref)
        return EINVAL; /* Invalid object */

    if (__sync_add_and_fetch(&(ref->count), 1L) <= 1L) {
        __sync_sub_and_fetch(&(ref->count), 1L);
        return ENOENT; /* Already gone */
    }

    return 0;
}

/** put_reference() - Release a reference (decrease reference count).
 * The function returns 0 if success, error code (errno.h) otherwise.
*/
int put_reference(struct reference *const ref, void *const object)
{
    int             result;
    struct garbage *temp;

    if (!ref || !object)
        return EINVAL; /* Invalid object */

    if (__sync_sub_and_fetch(&(ref->count), 1L) != 0L)
        return 0; /* Success */

    /* Mark object dead. Avoids races with 32768 concurrent threads. */
    if (__sync_sub_and_fetch(&(ref->count), 32768L) < -49152L)
        return 0; /* Not the first racing thread */

    /* Call destructor function */
    if (ref->destructor)
        result = ref->destructor->function(object, ref->destructor->payload);
    else
        result = 0;

    /* Move to garbage chain (atomically). */
    do {
        temp = __sync_or_and_fetch(&garbage, (struct garbage *)0L);
        ref->garbage->object = object;
        ref->garbage->next = temp;
    } while (!__sync_bool_compare_and_swap(&garbage, temp, ref->garbage));

    return result;
}

/** all_garbage() - Safely get the entire garbage chain
*/
struct reference *all_garbage(void)
{
    struct garbage *chain;

    do {
        chain = __sync_or_and_fetch(&garbage, (struct garbage *)0L);
    } while (!__sync_bool_compare_and_swap(&garbage, chain, NULL));

    return chain;
}
You can use the above functions for both structures containing an embedded reference count, and wrapped structures. Embedding is very simple:
Code:
struct embedded {
    struct reference   refcount;
    :
    my stuff
    :
};

struct embedded *object;

if (get_reference(&(object->refcount)))
    Failure, object is gone

if (put_reference(&(object->refcount), object))
    Error when releasing object
Wrapping is a bit more complex, because you can either put the reference counting structure before or after the object. The only thing that changes is the &(object->refcount) bit above; you need to replace it with a pointer or a macro that calculates the pointer to the reference counting structure, e.g.
Code:
#define UNWRAP_BEFORE(payload) ((struct reference *)( (char *)(payload) - sizeof(struct reference) ))

if (get_reference(UNWRAP_BEFORE(object))
    Failure, object is already gone

if (put_reference(UNWRAP_BEFORE(object), object))
    Error when releasing object
or
Code:
#define UNWRAP_AFTER(payload, size) ((struct reference *)( (char *)(payload) + (size_t)(size) ))

if (get_reference(UNWRAP_AFTER(object, sizeof(object)), object))
    Failure, object is already gone

if (put_reference(UNWRAP_AFTER(object, sizeof(object)), object))
    Error when releasing object
Hope this helps.
 
Old 03-15-2011, 04:05 PM   #6
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
@Nominal Animal: I don't really understand your code.

Anyway, this is what I came up with:

Code:
/** A base "class" for reference-counted objects.
 * To make your struct a "subclass" put OBJECT_TEMPLATE as the
 * very first element of your struct, call object_init on it in your
 * initialization function, and if you need to do some cleanup before the
 * struct is destroyed, set the member named "free" to a pointer to a function
 * that cleans up and then frees the struct itself.
 */

#ifndef __REFCOUNT_OBJECT_H_INCLUDE_GUARD__
#define __REFCOUNT_OBJECT_H_INCLUDE_GUARD__

#include <stdlib.h>

#define OBJECT_TEMPLATE \
	size_t refcount; \
	void (*free)(void*);

/** A struct to serve as an abstract base class for refcounted objects.
 * This is purely a convenience struct that any refcount object can be cast
 * to.
 */
typedef struct { OBJECT_TEMPLATE } object_t;

#define NEWREF(x) ((void*) object_new_ref((object_t*) x))
#define DELREF(x) object_del_ref((object_t*) x)

/** Initialize a refcount object.
 * This function sets the reference count to 1 and the free function to the
 * free() function in stdlib.h.
 * @param obj A pointer to the refcount object to initialize.
 */
void object_init(object_t *obj)
{
	obj->frecount = 1;
	obj->free = &free; // just use the free() from stdlib.h
}

/** Add a new reference.
 * @param obj A pointer to a refcount object.
 * @returns the same thing passed in the obj parameter, for convenience.
 */
inline object_t* object_new_ref(object_t* obj)
{
	obj->refcount++;
	return obj;
}

/** Delete a reference.
 * The object is automatically freed if the counter reaches 0.
 * @param obj The reference to be deleted.
 */
inline void object_del_ref(object_t* obj)
{
	obj->refcount--;
	if (obj->refcount <= 0)
		obj->free(obj);
}

#endif
The problem is that I'm really confused about where to use ADDREF and DELREF. It seems really wasteful to use ADDREF every time you pass a parameter to a function, so i thought that maybe it's unneeded unless you're going to store a reference on the heap. But what about this:

Code:
function_that_stores_reference(example_refcounted_object_new());
The new object will have a refcoutn of 1, the function will use ADDREF when it stores the reference, and use DELREF when it deletes it. The proglem is that after this happens, the counter will still have a value of 1!

Also the fact that some cases require you to use DELREF on the stack and some not, and that you should use ADDref when storing a reference and not use DELREF in a function that returns it but deletes it from storage is just hopelessly confusing for me.
 
Old 03-16-2011, 12:11 AM   #7
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948
Quote:
Originally Posted by MTK358 View Post
@Nominal Animal: I don't really understand your code.
Well, your code is quite similar, except that I use GCC atomic builtins, so that mine is thread-safe.

Quote:
Originally Posted by MTK358 View Post
The problem is that I'm really confused about where to use ADDREF and DELREF. It seems really wasteful to use ADDREF every time you pass a parameter to a function, so i thought that maybe it's unneeded unless you're going to store a reference on the heap.
(Note: Your ADDREF/NEWREF == "get" and DELREF == "put" in most literature.)

When handling references internally, taking a reference makes sure the object does not vanish while it is still in use.

If you only use a single thread, or use a garbage chain instead of calling free() to immediately release the memory used by the object, you don't need to take a reference for internal handling. Then, the reference count tells you exactly how many non-temporary references there are to that object at any time.

Quote:
Originally Posted by MTK358 View Post
But what about this:

Code:
function_that_stores_reference(example_refcounted_object_new());
The new object will have a refcoutn of 1, the function will use ADDREF when it stores the reference, and use DELREF when it deletes it. The proglem is that after this happens, the counter will still have a value of 1!
No, that is exactly what should happen, since the correct code sequence is
Code:
object_t *temporary_object = example_refcounted_object_new();
function_that_stores_reference(temporary_object);
DELREF(temporary_object);
Note that example_refcounted_object_new() implies an ADDREF(), since it sets the reference count to 1.

Quote:
Originally Posted by MTK358 View Post
Also the fact that some cases require you to use DELREF on the stack and some not, and that you should use ADDref when storing a reference and not use DELREF in a function that returns it but deletes it from storage is just hopelessly confusing for me.
It shouldn't be confusing. Just remember that new() implies an ADDREF, and that ADDREF and DELREF should always be logically paired.
 
Old 03-24-2011, 04:17 PM   #8
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
I came across a problem here. I'm using reference counting to keep track of when to free variables in my interpreter.

I've made it so that internal container objects don't use getref() when returning an element, since it might just be temporarily used and thrown away. This is fine, but what about this:

Code:
LangObject* Node::eval(Table *scope)
{
	switch (type) {
		
		case IntegerNode:
			return new IntegerObject();
		
		case MemberNode:
			if (childCount == 1) {
				Table *table = dynamic_cast<Table*>(children[0]->eval(scope));
				if (table)
					return table->get(payload.str);
				else
					return NULL; //TODO throw error
			}
			return scope->get(payload.str);

		<snip>

		default:
			break;
	}
	fprintf(stderr, "Unknown node type. Evaluating to null.");
	return NULL;
}
Note that when the type is MemberNode, you shouldn't putref() the returned value. But when the type is IntegerNode, then you have to call putref() since the refcount is 1 when you create an object. What should I do?
 
Old 03-24-2011, 05:38 PM   #9
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948Reputation: 948
Quote:
Originally Posted by MTK358 View Post
Note that when the type is MemberNode, you shouldn't putref() the returned value. But when the type is IntegerNode, then you have to call putref() since the refcount is 1 when you create an object. What should I do?
Right now you have complex polymorphic objects that are very difficult to account for accurately. I suspect the best solution is to look and think hard how you can make your data structures easier to handle.

(When I wrote my example program earlier, I kind of anticipated these kinds of problems. I've seen that objects with complex internal structure are very difficult to manage, especially reference count, accurately. That's why in my example program I separated the references and content into separate data structures. Think of my solution as attaching removable tracking and handling devices to an object, with inter-object references (member, sibling) always pointing to tracking and handling devices, not to the target objects directly. When both tracking devices and the objects themselves are separately reference-counted, and detachable, they are much easier to manage. In your case, the tracking and handling devices are fixed, built-in to the objects, and the objects have internal members also referred to.)

I understand how your solution works, but I cannot see how to solve these problems (and similar ones awaiting downstream) with these data structures you're using. I'm sorry I cannot help you better with this. I would like to, this is such an interesting area of programming, and you have a very refreshing can-do attitude.
 
Old 03-25-2011, 07:58 PM   #10
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
I guess the simplest way would be to make it so that Node::eval()'s return value always has to be putref()'d after it's used.

The problem is with things like this:

Code:
Function *func = dynamic_cast<Function*>(children[0]->eval(scope));
I can't think of any nice way to call putref() on eval()'s output here. To make things worse, dynamic_cast() simply throws away the reference if it's not the right type, so you can't call putref() any more.
 
Old 03-26-2011, 08:26 AM   #11
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
When did you switch this project from C to C++?

I think it was a good idea to switch from C to C++, but the change makes the thread a bit confusing.

Quote:
Originally Posted by MTK358 View Post
Code:
LangObject* Node::eval(Table *scope)
{
    switch (type) {
That kind of switch on the run time type of the current object totally violates the philosophy of object oriented programming.

You should have eval() as a virtual function, so each implementation would inherently know enough about its type to do its job.

Quote:
Note that when the type is MemberNode, you shouldn't putref() the returned value. But when the type is IntegerNode, then you have to call putref() since the refcount is 1 when you create an object. What should I do?
You need to decide on a consistent set of rules for which actions claim an ownership and which don't.

Quote:
I've made it so that internal container objects don't use getref() when returning an element, since it might just be temporarily used and thrown away.
Unfortunately consistency and efficiency often conflict. A consistent set of rules would typically lead to a design that, in the situation you just described, would increment the ref count when creating the temporary reference and decrement it again when throwing that away.

In that same situation myself, I usually push for higher efficiency at the expense of some clarity:
I create new objects with zero references, rather than with one reference.
I have many methods that return a non owning pointer to an object, so that the caller is responsible incrementing the ref count if it decides to take an ownership.
Some (not all) of the methods that return non owning pointers have the documented possibility of returning a pointer to an object with zero owners. One must be careful of such pointers, because passing that to another function that takes and releases an ownership would delete the object while the caller still holds the non owning pointer.
Also there is a method called when discarding a pointer to a possibly unowned object that deletes the object if it is unowned.

All those special cases need care in coding. On occasion the special case caused by the drive for extra efficiency makes a particular path through the code less efficient. But on average it means some (CPU) work skipped in one place leads to more (CPU) work skipped in its caller, so net more work for the programmer and less for the CPU.
Very often a function that gets a non owning pointer from function X to a possibly unowned object has as its last use of that pointer a call to a method that takes and later releases an ownership. The fact that callers of X so often would do that fed into the decision to have X return a non owning pointer. So X is saved the work of incrementing the count and its callers are saved the work of decrement with possible delete, and you don't even need to call the conditional delete of unowned object.

My estimate is that you are not experienced enough to follow that path, nor is your code likely to execute enough billions of inner loop iterations to even justify doing so. So I was explaining all the above to tell you enough about the nature of an intentionally inconsistent design that you would understand you should sacrifice some efficiency for a more consistent design.

Quote:
Originally Posted by MTK358 View Post
I can't think of any nice way to call putref() on eval()'s output here. To make things worse, dynamic_cast() simply throws away the reference if it's not the right type, so you can't call putref() any more.
If a method is able to return NULL instead of an owning pointer, then every caller needs to deal with that possibility. The convention must be that an ownership was taken (and must be released) if and only if the returned pointer is not null.

If an intermediate level takes a valid owning pointer and then returns either that pointer (dynamic cast to a different type) or NULL, that intermediate level must be responsible for fixing the ownership count when returning NULL.

If a significant number of places in the code must reduce the ownership on a pointer if and only if that pointer is non null, that behavior should be wrapped in a function. A global function is easiest in a small self contained project. A class static function in an appropriate class is better for code that might find its way into a larger project.

Last edited by johnsfine; 03-26-2011 at 08:42 AM.
 
Old 03-26-2011, 10:06 AM   #12
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Quote:
You need to decide on a consistent set of rules for which actions claim an ownership and which don't.
You mean that the result of some types of nodes should be putref()'s, and some shouldn't?

But how are you supposed to know which type of node you have? And it seems like if you are going to have an if statement to check whether you need to putref() the output whenever you call eval(), it seems like that can't be much more inefficient than calling getref() on things you don't really need to.

Quote:
In that same situation myself, I usually push for higher efficiency at the expense of some clarity:
I create new objects with zero references, rather than with one reference.
I thought about initializing objects to have zero references, but I didn't think it would solve the problem (putref() still needs to be called to make the object realize it has zero references).

Quote:
I have many methods that return a non owning pointer to an object, so that the caller is responsible incrementing the ref count if it decides to take an ownership.
All my methods are that way, except those that create objects, because I don't know of a better way. (By "ownership" you mean "store it in the heap and call getref() instead of using it once and throwing it away immediately", right?)
 
Old 03-26-2011, 02:46 PM   #13
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197Reputation: 1197
Quote:
Originally Posted by MTK358 View Post
You mean that the result of some types of nodes should be putref()'s, and some shouldn't?
Hard to imagine how you jumped from what I said to that.

To clarify (but possibly also narrow) my meaning, some functions that return object pointers might returning owning pointers (pointers for which the called function has counted the caller's reference to the object) and other functions return non owning pointers.

Quote:
I thought about initializing objects to have zero references, but I didn't think it would solve the problem (putref() still needs to be called to make the object realize it has zero references).
We seem to be failing to communicate in both directions on that issue, so I'm not sure what to try to explain.
 
Old 03-26-2011, 03:53 PM   #14
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Quote:
Originally Posted by johnsfine View Post
Hard to imagine how you jumped from what I said to that.
Quote:
Originally Posted by johnsfine View Post
To clarify (but possibly also narrow) my meaning, some functions that return object pointers might returning owning pointers (pointers for which the called function has counted the caller's reference to the object) and other functions return non owning pointers.
I already had that idea and explained it, so I thought you must be talking about something else.

My idea was that no functions count the reference returned to the caller, except constructors and Node::eval() (since some nodes do nothing but call a constructor and return the new object).

Quote:
Originally Posted by johnsfine View Post
We seem to be failing to communicate in both directions on that issue, so I'm not sure what to try to explain.
I don't understand how making new objects have a refcount of 0 instead of 1 could make things easier and more efficient.
 
Old 03-30-2011, 12:24 PM   #15
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443

Original Poster
Blog Entries: 3

Rep: Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723Reputation: 723
Anyone?
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
LXer: C/C++ reference counting with atomic variables and gcc LXer Syndicated Linux News 0 05-27-2009 02:11 PM
Extending Python with C (reference counting questions) spacepigeon Programming 0 07-19-2008 09:27 AM
tough (to me) c++ questions about pointer to class object reference and more parv Programming 6 01-11-2008 09:26 PM
questions about reference to "jiffies" carbooky Programming 4 04-12-2004 10:16 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 07:29 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration