[SOLVED] [C++] Reference counting, jumps, and exceptions

MTK358 · 04-27-2011, 03:58 PM

I am writing a simple programming language interpreter, and it makes very heavy use of reference counting to track resources. I have a class called RefcountObject that has a reference count field and two methods: getref() (which adds to the count) and putref() (which subtracts from the count). putref() also checks whether the count has reached zero. If so, the object deletes itself.
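A minimal sketch of what such a class might look like. Only the names RefcountObject, getref() and putref() come from the post; the counter starting at zero, the protected destructor, and the Probe helper are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical sketch of the class described above.
class RefcountObject {
public:
    // Assumption: a new object starts with no references, and every
    // owner calls getref() once when it starts holding the pointer.
    RefcountObject() : refcount_(0) {}

    void getref() { ++refcount_; }

    // Drops one reference; the object deletes itself when none remain.
    void putref() {
        assert(refcount_ > 0);
        if (--refcount_ == 0)
            delete this;
    }

protected:
    // Protected so that outside code has to go through putref().
    virtual ~RefcountObject() {}

private:
    int refcount_;
};

// Hypothetical helper that records its own destruction in a flag.
struct Probe : RefcountObject {
    explicit Probe(bool* flag) : destroyed(flag) {}
    ~Probe() { *destroyed = true; }
    bool* destroyed;
};

// Returns true if the object survives the first putref() and is
// deleted by the last one.
bool last_putref_deletes() {
    bool destroyed = false;
    Probe* p = new Probe(&destroyed);
    p->getref();                      // first owner,  count == 1
    p->getref();                      // second owner, count == 2
    p->putref();                      // count == 1, still alive
    bool alive_after_first = !destroyed;
    p->putref();                      // count == 0, object deletes itself
    return alive_after_first && destroyed;
}
```

Every getref() has to be matched by exactly one putref(), which is the pairing that a jump out of the middle of a function breaks.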

I would like to implement exceptions in the interpreted language, and that should be simple to do with either longjmp() or C++'s exception handling system. The problem is that if you jump from one point in the code to another, you abandon all these pointers without calling putref() on them.

Is there any good solution?

SigTerm · 04-27-2011, 04:22 PM

Quote:

Originally Posted by MTK358

I am writing a simple programming language interpreter, and it makes very heavy use of reference counting to track resources. I have a class called RefcountObject that has a reference count field and two methods: getref() (which adds to the count) and putref() (which subtracts from the count). putref() also checks whether the count has reached zero. If so, the object deletes itself.

I'd advise renaming the methods to something like addRef/decRef, because get/put does not represent what the functions do. "put" is normally used to assign a value to an internal field, not increase it, and "get" is used to get a value without changing it.

Quote:

Originally Posted by MTK358

I would like to implement exceptions in the interpreted language, and that should be simple to do with either longjmp() or C++'s exception handling system. The problem is that if you jump from one point in the code to another, you abandon all these pointers without calling putref() on them.

Is there any good solution?

The solution is not to use "jumps".
(A C++ approach) When an exception is thrown, call the destructor for all local variables, then repeat the same process within the caller, then within the caller's caller, etc. Keep doing that until you reach the top level of execution, or until the exception is handled. For unhandled exceptions, terminate the program.
Instead of using a "pointer to a RefCounter", use a "smart pointer to a RefCounter" which calls the RefCounter's "decRef" method when the pointer is destroyed. This way the RefCounter will be deleted even if an exception has been thrown.
Since it is your language, then you don't have to introduce classes and destructors. Make a special refcounting pointer type that calls addRef/decRef when necessary (addRef when assigned a new value, decRef when it goes out of scope or gets destroyed).
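A rough sketch of such a refcounting pointer type, written against the getref()/putref() names from the original post. The Ref template, the Value type, and the live_values counter are hypothetical names invented for this example, and the stand-in RefcountObject assumes the count starts at zero.

```cpp
#include <cassert>
#include <stdexcept>

// Minimal stand-in for the poster's class (count assumed to start at 0).
class RefcountObject {
public:
    RefcountObject() : refcount_(0) {}
    void getref() { ++refcount_; }
    void putref() { if (--refcount_ == 0) delete this; }
protected:
    virtual ~RefcountObject() {}
private:
    int refcount_;
};

// Hypothetical smart pointer: getref() when it starts pointing at an
// object, putref() when it stops (including when it is destroyed).
template <typename T>
class Ref {
public:
    explicit Ref(T* p = 0) : ptr_(p) { if (ptr_) ptr_->getref(); }
    Ref(const Ref& other) : ptr_(other.ptr_) { if (ptr_) ptr_->getref(); }
    ~Ref() { if (ptr_) ptr_->putref(); }

    Ref& operator=(const Ref& other) {
        // Acquire the new reference before releasing the old one,
        // so that self-assignment cannot delete the object.
        if (other.ptr_) other.ptr_->getref();
        T* old = ptr_;
        ptr_ = other.ptr_;
        if (old) old->putref();
        return *this;
    }

    // Ordinary pointer syntax keeps working: ref->member and *ref.
    T* operator->() const { assert(ptr_ != 0); return ptr_; }
    T& operator*() const { assert(ptr_ != 0); return *ptr_; }

private:
    T* ptr_;
};

int live_values = 0;  // number of Value objects currently alive

struct Value : RefcountObject {
    explicit Value(int n) : number(n) { ++live_values; }
    ~Value() { --live_values; }
    int number;
};

void evaluate_and_fail() {
    Ref<Value> v(new Value(42));
    throw std::runtime_error("error raised while evaluating");
}   // v is destroyed during unwinding, which calls putref()

// Returns how many Value objects are still alive after the exception.
int live_after_exception() {
    try {
        evaluate_and_fail();
    } catch (const std::runtime_error&) {
        // Nothing to clean up by hand here.
    }
    return live_values;
}
```

Because Ref overloads operator-> and operator*, code that uses the pointer reads the same as before; only the declarations change. boost::intrusive_ptr follows the same general idea for objects that carry their own counter.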

Or you could try to implement garbage collector.

jcmlq · 04-27-2011, 04:50 PM

Don't use setjmp/longjmp, it's really nothing more than a goto that can keep track of where you came from.

Try/Catch allows you to pass around context and *unwinds* the stack all the way back to the opening of the try block. That means that each local variable in each scope will have its destructor called.
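A small illustration of that unwinding, assuming nothing beyond standard C++. Tracer, inner(), outer() and the global trace string are hypothetical names used only to record the order in which destructors run.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

std::string trace;  // destructors append their id here

struct Tracer {
    explicit Tracer(char id) : id_(id) {}
    ~Tracer() { trace += id_; }
    char id_;
};

void inner() {
    Tracer c('c');
    throw std::runtime_error("boom");
}

void outer() {
    Tracer a('a');
    Tracer b('b');
    inner();  // no try/catch here; the exception simply passes through
}

std::string unwind_trace() {
    trace.clear();
    try {
        outer();
    } catch (const std::runtime_error&) {
        // By the time control arrives here, c, b and a are already destroyed.
    }
    return trace;
}

// Usage: destructors run innermost frame first, and in reverse order
// of construction within each frame.
void check_unwind_order() {
    assert(unwind_trace() == "cba");
}
```

A longjmp() across the same frames gives no such guarantee; in C++ it has undefined behavior if it skips objects with non-trivial destructors.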

I'd suggest you use auto_ptr for the do it yourself approach, or shared_ptr if using boost is an option.

David1357 · 04-27-2011, 05:28 PM

Quote:

Originally Posted by jcmlq

...shared_ptr if using boost is an option.

You can use std::tr1::shared_ptr if you have a new enough g++:

Code:

[user@machine:~]:g++ --version
g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[user@machine:~]:find /usr/include -name '*shared_ptr*'
/usr/include/c++/4.4/tr1/shared_ptr.h
/usr/include/c++/4.4/bits/shared_ptr.h

MTK358 · 04-27-2011, 07:18 PM

Quote:

Originally Posted by SigTerm

I'd advise renaming the methods to something like addRef/decRef, because get/put does not represent what the functions do. "put" is normally used to assign a value to an internal field, not increase it, and "get" is used to get a value without changing it.

I thought that another member (I think it was "Nominal Animal") said that the proper terms are "get" and "put". That's why I named them that way.

Quote:

(A C++ approach) When an exception is thrown, call the destructor for all local variables, then repeat the same process within the caller, then within the caller's caller, etc. Keep doing that until you reach the top level of execution, or until the exception is handled. For unhandled exceptions, terminate the program.

That might work, but it seems kind of tedious to have try/catch blocks around every call to a child node.

Quote:

Instead of using a "pointer to a RefCounter", use a "smart pointer to a RefCounter" which calls the RefCounter's "decRef" method when the pointer is destroyed. This way the RefCounter will be deleted even if an exception has been thrown.

Maybe, but will it decrease performance?

Quote:

Since it is your language, then you don't have to introduce classes and destructors.

I don't get that part. There will be no memory management in the interpreted program, the C++ interpreter takes care of that. And the interpreted language doesn't have classes.

SigTerm · 04-27-2011, 09:32 PM

Quote:

Originally Posted by MTK358

I thought that another member (I think it was "Nominal Animal") said that the proper terms are "get" and "put". That's why I named them that way.

Typically "get" is a const method that returns a value without changing it, and "put"|"set" assigns a new value. It is a matter of taste, though. See the Qt 4 documentation for examples.

Quote:

Originally Posted by MTK358

That might work, but it seems kind of tedious to have try/catch blocks around every call to a child node.

An exception is a "severe" situation, so I'd try to insert one "try/catch" block at the top level, instead of putting them around every call.

Quote:

Originally Posted by MTK358

I don't get that part. There will be no memory management in the interpreted program, the C++ interpreter takes care of that. And the interpreted language doesn't have classes.

I thought you were talking about implementing an exception mechanism in the interpreted language you're developing. It looks like I was mistaken.

If you have a C++ RefCounter class, and want to release it even in case of a C++ exception, then you'll need to use a modified smart pointer class and forget about jumps. I'd recommend looking at QSharedPointer, for example. If you don't NEED to know the number of references, I'd recommend trying to replace it with any smart pointer class (boost::shared_ptr, for example). There are multiple ways to approach the problem.

Quote:

Originally Posted by MTK358

Maybe, but will it decrease performance?

Instead of trying to guess the performance, you need to measure it with a profiler. Linux has gprof for that.
I'd advise keeping the code readable and dealing with performance problems only when you run into them. From my experience, the slowest operations are memory allocation/deallocation (new/delete). A reference counter will add a few assembly instructions, but it'll save a lot of coding time.

MTK358 · 04-28-2011, 08:44 AM

Quote:

Originally Posted by SigTerm

I thought you were talking about implementing an exception mechanism in the interpreted language you're developing. It looks like I was mistaken.

Actually, implementing exceptions in the interpreted language is the entire reason for this thread, by adding Throw and Catch AST nodes. I got very confused by this sentence you wrote, and I still don't understand what you meant by it:

Quote:

Since it is your language, then you don't have to introduce classes and destructors.

The way I interpreted it, I thought it meant that the interpreted language had classes and that I shouldn't need to add the concepts of constructors and destructors to the interpreted language. But the interpreted language doesn't have the concept of classes in the first place!

If that's not what you meant, could you explain what you meant there?

SigTerm · 04-28-2011, 10:10 AM

Quote:

Originally Posted by MTK358

If that's not what you meant, could you explain what you meant there?

I meant to say that when an exception is thrown within your language, walk through every local variable and call a cleanup procedure, repeat the same process in the caller, and keep doing that until the exception is handled.

A cleanup procedure for a variable is technically a destructor, which is hidden within the interpreter's implementation and is completely invisible from within the program being executed by the interpreter.
The language itself may not have a concept of "destructors" or "classes".
In other words, if every int variable within the interpreter calls "decRef" when it goes out of scope, this functionality is a destructor. However, a program being interpreted will not know about the existence of such a destructor.
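One way this could look inside a tree-walking interpreter, as a sketch only. The thread mentions Throw and Catch AST nodes but gives no design, so Node, Wrapper, Temp, LangException and the string-valued eval() are all hypothetical. Here the interpreted exception is carried by a C++ exception, and the hidden "destructor" is simply the C++ destructor of whatever each node holds while evaluating.

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ exception carrying what the interpreted program threw.
struct LangException {
    explicit LangException(const std::string& m) : message(m) {}
    std::string message;
};

int live_temporaries = 0;  // resources currently held by running nodes

// Stand-in for a reference held during evaluation; its destructor is
// the cleanup procedure that the interpreted program never sees.
struct Temp {
    Temp() { ++live_temporaries; }
    ~Temp() { --live_temporaries; }
};

struct Node {
    virtual ~Node() {}
    virtual std::string eval() = 0;
};

// Throw node: raises the interpreted exception.
struct Throw : Node {
    explicit Throw(const std::string& m) : message(m) {}
    std::string eval() { throw LangException(message); }
    std::string message;
};

// An ordinary node: holds a resource while evaluating its child and
// contains no try/catch of its own.
struct Wrapper : Node {
    explicit Wrapper(Node* c) : child(c) {}
    std::string eval() {
        assert(child != 0);
        Temp held;
        return child->eval();
    }
    Node* child;
};

// Catch node: the only place that needs a try/catch.
struct Catch : Node {
    explicit Catch(Node* b) : body(b) {}
    std::string eval() {
        try {
            return body->eval();
        } catch (const LangException& e) {
            return "caught: " + e.message;
        }
    }
    Node* body;
};

// Builds Catch(Wrapper(Wrapper(Throw("oops")))) and evaluates it.
std::string run_demo() {
    Throw t("oops");
    Wrapper w1(&t);
    Wrapper w2(&w1);
    Catch c(&w2);
    return c.eval();
}
```

Only the Catch node contains a try/catch; the Wrapper nodes in between release what they hold because their locals are destroyed as the exception passes through.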

MTK358 · 04-30-2011, 12:46 PM

I'm starting to think that putting try/catch around everything is not a very good idea, because I'll need so many of them (and I haven't thought of the parser: when there's a syntax error, an exception is thrown, and all the AST nodes that were created get abandoned). I guess the only other options are smart pointers and garbage collection.

The problem with smart pointers is that I can't use the normal syntax for defining pointers (so I'll have to modify all functions and classes). Also, it seems like copying an object and turning all pointer dereferences into method calls would be bad for performance. The good thing is that it will flawlessly count references without mistakes, without my adding code to add/remove references.

The problem with garbage collection (I mean something like this: http://www.hpl.hp.com/personal/Hans_Boehm/gc/) is that I heard that it can perform poorly, and cause programs to hang occasionally for a few moments on slower computers. Also, I'm not sure if it's very portable. And finally, maybe I'm wrong, but it feels like cheating. The good thing is that I can completely forget about memory management, and don't have to modify my code (except for add/remove calls and free/delete calls). Another good thing is that the website actually mentions it being used in some programming language implementations.

MTK358 · 04-30-2011, 08:28 PM

After doing more research I decided to use the garbage collector.

I have another question, but I think that it would be more appropriate to start a new thread.

Nominal Animal · 05-03-2011, 08:29 PM

Quote:

Originally Posted by SigTerm

Typically "get" is a const method that returns a value without changing it, and "put"|"set" assigns a new value. It is a matter of taste, though. See the Qt 4 documentation for examples.

Only in C++ and other object-oriented languages. There are others, you know.

For reference counting, the correct terminology is "get a reference to" and "put a reference to".

It is clearer to use "increment the reference count" and "decrement the reference count", but that will imply the argument is a reference count, not the counted object. Making that clear (for example, using inc_refcount_of() and dec_refcount_of()) makes the function or method names quite long. (FWIW, I do prefer the latter two.)

"Add" is an especially bad choice, since how do you "add" a reference? Does it count?

In places like the Linux kernel get/put (with respect to references) is used extensively. For example, check the first and fifth questions in the kernel module init tools FAQ. Most C books that have a chapter on reference counting I've glanced at also use get/put terminology almost exclusively.