Methodology of creating bindings for toolkits written in C++

Sergei Steshenko · 05-26-2012, 03:22 PM

Hello,

I am wondering how bindings for toolkits written in C++ are created.

Let's consider a "high" level language like Perl/Python/OCaml - AFAIR they are written in "C".

Let's consider a couple of toolkits written in C++, say, Qt and Armadillo ( http://arma.sourceforge.net/ ). One has to pay attention that the latter is (mostly ?) C++ templates.

I think I understand how conceptually bindings are written for "simple" (no templates at public interface level) toolkits. One writes a set of "C" wrappers that, first of all, call the C++ toolkit constructors (through "placement new") and the C++ toolkit destructors.

Also, a set of marshallers is needed to interface between the two entities ("high" level language <-> C++ toolkit).

This set of constructors + destructors can be "packaged" into a (dynamic) library and the library can be linked with the "high" level language and through the "C" wrappers constructors can be called (i.e. objects can be created) and object methods can then be called - they apparently also need "C" wrappers to deal with 'this' issue - 'this' is not known to "C".
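A minimal sketch of the wrapper idea described above. The Counter class and all counter_* names are hypothetical, invented for illustration, and the sketch uses plain 'new' for brevity instead of "placement new". The point is that 'this' travels through the "C" interface as an explicit opaque handle.

```cpp
#include <cassert>

// Hypothetical toolkit class, used only for illustration.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    void add(int n) { value_ += n; }
    int value() const { return value_; }
private:
    int value_;
};

extern "C" {

// The "C" side only ever sees this opaque handle type.
typedef struct CounterHandle CounterHandle;

// Wrapper around the constructor.
CounterHandle* counter_create(int start) {
    return reinterpret_cast<CounterHandle*>(new Counter(start));
}

// Method wrappers: the handle plays the role of 'this'.
void counter_add(CounterHandle* h, int n) {
    assert(h != nullptr);
    reinterpret_cast<Counter*>(h)->add(n);
}

int counter_value(const CounterHandle* h) {
    assert(h != nullptr);
    return reinterpret_cast<const Counter*>(h)->value();
}

// Wrapper around the destructor.
void counter_destroy(CounterHandle* h) {
    delete reinterpret_cast<Counter*>(h);
}

}  // extern "C"
```

The "high" level language then needs only the four unmangled symbols, which it can reach through whatever foreign-function mechanism it offers.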

So far so good.

Now, the difficult (just to me ?) part - templates. If I understand correctly, if we have C++ stuff to be instantiated through template specialization, we need to include the C++ file and to compile it with the code which is to use the template stuff.

This inclusion + compilation is trivial in C++ world, but not (just to me ?) trivial in the "high" level language written in "C" world.

I am thinking of something along the JIT/late binding lines. I.e. a set of constructs for the "high" level language is created. The constructs will probably internally look like

Code:

compile_cxx(template_class_name, template_param_1, ..., template_param_N, ...)

where template_class_name, template_param_1, ..., template_param_N are strings.

Then, when such a construct (in whatever syntactic disguise) is met in "high" level language source, the necessary C++ files are "pulled", the template_class_name template with template_param_1, ..., template_param_N is instantiated, and according to some naming convention a "C" wrapper is created, then that "C" wrapper can be linked to the "high" level language.
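As a rough sketch of what the source-generating half of such a construct might do: the function below takes the template name and its parameters as strings and returns C++ source text containing an explicit instantiation plus one "C" wrapper. The names make_wrapper_source, wrapper_name and join_params, the "<template>_<params>_create" naming convention, and the assumption of a default constructor are all invented here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Join template parameters: {"int", "3"} becomes "int, 3".
std::string join_params(const std::vector<std::string>& params) {
    std::string out;
    for (std::size_t i = 0; i < params.size(); ++i) {
        if (i != 0) out += ", ";
        out += params[i];
    }
    return out;
}

// Assumed naming convention: "<template>_<p1>_..._<pN>_create".
// Only valid when the template name and every parameter are
// identifier-like strings (no "::", "<", spaces and so on).
std::string wrapper_name(const std::string& tmpl,
                         const std::vector<std::string>& params) {
    std::string name = tmpl;
    for (const std::string& p : params) name += "_" + p;
    return name + "_create";
}

// Produce the text of a small C++ file that includes the toolkit header,
// explicitly instantiates the template and exposes one "C" wrapper that
// default-constructs an object (assumes a default constructor exists).
std::string make_wrapper_source(const std::string& header,
                                const std::string& tmpl,
                                const std::vector<std::string>& params) {
    assert(!header.empty() && !tmpl.empty());
    const std::string type = tmpl + "<" + join_params(params) + ">";
    std::string src;
    src += "#include \"" + header + "\"\n";
    src += "template class " + type + ";\n";
    src += "extern \"C\" void* " + wrapper_name(tmpl, params) + "() {\n";
    src += "    return new " + type + "();\n";
    src += "}\n";
    return src;
}
```

The generated text would still have to be compiled and linked before the "high" level language can call the wrapper.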

Which means yet another pass to parse the "high" level language source is apparently necessary.

The above is a very rough picture, I just want to understand whether I'm thinking in the right direction and whether I'm missing something big.

Sergei Steshenko · 05-28-2012, 12:00 PM

I think somebody should have already pointed my attention to the fact that I might need to implement custom 'new' and 'delete'.

Let's have a look at the following stupid C++ code of mine, which nevertheless compiles cleanly:

Code:

class Point
  {
  private:
    double x, y;

  public:
    Point()
      {
      x = y = 0.0;
      }

    Point(double x1, double y1)
      {
      x = x1;
      y = y1;
      }
 
  // accessors need to be added
  };

class PointOnHeap
  {
  private:
    Point *point;

  public:
    PointOnHeap(double x, double y)
      {
      point = new Point(x, y);
      }

    ~PointOnHeap()
      {
      delete point;
      }

  // accessors need to be added
  };


Suppose I want to make bindings for PointOnHeap.

A wrapper using 'placement new' will eventually call PointOnHeap constructor, so memory will be allocated twice - first in the "high" level language, and then through 'new' in the constructor.

Likewise, calling a wrapper for ~PointOnHeap will cause 'delete' to be called, and then the wrapper should itself deallocate, or mark for deallocation, the memory allocated by the PointOnHeap wrapper.

It appears there will be no memory leak (provided the destructor indeed calls 'delete'), but having memory allocated twice doesn't look good.
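The two allocations can be made visible with a sketch like the one below. The pointonheap_* wrapper names and the get_x accessor are assumptions added for illustration; the classes are a condensed version of the ones above. The host language supplies the storage for the PointOnHeap object itself, and the constructor then performs its own 'new' for the Point.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Point {
public:
    Point(double x1, double y1) : x(x1), y(y1) {}
    double get_x() const { return x; }  // accessor assumed for the example
private:
    double x, y;
};

class PointOnHeap {
public:
    PointOnHeap(double x, double y) : point(new Point(x, y)) {}  // second allocation
    ~PointOnHeap() { delete point; }
    PointOnHeap(const PointOnHeap&) = delete;             // avoid double delete
    PointOnHeap& operator=(const PointOnHeap&) = delete;
    double get_x() const { return point->get_x(); }
private:
    Point* point;
};

extern "C" {

// Tell the host how much suitably aligned storage it must provide.
std::size_t pointonheap_sizeof(void) { return sizeof(PointOnHeap); }
std::size_t pointonheap_alignof(void) { return alignof(PointOnHeap); }

// Construct inside storage owned by the host (the first allocation).
void* pointonheap_construct(void* storage, double x, double y) {
    assert(storage != nullptr);
    return new (storage) PointOnHeap(x, y);
}

double pointonheap_get_x(const void* obj) {
    return static_cast<const PointOnHeap*>(obj)->get_x();
}

// Run the destructor only; the host releases the storage itself.
void pointonheap_destruct(void* obj) {
    static_cast<PointOnHeap*>(obj)->~PointOnHeap();
}

}  // extern "C"
```

The destructor wrapper frees only what the constructor allocated; the outer storage stays under the control of the host's memory manager, which is what keeps the scheme leak-free.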

dugan · 05-28-2012, 12:10 PM

There are programs that generate C++ bindings for (at least) Python and Lua. I'm curious as to whether you've looked at them.

Here's an overview of the ones available for Python:

http://stackoverflow.com/questions/1...ary-comparison

Sergei Steshenko · 05-28-2012, 03:23 PM

Quote:

Originally Posted by dugan

There are programs that generate C++ bindings for (at least) Python and Lua. I'm curious as to whether you've looked at them.

Here's an overview of the ones available for Python:

http://stackoverflow.com/questions/1...ary-comparison

I know there are. And I used http://swig.org/ in the late nineties. However, I do not remember reading anything about making bindings for C++ template classes - this is my main interest.

The fact Boost::Python heavily uses C++ templates does not imply it's easy to instantiate a C++ template class from Python code.

Again, my main conceptual question is about JIT for C++ template classes to be used in "high" level language code; this JIT may be slow from end user's point of view, maybe some caching can be used in case of repetitive compilations.

ta0kira · 05-29-2012, 09:48 AM

Templates are compile-time functionality; they cannot be instantiated at run time. Languages like Python have a single C type that's used for all objects, however, so you could just create a wrapper class for that type in C++ to be used as the template argument. You won't be able to take advantage of partial specialization unless you explicitly provide bindings for all of the specializations.
Kevin Barry
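A sketch of the suggestion above, under stated assumptions: HostObject is a hypothetical stand-in for the interpreter's universal object type (PyObject in CPython), HostRef is an invented wrapper class around it, and std::vector stands in for the toolkit's template. Only one instantiation, std::vector<HostRef>, needs bindings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the host language's single object type.
struct HostObject {
    long refcount;
};

// C++ wrapper that holds one counted reference to a host object.
class HostRef {
public:
    explicit HostRef(HostObject* o) : obj_(o) {
        assert(o != nullptr);
        ++obj_->refcount;
    }
    HostRef(const HostRef& other) : obj_(other.obj_) { ++obj_->refcount; }
    HostRef& operator=(const HostRef& other) {
        ++other.obj_->refcount;  // increment first: safe for self-assignment
        --obj_->refcount;
        obj_ = other.obj_;
        return *this;
    }
    ~HostRef() { --obj_->refcount; }
    HostObject* get() const { return obj_; }
private:
    HostObject* obj_;
};

// The one template instantiation that is exposed to the host language.
typedef std::vector<HostRef> HostVector;

extern "C" {

void* hostvector_create(void) { return new HostVector(); }

void hostvector_push(void* v, HostObject* o) {
    static_cast<HostVector*>(v)->push_back(HostRef(o));
}

std::size_t hostvector_size(const void* v) {
    return static_cast<const HostVector*>(v)->size();
}

void hostvector_destroy(void* v) { delete static_cast<HostVector*>(v); }

}  // extern "C"
```

The trade-off is that the container only ever holds references to host objects, so element-type-specific specializations of the template are never used.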

Sergei Steshenko · 05-29-2012, 09:54 AM

Quote:

Originally Posted by ta0kira

Templates are compile-time functionality; they cannot be instantiated at run time. ...

The statement is somewhat debatable. Let's forget about the "high" level languages for the time being and think about C/C++.

In C/C++ I can have a piece of code that causes the following chain of events:

1) a template or anything else requiring compilation is encountered;
2) C/C++ compiler and linker are called;
3) the linker creates a DLL;
4) the DLL is loaded by the main program;
5) functions in the DLL are called.

That's what I meant as JIT in this thread.
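Steps 2 and 3 of this chain could be sketched roughly as follows. The function names are invented for illustration, and the sketch assumes a g++-compatible compiler driver on PATH and file names without spaces or shell metacharacters. Loading the result (steps 4 and 5) is only indicated in a comment.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Build the command line that compiles one generated source file into a
// shared library. Assumes a g++-compatible driver and "safe" file names.
std::string compiler_command(const std::string& source_path,
                             const std::string& library_path) {
    assert(!source_path.empty() && !library_path.empty());
    return "g++ -shared -fPIC -o " + library_path + " " + source_path;
}

// Invoke compiler and linker; treats exit status 0 as success.
bool build_shared_library(const std::string& source_path,
                          const std::string& library_path) {
    const std::string cmd = compiler_command(source_path, library_path);
    return std::system(cmd.c_str()) == 0;
}

// Steps 4 and 5 are not shown: on POSIX systems the main program would
// load the library with dlopen() and look up the "C" wrapper with dlsym();
// on Windows the counterparts are LoadLibrary() and GetProcAddress().
```

Caching, as mentioned earlier in the thread, would amount to skipping build_shared_library when a library for the same template and parameters already exists.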

ta0kira · 05-29-2012, 11:48 AM

Quote:

Originally Posted by Sergei Steshenko

The statement is somewhat debatable. Let's forget about the "high" level languages for the time being and think about C/C++.

Template instantiation occurs during compilation. Object instantiation occurs at run time. They aren't the same thing. The former creates a class, whereas the latter creates an object corresponding to a class. For example, given the template declaration template <class T> class myclass, if myclass<int> isn't anywhere to be found in your code then you won't be able to use myclass<int> from your DLL. Further, unless myclass is populated by virtual functions, you'll only have access to the class functions that are used for the given instantiation. E.g. if myclass has a member func, you won't be able to use myclass<int>::func from your DLL unless you reference func for the instantiation of myclass<int> in the source that's used to create the DLL.
Kevin Barry
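For reference, C++ also has an explicit instantiation definition, which emits all non-template members of one instantiation whether or not the source uses them. The sketch below reuses the myclass and func names from the post; the twice member, the constructor and the myclass_int_func wrapper are assumptions added for illustration.

```cpp
#include <cassert>

template <class T>
class myclass {
public:
    explicit myclass(T v) : value_(v) {}
    T func() const { return value_; }
    T twice() const { return value_ + value_; }  // never referenced below
private:
    T value_;
};

// Explicit instantiation definition: the compiler generates every member
// of myclass<int> in this translation unit, including twice(), even though
// nothing in this file calls it.
template class myclass<int>;

// "C" wrapper for one member of that instantiation.
extern "C" int myclass_int_func(const void* obj) {
    assert(obj != nullptr);
    return static_cast<const myclass<int>*>(obj)->func();
}
```

This still fixes the set of instantiations at compile time: myclass<double> remains unavailable unless it is instantiated as well.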

Sergei Steshenko · 05-29-2012, 11:56 AM

Quote:

Originally Posted by ta0kira

Template instantiation occurs during compilation. Object instantiation occurs at run time. They aren't the same thing. The former creates a class, whereas the latter creates an object corresponding to a class. For example, given the template declaration template <class T> class myclass, if myclass<int> isn't anywhere to be found in your code then you won't be able to use myclass<int> from your DLL. Further, unless myclass is populated by virtual functions, you'll only have access to the class functions that are used for the given instantiation. E.g. if myclass has a member func, you won't be able to use myclass<int>::func from your DLL unless you reference func for the instantiation of myclass<int> in the source that's used to create the DLL.
Kevin Barry

I am still going to write a "C" wrapper that eventually calls the constructor. I.e. there will be a separate "C" wrapper per full template specialization.

Calling the "C" wrapper is object instantiation. To be more precise, it's actually preallocating memory for the object and running constructor with class data in that preallocated memory.
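One way to get "a separate wrapper per full specialization" without writing each by hand is a macro that stamps the wrappers out. This is only a sketch: Box is a hypothetical class template, and the DEFINE_BOX_WRAPPERS macro and the box_<suffix>_* naming convention are invented for illustration. Construction happens in memory preallocated by the caller, as described above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class template standing in for a toolkit class.
template <class T>
class Box {
public:
    explicit Box(T v) : value_(v) {}
    T get() const { return value_; }
private:
    T value_;
};

// Stamp out one set of "C" wrappers for Box<T>; SUFFIX becomes part of
// the symbol names, e.g. box_int_construct for Box<int>.
#define DEFINE_BOX_WRAPPERS(SUFFIX, T)                                 \
    extern "C" std::size_t box_##SUFFIX##_sizeof(void) {               \
        return sizeof(Box<T>);                                         \
    }                                                                  \
    extern "C" void* box_##SUFFIX##_construct(void* storage, T v) {    \
        assert(storage != nullptr);                                    \
        return new (storage) Box<T>(v); /* placement new */            \
    }                                                                  \
    extern "C" T box_##SUFFIX##_get(const void* obj) {                 \
        return static_cast<const Box<T>*>(obj)->get();                 \
    }                                                                  \
    extern "C" void box_##SUFFIX##_destruct(void* obj) {               \
        static_cast<Box<T>*>(obj)->~Box<T>(); /* destructor only */    \
    }

// Each line instantiates one specialization and its wrappers.
DEFINE_BOX_WRAPPERS(int, int)
DEFINE_BOX_WRAPPERS(double, double)
```

Every specialization that the "high" level language may request has to appear in such a list at build time, unless the list itself is generated and compiled on demand.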

So, I don't understand how your objections invalidate my approach.

ta0kira · 05-30-2012, 12:30 AM

Quote:

Originally Posted by Sergei Steshenko

So, I don't understand how your objections invalidate my approach.

That's a conflict you've invented.

Creating bindings for every template specialization will instantiate all of those specializations, so your problem is apparently solved. Additionally, my previous comments suggest that the way you've chosen is the only way, other than wrapping the C/C++ object that represents any object in the target language (e.g. PyObject*) with a C++ class and then providing bindings for only that instantiation of the template.
Kevin Barry