Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Hi. I'm interested in parallel programming (lots of parallel number crunching) with Scheme, on a Linux-based supercomputer. My thought was to start with simpler multithreaded programming to take advantage of many cores on a single node, and later look into expanding to multiple nodes.
But I'm having trouble figuring out which Scheme implementation / modules to use. I was excited about Gambit-C Scheme because of its promises of great performance and an elaborate multithreading infrastructure... but then I read in the documentation that all the threads run on one core only! Guile, I think, has some nifty functions for distributing calculations to multiple processes, but Guile is just interpreted, so I'm not sure what kind of performance all that would give.
I'm finding lots of 20+ year old research papers about possibilities, but no clear information about what is available now. I would appreciate any insight or guidance.
Quote:
I was excited about Gambit-C Scheme because of its promises of great performance and an elaborate multithreading infrastructure... but then I read in the documentation that all the threads run on one core only!
Does that mean it emulates multithreading with some sort of userland threads? Otherwise, I'm not quite sure how a cross-platform implementation would always do that. Since it compiles to C, there has to be some way to access the pthread API, even if you just write an extension that you reuse.
Multithreading is more trouble than it's worth.
You'll spend 90% of your effort working on it instead of the problem and get zero performance advantage (or less).
And then you won't be able to debug it.
And I can't see why you would use Scheme for number crunching; use C if you want performance.
@ta0kira: That's the way I understood it. Basically, it is userland threads for concurrency. As far as accessing a pthread API - I don't know - I was hoping you all might know more. I was sort of hoping for a more "prefab" solution, rather than having to dive into the inner details of the language and write my own extensions and parallel programming infrastructure. But I'm open minded.
@bigearsbilly: I'm not quite sure I follow you. If I have access to a supercomputer node with 16 cores, I don't see how I am going to take advantage of that without multithreading. All the C programmers around here are using OpenMP, which is multithreading. Maybe you can clarify what you mean...
C certainly has a proven track record, but performance is not the only thing I'm interested in. Scheme is one of the functional languages, and there are a lot of intriguing and elegant ideas integrated into it. Besides, the Lisps were one of the pioneering families of HPC back in the early days.
Bigloo has both POSIX threads and userspace ("fair") threads:
Quote:
Bigloo supports multithreaded programming. Two different libraries are available. The first one, the Fair Thread library (see Section Fair Threads), enables simple code that is easy to develop and maintain. The second one, the Posix Thread library (see Section Posix Threads), makes it easier to take advantage of the actual parallelism that is now available on stock hardware. Because it is easier to program with fthread than with pthread, we strongly recommend using the former as much as possible and leaving the latter for specially demanding applications. Both libraries are described in this chapter.
...
The Fair Threads library is ``Posix Threads'' safe, which means it is possible to use at the same time both libraries. In other words, it is possible to embed one fair scheduler into a Posix thread.
It seems like most Schemes have userspace threads. Probably because userspace threads look a lot like continuations.
Quote:
@bigearsbilly: I'm not quite sure I follow you. If I have access to a supercomputer node with 16 cores, I don't see how I am going to take advantage of that without multithreading. All the C programmers around here are using OpenMP, which is multithreading. Maybe you can clarify what you mean...
Do you have access to 16 cores? Cripes!
I guess when I have 16 cores I may try MT again, but still I am not sure what advantages 16 threads would have over 16 processes.
I just prefer simplicity myself after years of experience and trying to maintain complex code.
To be honest, work is slack and I have no friends to talk to.
Quote:
Originally Posted by bigearsbilly
Multithreading is more trouble than it's worth.
You'll spend 90% of your effort working on it instead of the problem and get zero performance advantage (or less).
And then you won't be able to debug it.
Quote:
Originally Posted by bigearsbilly
Do you have access to 16 cores? Cripes!
I guess when I have 16 cores I may try MT again, but still I am not sure what advantages 16 threads would have over 16 processes.
I just prefer simplicity myself after years of experience and trying to maintain complex code.
These statements are way too broad. Mathematical programming is quite a bit different from other types of programming. When you're performing scientific computations you can't get any simpler than the computation you're performing, and in a lot of cases the fidelity of your computation is a direct result of how much you can compute in a given timeframe. Some things can be split into separate processes (e.g. running the same simulation 1 million times), but others shouldn't be (e.g. multiplying two large matrices). It really depends on the nature of the computation and what information needs to be shared among parallel components.
I do agree that if you don't properly account for I/O in your threading decisions, you can end up with substantially worse performance when trying to parallelize. In general, though, wise decisions when parallelizing can give you impressive increases in performance, even if those increases aren't linear with respect to the number of threads/processes.