This is probably too specialized a question for this forum (not doubting anyone's skill or intelligence). I have a somewhat strange algorithm that uses structures with about 4 layers of allocation. Because of the extensive allocation, I've been trying to use a memory-pool allocator.
Here is my best representation of the data structure:
- `data`: some string-like data type
- `data_set`: an object containing two `data`s
- `data_list`: a vector of `data_set*`; the sets are dynamically allocated so that sorting, etc. are fast because only pointers are moved
- `data_matrix`: a vector of `data_list*`, allocated the same way
This explanation is actually a whole lot simpler than reality.
`data_list` uses the allocator both for the array of `data_set*` and for the individual `data_set`s. The same goes for `data_matrix`. That means the individual `data_set`s are 4 allocations away from `data_matrix`.
The application currently uses a `data_matrix` with about 26k elements, each having about 45 elements (~1.1M `data_set`s). Comparing `valgrind` outputs of the actual data processing, using `boost::pool_allocator` saves about 9M calls to `malloc` (out of about 22M), but it saves absolutely no time, and in fact destruction goes from a few seconds with `std::allocator` to a few minutes.
The performance I get with `std::allocator` isn't terrible, but this is a relatively simple algorithm compared to those to come. I find it difficult to believe that `malloc` can out-perform a memory pool.
Kevin Barry
PS: I haven't actually tried writing my own memory pool for this particular application, but I'm wondering if something within `boost::pool_allocator` that's supposed to add a feature (like its automatic cleanup) is actually slowing it down. I also wonder if it might just give up and `mmap` instead of toughing it out; a single run takes up about 750M, which all fits into real memory.