This is probably too specialized a question for this forum (not doubting anyone's skill or intelligence). I have a somewhat strange algorithm that uses structures with about 4 layers of allocation. Because of the extensive allocation, I've been trying to use a memory-pool allocator.
Here is my best representation of the data structure:
- `data`: some string-like data type
- `data_set`: an object containing two `data`s
- `data_list`: a vector of `data_set*`; the sets are dynamically allocated so that sorting, etc. are fast because only pointers are moved
- `data_matrix`: a vector of `data_list*`, allocated the same way
This explanation is actually a whole lot simpler than reality.
`data_list` uses the allocator both for the array of `data_set*` and for the individual `data_set`s. The same goes for `data_matrix`. That means the individual `data_set`s are 4 allocations away from `data_matrix`.
The application currently uses a `data_matrix` with about 26k elements, each having about 45 elements (~1.1M `data_set`s). Comparing `valgrind` outputs of the actual data processing, using `boost::pool_allocator` saves about 9M calls to `malloc` (out of about 22M), but it saves absolutely no time, and in fact destruction goes from a few seconds with `std::allocator` to a few minutes.
The performance I get with `std::allocator` isn't terrible, but this is a relatively simple algorithm compared to those to come. I find it difficult to believe that `malloc` can out-perform a memory pool.
Kevin Barry
PS: I haven't actually tried writing my own memory pool for this particular application, but I'm wondering if something within `boost::pool_allocator` that's supposed to add a feature (like its automatic cleanup) is actually slowing it down. I also wonder if it might just give up and `mmap` instead of toughing it out; a single run takes up about 750M, which all fits into real memory.