Optimizing the spacing of a mesh containing a given set of points

Feynman · 01-15-2011, 09:44 PM

I tried to summarize the this as best as possible in the title. I am writing an initial value problem solver in the most general way possible. I start with an arbitrary number of initial values at arbitrary locations (inside a boundary.) The first part of my program creates a mesh/grid (I am not sure which is the correct nuance), with N points total, that contains all the initial values. My goal is to optimize the mesh such that the spacing is as uniform as possible. My solver seems to work half decently (it needs some more obscure debugging that is not relevant here.)

I am starting with one dimension. I intend to generalize the algorithm to an arbitrary number of dimensions once I get it working consistently. I am writing my code in fortran, but feel free to reply with pseudocode or the language of your choice.

Allow me to elaborate with an example:
Say I am working on a closed interval [1,10]
xmin=1
xmax=10

Say I have 3 initial points: xmin, 5 and xmax
num_ivc=3
known(num_ivc)=[xmin,5,xmax] //my arrays start at 1. Assume "known" starts sorted

I store my mesh/grid points in an array called coord. Say I want 10 points total in my mesh/grid.
N=10
coord(10)

Remember, all this is arbitrary--except the variable names of course. The algorithm should set coord to {1,2,3,4,5,6,7,8,9,10}

Now for a less trivial example:
num_ivc=3
known(num_ivc)=[xmin,5.5,xmax
or just
num_ivc=1
known(num_ivc)=[5.5]

Now, would you have 5 evenly spaced points on the interval [1, 5.5] and 5 evenly spaced points on the interval (5.5, 10]? But there is more space between 1 and 5.5 than between 5.5 and 10. So would you have 6 points on [1, 5.5] followed by 4 on (5.5 to 10]. The key is to minimize the difference in spacing.

I have been working on this for 2 days straight and I can assure you it is a lot trickier than it sounds. I have written code that
only works if N is large
only works if N is small
only works if it the known points are close together
only works if it the known points are far apart
only works if at least one of the known points is near a boundary
only works if none of the known points are near a boundary

So as you can see, I have coded the gamut of almost-solutions. I cannot figure out a way to get it to perform equally well in all possible scenarios (that is, create the optimum spacing.)

Nominal Animal · 01-16-2011, 02:56 AM

Well, to really space you out, here's an alternative approach.

Think about the volumes (let's call them cells) defined by your samping points. Assume each point in the space belongs to the cell owned by the closest sampling point. You can use squared vector norm, sum(coord[i], i=1..n) in the distance comparison, for any number of dimensions.

Then, you can easily use an iterative method to move the sampling points, in small increments, into a configuration where the volumes are uniform.

In a single dimension, this will always result in a more or less even distribution from minimum to maximum; the interesting stuff happens in higher dimensional situations.

For example, you could do this like this:

Compute the volume of each cell.
You can do this by simply randomly sampling the entire n-volume, and noting which cell was closest to the random point.
The volume of each cell is then approximately the fraction of hits hitting that cell, of the total volume.
Compute which cells are connected.
You can do this in many ways, for example noting the contenders while volume sampling, or doing n-dimensional Delaunay triangulation on the sampling points.
For each pair of connected cells, compute a weight factor based on the ratio of the cell volumes (smaller to larger, thus 0..1)
Use slice samping to pick a random pair or cells, weighted by the the weight factors.
You do this by generating a random number between zero and the total sum of weight factors (not including the upper limit), and then walk the pair list until the cumulative sum exceeds the random number. Where that happens, you pick that pair.
Move one or both of the sampling points (along the axis between the two), in such a way that the weight factor should increase.
Note that the average cell volume is the total volume divided by the number of cells; that helps to make the right decision.
Restart from the beginning, until all weight factors are above a preset smoothness limit.

This method can get stuck in a local minimum, so you could use e.g. a genetic algorithm, evolving multiple systems in parallel, and killing off those that do not evolve.

This algorithm should adapt well to different starting conditions. Random sampling is most efficient volume estimation for five or more dimensions, but it's not terrible for the two to four-dimensional cases either. You must find a good strategy on how to move the sampling points in a way that never worsens the situation, but that shouldn't be too difficult.

If you have a very large system, you don't need to recompute the entire system at every step, the affected region is enough. If you know the longest distance R between the sampling points in a cell pair, you can use that as a safe yardstick of the maximum cell size: spheres of radius 2R centered at the modified sampling points (three or four spheres, both old and moved sampling points) will contain all affected cells completely, I believe; the pair list will tell the cells connected to the modified cells.

Does any of this make any sense to you?

Nominal Animal

Feynman · 01-16-2011, 05:03 PM

Ok, although I have always wanted to use a code up a neural network in (it's been on my todo list for a while), this seems like a rather large undertaking. It just seems like there should be an algebraic solution rather than having to resort to an adaptive algorithm. I had tried to come up with an algorithm that uses the ratio of the length (or volume if you want to jump to n dimensions) of an interval between two given points to the length of the total interval (xmax-xmin) to determine the number of points that should be put in between, but I could not get it to work consistently.

PS I lost you at the fourth bullet. I have never actually coded a neural network because I never fully understood how it works. I remember it having to do with adaptively adjusting weight factors.

Nominal Animal · 01-17-2011, 12:11 AM

It's not a neural network, just adaptive iteration, closely related to Monte Carlo methods.

The idea is that if you modify the boundary between two sampling regions in such a way that the difference in their volumes decreases. The difference in volumes behaves like entropy, and your modifications always decrease it (increasing order). If not, the method is stuck, and you have a result.

All the rest of the algorithm is just to make that efficient. Picking a random boundary -- a pair of regions -- is important, so that you cover all boundaries. Picking a boundary with a large difference in the volumes gives you a high possibility of decreasing entropy.

Each region is of course the volume which is covered by a single sampling point; a single sample cell.

If you just used this algorithm without modifications, you'd end up with a set of sampling points about evenly distributed in your sampling volume.

The trick is that you can add boundary conditions; for example, that each of the volumes must contain a specific point. This will decrease the efficiency of the algorithm a bit, since you'll have to discard those changes that violate the condition. On the other hand, you can use that as a yardstick on the changes you can make: at least some changes should violate the condition, or you're not doing drastic enough changes.

Even better, you can apply weighing to the sampling volumes. If you measure the volume using random sampling, you can use any continuous positive spatial distribution function to weigh the volumes; getting a sort of importance sampling.

As to the fourth bullet, consider this: Say you have four things, A, B, C, and D, that have probabilities a, b, c and d. Let L=a+b+c+d, the total sum of probabilities. Generate random number i, 0 <= i < L. Then, if i < a, pick A. if i < a+b, pick B. if i < a+b+c, pick C. Otherwise pick D, since i < a+b+c+d = L. This is slice sampling. The fourth bullet just means that this is a good way to choose the boundary. A, B, C, and D are the boundaries (or sampling cell pairs), and a, b, c, d are their weights. The weight being the ratio of the smaller volume to the larger volume in the pair. If you sample enough times, you'll sample all of them, but you'll sample the high weight ones more.

As you might have gathered, I do like bouncing off ideas such as these. I hope I didn't waste too much of your time

Nominal Animal