You stated the requirement unclearly, so I will try to restate it.
You have X versions of a chunk of data Y bytes long, and at each position along Y you want to select the most popular value across the X versions.
The data structures are obviously simpler if you have an outer loop over Y and an inner loop over X. But for large enough X*Y that would cause cache performance issues or even run into virtual memory limits. You seem to want an outer loop over X (inner over Y), maybe for those good reasons, or maybe because you didn't realize the opposite would be simpler.
If X is quite large, it may be efficient to accumulate counts across all possible values of the data type (256 possible values of a byte), as you did in your sample code. But for small X that is very inefficient, and you gave 4 as an example of X.
Consider:
Code:
Loop over Y
    Start a temp vector T
    Loop over X
        Append data[x][y] to T
    Sort T (call some standard sort function)
    Loop over T finding the longest run of matching values
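The steps above can be sketched in C++ roughly like this (the function name and the tie-breaking choice of keeping the first run found are my own, not from your code):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Majority vote for one position y: gather the X candidate bytes
// into t, sort them, and return the value with the longest run.
// On a tie, the smallest value wins because runs are scanned in
// sorted order and only a strictly longer run replaces the best.
uint8_t vote_by_sort(std::vector<uint8_t> t)   // t holds data[0..X-1][y]
{
    assert(!t.empty());
    std::sort(t.begin(), t.end());

    uint8_t best = t[0];
    std::size_t best_run = 0;
    std::size_t run = 1;
    for (std::size_t i = 1; i <= t.size(); ++i) {
        if (i < t.size() && t[i] == t[i - 1]) {
            ++run;                     // still inside the current run
        } else {
            if (run > best_run) {      // run just ended; keep it if longest
                best_run = run;
                best = t[i - 1];
            }
            run = 1;
        }
    }
    return best;
}
```

For X = 4 the sort is essentially free, and the whole thing stays O(X log X) per position no matter how many distinct byte values actually occur.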
You also implied the data is not so badly corrupted that it cannot be reconstructed statistically, which implies a fairly high match rate. That in turn implies a wide middle range of X: large enough that an associative container beats the instance vector I suggested above, but not so large that the count vector you suggested beats the associative container. Using an associative container is obviously easier in C++ than in C, but is possible in almost any language.
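For that middle range, the associative-container version looks something like this (a sketch; the names are illustrative):

```cpp
#include <cstdint>
#include <map>

// Majority vote for one position using an associative container.
// Only the byte values that actually occur get an entry, so with a
// high match rate the map stays tiny regardless of the 256 possible
// values. candidates points at the X bytes for one position y.
uint8_t vote_by_map(const uint8_t* candidates, int x)
{
    std::map<uint8_t, int> counts;
    for (int i = 0; i < x; ++i)
        ++counts[candidates[i]];   // operator[] default-initializes to 0

    uint8_t best = 0;
    int best_count = -1;
    for (const auto& kv : counts) {
        if (kv.second > best_count) {
            best = kv.first;
            best_count = kv.second;
        }
    }
    return best;
}
```

`std::unordered_map` would also work here; `std::map` just makes the tie-break deterministic (smallest value wins) because iteration is in key order.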
For very large Y, the performance vs. complexity tradeoff may make the choice of an associative container important.
If your 4x128 with 256 possible values was real rather than just an example, don't worry about any time, size, cache, or other performance issue. A 128x256 count array is too small to care that it may be wildly inefficient.
Quote:
Originally Posted by GamezR2EZ
The issue is that I then have to go back and compare the vote array to find which byte as the most votes and recreate the 128 byte section based on those votes. I can't figure that code out.
Why is that hard? Outer loop over the 128 dimension, and inner loop over the 256 dimension to find the position of the largest of those 256 counts. I'm sure you understand that the position where you found the largest count is the byte value to "reconstruct". I can't imagine it is hard to find the largest of 256 numbers.
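Concretely, assuming a vote table laid out as votes[position][byte value] the way your description suggests, the reconstruction is just an argmax per row:

```cpp
#include <cstddef>
#include <cstdint>

// Rebuild the 128-byte section from a 128x256 vote table.
// The reconstructed byte at position y is the index of the largest
// count in row y (ties go to the lower byte value, since only a
// strictly larger count replaces the current best).
void reconstruct(const int votes[128][256], uint8_t out[128])
{
    for (std::size_t y = 0; y < 128; ++y) {
        int best_value = 0;
        for (int v = 1; v < 256; ++v)
            if (votes[y][v] > votes[y][best_value])
                best_value = v;
        out[y] = static_cast<uint8_t>(best_value);
    }
}
```

That is the whole job: find the index of the maximum in each row, and that index is the byte.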