Optimize simple C code
I want to count the number of each of the 256 possible bytes in a file.
This is the fastest code I have produced so far: Code:
// output character statistics Thank for any suggestions on how to optimize this further. |
What test cases are you using? For instance does your total counted always match the length of the input file?
I wouldn't use fread() for this - just read() or mmap(). |
I don't think you can accomplish anything optimizing the CPU speed of the actual operation. The operation, even including the full loop overhead if the compiler doesn't optimize that for you, is trivial compared to the memory accesses.
If BUFSIZ is not incredibly large then you shouldn't care whether you optimized. If BUFSIZ is incredibly large then allocating it on the stack is questionable and the run time will be dominated by the cache misses (mainly cache misses that occur inside your call to fread). The small potential for improvement here, assuming BUFSIZ is incredibly large, is to break into an outer loop over a decently chosen chunk size (amortize the overhead of fread with big chunks, but make them small enough that the cache misses that need to occur in fread don't occur a second time when you read from the buffer). I don't know about mmap of stdin. For an ordinary file, mmap might reduce the cache misses and some other processing time compared to fread. Plus it causes the kernel to effectively manage that chunk size for you, which is simpler and maybe better than doing so yourself. |
Quote:
I tried using read, and it has not sped anything up, it's just as fast. I have never used mmap, and from what I've read there usually is no significant improvement because the kernel handles it. Quote:
BUFSIZ is around 4096. I have also tried using pthreads, but the code ran slower than this one for some reason. Maybe I just messed up. I don't know how to use SSE or MMX, nor if they would help in this case. |
Quote:
If the whole task is much larger, I think a chunk size of 4096 is rather low and means the time is dominated by fread overhead. Quote:
Quote:
|
Quote:
There are 4 cores with each pair sharing 3 MB for a total of 6 MB for all 4 cores. I'll leave the thread up a bit, and if nobody has any magical optimizations, I think this is as optimized as it is going to get. Thanks for the input. |
I have one more question on this code. It works well, but if the input is not divisible by BUFSIZ, then some of the input is discarded. It isn't too much of an issue, but I'd like to know if can get the rest of the input without too much performance hit. Performance is more important than getting a few more bytes of input. This is read from stdin, so I can't calculate sizes beforehand and do it that way.
|
Quote:
I would not expect that to affect performance much, but you won't know until you try. With that change, it is trivial to know how much you read and process the correct amount. Code:
unsigned r; |
Thanks, it works !
I tried to do something with the return value of fread, but I didn't think to put it in the for loop. Alright, it is solved, and performance is the same. |
By the way, on unrelated note, instead of defining “NUMCHARMAX” use “UCHAR_MAX” from “limits.h” header file.
Back on topic, if you are really interested, you could play with “posix_fadvise”, for instance: Code:
#include <errno.h> |
@ mina86
I tried that code, but it is slower. You might be right that I should use UCHAR_MAX, but it is more convenient to use 256 instead of 255 + 1 in some calculations. I suppose I could define a constant based on UCHAR_MAX. |
Use at least 64K buffer size, the bigger the better. Up to 4MB can sometimes help.
|
Hi
I think It should be slightly faster to use fadvise, mmap and no buffer at all. Code:
#include <errno.h> |
It's still slower, 0.159s versus 0.131s for 100 MB file.
I have already tried large buffers, but the BUFSIZ defined in stdin.h seems to be optimal. Larger doesn't help, smaller is slower. |
Quote:
Quote:
Quote:
|
All times are GMT -5. The time now is 07:58 AM. |