Need help to further *optimize* my small C program that produces absolute paths
I need to further optimize this C program of mine, which produces the absolute path of any path passed to it as an argument. All kinds of paths are accepted, both absolute and relative (including indirect ones containing "." and ".." components). If the entered path ends with a slash ('/'), the output also ends in a slash; otherwise it ends with the name of the last component or "/.".
Here's the code. Code:
#include <stdio.h>
Thanks in advance for any help. |
Why do you need to optimize it? If you are only converting 1 path, it seems to me that this will be fast enough (as in, less than an eye blink). If you are converting many paths, more time is probably being wasted in process startup than in the actual runtime of your code; I would suggest changing your program to convert multiple paths per run.
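The "multiple paths per run" suggestion could look like the sketch below: read one path per line from a stream so the script pays the process-startup cost only once. The `convert` function here is a hypothetical stand-in that just copies its input; the real program's logic would go there.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real conversion routine;
 * here it just copies the input unchanged. */
static void convert(const char *path, char *out, size_t cap)
{
    snprintf(out, cap, "%s", path);
}

/* One process, many conversions: read one path per line and
 * write one converted path per line. */
static void convert_stream(FILE *in, FILE *out)
{
    char line[4096], result[4096];

    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\n")] = '\0';  /* strip the newline */
        convert(line, result, sizeof result);
        fprintf(out, "%s\n", result);
    }
}
```

Wiring `convert_stream(stdin, stdout)` into `main` lets a script pipe many paths through a single process instead of forking once per path.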
Code:
if (!hasNonDots && dotsCount == 2 && t > 0) {
CHECK_TOKENS is #defined as:
Code:
#ifdef CHECKMAXTOKENS
|
The better question would be: "Why optimize at all in this case?" Since the code deals with file paths, it will most likely be called between file operations, which are much slower than the code anyway.
|
Regarding the conversion of multiple paths at once: that won't be applicable for my purpose, since the program will mostly be used in a script that requires only one conversion per call. As for speed, it's probably already fast enough, but I just want it to be at its fullest. Not really a reason, perhaps; just my taste. After all, the shell script I'm creating is time critical and I feel it's still slow. I was also thinking about process startup, which probably includes loading the code. I think that really depends on the kind of system, and small code doesn't gain much loading speed over larger code anyway, since I/O mostly happens a block at a time and a 512K binary won't load much differently from a 1024K one.
It's not about additional memory usage but about additional assignment statements.
In any case, thanks again. |
I have a few comments on how I believe you can make it faster:
1) Don't optimize; fix your code. It really is a mess, and optimizing by guessing usually has the reverse effect. Instead, improve your algorithms and stop worrying about squeezing every cycle out of the CPU.
2) As a side note, a switch statement usually produces less efficient code, so using a switch inside a loop you are trying to speed up is probably bad.
3) Stop making so many library calls. You seem to be making output calls left and right, and each call has an overhead. Instead, try to make only one output call once you know what to output; that should make things a bit faster. |
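Point 3 above can be sketched concretely: build the whole result in memory, then hand it to the output routine in a single call instead of one call per path component. The function and variable names here are illustrative, not taken from the original program.

```c
#include <stdio.h>
#include <string.h>

/* Join `nparts` path components into `buf` (capacity `cap`),
 * each prefixed with '/'. Returns the length written, so the
 * caller can emit the result with one output call. */
static size_t join_path(const char **parts, size_t nparts,
                        char *buf, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nparts; i++) {
        size_t len = strlen(parts[i]);
        if (n + len + 2 > cap)
            break;                       /* out of room: stop */
        buf[n++] = '/';
        memcpy(buf + n, parts[i], len);
        n += len;
    }
    buf[n] = '\0';
    return n;
}
```

A caller would then do `fwrite(buf, 1, n, stdout)` once, rather than a `printf` or `putchar` per component, which is what the post is warning against.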
A 50% time saving in a part of the system that runs 2% of the time saves 1%; a 25% time saving in a part of the system that runs 80% of the time saves 20%. Work smart, not hard! Identify the bottlenecks and optimise those; the rest is of little consequence. Unless, of course, you are doing it for the pleasure, in which case do what you like!
|
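The arithmetic in the post above reduces to a one-line formula (this helper and its names are mine, purely for illustration): the overall fraction of runtime saved is the component's share of total runtime multiplied by the fraction of its own time you save.

```c
/* Overall fraction of total runtime saved when a component
 * taking `share` of the total is sped up so that it saves
 * `local_saving` of its own time. */
static double overall_saving(double share, double local_saving)
{
    return share * local_saving;
}
```

So `overall_saving(0.02, 0.50)` gives the 1% figure and `overall_saving(0.80, 0.25)` the 20% figure quoted in the post.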
Of course there's also a bit of pleasure in it, or else it would all be nothing but pain. Maybe not all, but don't most hackers take pleasure in their programming? ^^, |
By 'library calls' I meant calls that go to the kernel, such as outputting text. String/array calls should be fine, and are probably faster than an equivalent you could come up with.
As for your insistence on speed, I gave your program a pass through 'time', and I see no real difference between your code and the command 'readlink -f'. Naturally, with such a small program doing such a simple job, it'll probably take longer to start your program than to run it. I don't think optimizing this program is your best option for making things run faster, or that there is much you can do to optimize things as they are. |
If your shell script feels slow, either switch to a better language (Perl/Python) or write your whole program in C/C++. And don't underestimate modern compiler optimization - it does really smart things, and it does them whether you use, say, array notation with indexes or pointer arithmetic - the former is safer overall. You are wasting your time optimizing this "C" code. |
The load time will far exceed the execution time, so none of these optimizations will have a visible effect; if you wanted effective optimization, you would focus entirely on reducing the size and the external linkage. But I think it is fair to ignore that.
Significant parts of the code are only executed once, so their execution time is dominated by the cache misses of the first execution. So the speed of the instructions doesn't matter. The size matters, and to a lesser extent the first-pass branch mispredictions matter. Both can be improved by (among other things) doing some tail merges better by hand than the compiler can.
Do you want to ignore the performance issues of "one time" code (any code that is executed so rarely that it is dominated by cache misses)? Lots of big programs have their total execution time dominated by such code (not actually one-time code, but effectively one-time code because it is always dropped from the cache before being executed again). A performance fanatic should learn how to optimize that code differently than inner-loop code. For example
Code:
memcpy(p, sourceArg, sourceArgSize);
ADD is bigger (but not inherently slower) than INC. Sometimes (such as with cache misses) bigger itself means slower; sometimes it doesn't. The compiler rarely generates INC or DEC instructions, even if they would be better and even if you have distorted your code to bias toward them. Your several distortions toward INC each either make no difference or make the generated code worse. If you insist on controlling the asm code, you have to write asm code.
A=B+1; will always generate code at least as good as A=B;++A;. Usually they will be the same; occasionally the second will be worse. If the compiler would be willing to use INC for the second, it will also use INC for the first.
When you know X was zero (so you have this choice), ++X; in asm code will sometimes be faster than X=1; (less often than you think). But the C compiler will almost never let you achieve that optimization. In C code, X=1; will usually be faster, including in situations where ++X; would be faster in asm code.
Consider x=p-1; ... *x ...; If x is not used after that, the compiler will probably figure out you mean ... p[-1] ...; and generate exactly the same optimal code. If you use x later, or involve a -- instead of the -1, you run a significant chance of confusing the compiler and generating worse code.
Sometimes a DEC instruction would be faster than letting the compiler use the advanced addressing mode. I've tried to micro-optimize that case. You can't win, short of writing the asm code yourself: either you confuse the compiler enough that it doesn't use the advanced addressing mode but does something else much worse than the DEC, or you fail to confuse it and it uses the advanced addressing mode. But I don't think you would even recognize the case where the DEC would be better; it isn't the case you probably expect. Don't try to micro-optimize unless you know more than the compiler writer knew. |
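The two source-level spellings compared in the post can be written out as below (the function names are mine, for illustration only); both read the element just before the pointer, and a modern optimizing compiler typically emits identical code for them.

```c
/* Spelling 1: an explicit temporary pointer, then dereference. */
static int via_temp(const int *p)
{
    const int *x = p - 1;
    return *x;
}

/* Spelling 2: negative array index, which the compiler usually
 * folds into an addressing mode directly. */
static int via_index(const int *p)
{
    return p[-1];
}
```

Both functions return the same value for any valid pointer one past an element, which is why, as the post says, micro-optimizing between them at the C level rarely buys anything.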
Wow, there's not much I can say. Thank you very much for this great info. You've been frank but informative, and that's what I really needed. It's not easy for me to be online at this time, but I'll save this page, think about all the things you said, and decide what's best for my program over the next few days. I'll also consider the capabilities of modern compilers. I'll make a report once I'm finally done.
|
To clarify: no, I don't actually know how compilers work. I've only seen the difference in the outputs, which vary depending on the optimization level used. :-) I'm no genius. :D |