need help to further *optimize* my small C program that produces absolute path
I need to further optimize this C program of mine that produces the absolute path of any path that is passed as an argument to it. All kinds of path are accepted (including but not only both absolute but indirect and relative paths). If the entered path ends with a slash ('/'), the output will also end in a slash or else it ends with the name of the last directory or "/.".
Here's the code. Code:
#include <stdio.h> Thanks in advance for any help. |
Why do you need to optimize it? If you are only converting 1 path it seems to me that this will be fast enough (as in, less than an eye blink). If you are converting many paths probably more time is being wasted in process startup than actual runtime of your code, I would suggest changing your program to convert multiple paths per run.
Quote:
Quote:
Quote:
Quote:
Code:
if (!hasNonDots && dotsCount == 2 && t > 0) { CHECK_TOKENS is #defined as: Code:
#ifdef CHECKMAXTOKENS Quote:
Quote:
|
The good question would be: "Why to optimize at all in this case ?". Since code deals with file paths, most likely it will be called between file operations, which are much slower than the code anyway.
|
Quote:
Regarding the conversion of multiple paths at once, it won't be applicable for my purpose since the program will be mostly used in a script that will require only one conversion per call. Now regarding about the speed, it's probably already fast enough but I just want it be in its fullest. Not really a reason perhaps. Just my taste. After all the shell script I'm creating is time critical and I feel that it's still slow. I was also thinking about the process startup that probably also includes loading of the code. I think it's really dependent on the kind of system and also, small code doesn't really make much speed gain than a larger code when it comes to loading since io functions mostly occur one and a time and a 512K code won't make much difference with a 1024K code. Quote:
It's not about additional memory usage but additional assignment statements. Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
In any way, thanks again. |
Quote:
|
I have a few comments on how to make it faster I believe:
1) Dont optimize, and fix your code. It really is a mess, and optimizing by guessing usually has the reverse effect. Instead, make your algorithims better and stop worrying about squezing all the CPU out of it. 2) As a side note, using a switch statement usually produces less effiecent code for the CPU, using a switch statement in a loop while trying to make that loop faster is probably bad. 3) Stop making so many library calls. You seem to be making calls to do output left and right, when each call has a overhead. Instead, try to only make one output call when you know what to output, that should make things a bit faster. |
Quote:
Quote:
Quote:
|
A 50% time saving in part of the system that runs 2% of the time saves 1%; a 25% time saving in part of the system that runs 80% of the time saves 20%. Work smart, not hard! Identify the bottlenecks and optimise those; thg rest is of little consequence. Unless, of course, you are doing it for the pleasure in which case do what you like!
|
Quote:
Of course there's also a bit of pleasure or else it will all only be nothing but pain. Not all but don't most hackers do take pleasure with their programming? ^^, |
|
By 'library calls' I ment calls that went to the kernel, such as outputting text. String/array calls should be fine, and probably faster then a equivelent you can come up with.
As for your insistence upon speed, I gave your program a pass thru with 'time', and I see no real diffrence between your code and the command 'readlink -f'. Naturally, with such a small program doing such a simple job, itll probably take longer to start your program then to run it. I dont think optimizing this program is your best option for making things run faster, or that there is much you can do to optimize things as they are. |
Quote:
If your shell script feels slow, either switch to a better language (Perl/Python) or write your whole program in C/C++. And don't estimate modern compiler optimization strengths - it does really smart things, and it does them even when you use, say, array notation with indexes and pointer one - the former is safer overall. You are wasting your time optimizing this "C" code. |
Quote:
The load time will far exceed the execution time, so none of optimizations will have visible effect and if you wanted effective optimization you would focus entirely on reducing the size and the external linkage. But I think it is fair to ignore that. Significant parts of the code are only executed once, so their execution time is dominated by the cache misses of the first execution. So the speed of the instructions doesn't matter. The size matters and to a lessor extent the first pass branch mispredictions matter. Both can be improved by (among other things) doing some tail merges better by hand than the compiler can. Do you want to ignore the performance issues of "one time" code (any code that is executed so rarely that it is dominated by cache misses)? Lots of big programs have their total execution time dominated by such code (not actually one time code, but effectively one time code because it is always dropped from the cache before being executed again). A performance fanatic should learn how to optimize that code differently than inner loop code. For example Code:
memcpy(p, sourceArg, sourceArgSize); Quote:
ADD is bigger (but not inherently slower) than INC. Sometimes (such as with cache misses) bigger itself means slower. Sometimes it doesn't. The compiler rarely generates INC or DEC instructions, even if they would be better and even if you have distorted your code to bias toward them. Your several distortions toward INC each either make no difference or make the generated code worse. If you insist on controlling the asm code you have to write asm code. A=B+1; will always generate code at least as good as A=B;++A;. Usually they will be the same. Occasionally the second will be worse. If the compiler would be willing to use INC for the second, it will also use INC for the first. When you know X was zero (so you have this choice) ++X; in asm code will sometimes be faster than X=1; (less often than you think). But the C compiler will almost never let you achieve that optimization. In C code X=1; will usually be faster, including situations where ++X; would be faster in asm code. Quote:
Quote:
Quote:
Consider x=p-1; ... *x ...; If x is not used after that, the compiler will probably figure out you mean ... p[-1] ...; and generate exactly the same optimal code. If you use x later or involve a -- instead of the -1 you run a significant chance of confusing the compiler and generating worse code. Sometimes a DEC instruction would be faster than letting the compiler use the advanced addressing mode. I've tried to micro optimize that case. You can't win, short of writing the asm code yourself. Either you confuse the compiler enough that it doesn't use the advance addressing mode but does something else much worse than the DEC, or you fail to confuse it and it does the advanced addressing mode. But I don't think you would even recognize the case where the DEC would be better. It isn't the case you probably expect. Don't try to micro optimize unless you know more than the compiler writer knew. |
Wow there's nothing much I can say. Thank you very much for this great info. You've been frank but informative and that's what I really needed. It's kind of not easy for to be online this time but I'll save this page and think about all the things you said and decide what's best for my program for the next following days. I'll also consider the capabalities of modern compilers. I'll make a report once I'm finally done.
|
Quote:
Quote:
Clarity: no I don't actually know how compilers work. I've only the difference seen the outputs that varies depending on the optimization level used. :-). I'm no genius. :D |
Again guys if you find a good alternative to the library calls in the code just tell me. Asm is a good option but not yet. I want to know what's best to have it done.. in C. Thanks again all.
|
Quote:
I'm not sure whether you are suggesting redesign to avoid needing a three way branch or you are suggesting that two if statements would be faster for a three way branch (it usually isn't). I haven't looked carefully enough to be certain but contrast Code:
switch (*p) { Code:
switch (dotsCount) { In the second case the programmer may know things about the likely or possible values that would allow a double if statement to be faster than a switch, and the whole three way branch might not be as fundamental to the problem. So I'm not sure, but maybe something faster than the switch could be coded. A switch is often less efficient than if statements because the compiler is usually forced to generate a "default" path even if you don't make one explicit and even if the problem doesn't require it. But in these examples the problem does require a default path and it is explicit in the source code. |
Quote:
Quote:
|
Quote:
Quote:
By the way I made some comparisons with my original code and a version of the code where I made modifications that includes 'p[-1]' and 'token = p + 1'. The gcc compiler (verson 4.1.2) generates the same output from both codes when optimization flag is enabled. But when the outputs are not optimized, the modified version appeared to be faster. Both still have the same number of instructions and both also use increment statements (INCL and INC) but there was a difference in size and the targets. The compiler understood my original code as something that implies direct modification to the memory (like INCL) and not something that uses the registers first. With this form, it's probably slower right? It was probably quicker if the compiler did not recognize p + 1 as "INC AX". You guys are right. Maybe I really shouldn't over-underestimate compilers. P[-1] by the way also appears to yield faster code. ntubski and johnsfine was right all along. I'll change my code with those and have it finalized. Thanks for the help guys. I'll take a note of all the things you posted here. And by the way I still can't understand some of johnsfine's statements. I know the concept of caches but what are cache misses and what's a partial tail merge? :) Maybe I'll study those when I have time and after I finish the project. The first working prototype of its first and maybe final protocol (at last after many attempts to make it complete and perfect according to its requirement) is soon to be finished. ;) |
Quote:
Quote:
1) Flow of control. If the last action before a point where flow of control joins is identical on both sides, the compiler can reduce code size and maybe cache misses by joining earlier than the source code indicates. This is most common and most obvious in the join after if/then/else. Compare: Code:
if (a) { foo(x,y,z); } else { p=something; foo(p,y,z); } Code:
p=x; if (!a) { p=something; } foo(p,y,z); In the first case, the last operation before joining is calling foo() so the compiler could join it early, but the benefits are tiny and an optimizer would be unlikely to guess joining early was right. Since parameters are normally set up right to left, there is a more obvious early join in Code:
if (a) { foo(x,y,z); } else { p=something; foo(x,y,p); } Now consider Code:
p=x; if (a) {foo(p,y,z);} else { p=something; foo(p,y,z);} 2) The other meaning of "tail merge" is when the last operation of a function (before returning) is to call a function with a compatible parameter stack and the same return value (turns out to be very common in advanced C++ programming). The compiler can change the code to jump to the other function instead of calling it, and thus save the extra layer of return. |
Got it. I now know what you mean. It seems that there are parts of the code that only compilers can optimize.
By the way, regarding my second switch statement, do you think it's better if I just do something like: Code:
} else if (dotsCount && dotsCount != 1) { Code:
} else if (dotsCount) { |
I don't feel like studying your source code enough to decide the optimal way to branch on dotsCount. But I doubt that either of the above would be better than just using a switch.
What's wrong with Code:
} else if (dotsCount >= 2) { |
Thanks for figuring that out. I was always focusing on something like 'n |/&/^ m' so i forgot simple things like > or >=.
|
If we're still in the micro-optimization game, then I would suggest optimizing for the common case: directories which are not "." or "..". This way for most of the time the code doesn't have to look out for dots, just slash and end of string. EDIT: And by putting a '/' at the end of workBuffer a lot of checks for end of string can be removed. By the way, I uncovered a bug when I tried this out:
Code:
workBufferSize += cwdLength + 1; // +1 is needed for the '/' between cwd and sourceArg Code:
// ensure that workBuffer ends in "/", to reduce checks for '\0' |
Thanks for the fix. That was a bug indeed. I'll take a look at the new code as well. Later.
|
Quote:
Your performance bottleneck is not "C", it's your shell script. |
Quote:
Some folks say that I'm wasting so much time to this code but actually I only created the code in an hour or two and posted it. I also took time making replies to this thread and analyzing the code but it was fun anyway. I spent most of the time on my scripts and not on this program. Now regarding my shell script, I think it is already in its most optimized form but I know that I can never be sure with it so I also plan to post it later but the script is part of a project and I still can't find a proper license for it. There's nothing much in the project but I just want to be careful. Regardless, thanks for your reminders. |
Quote:
And why is the script a shell one in the first place ? |
Quote:
|
Quote:
|
it's sourced or can be called if it's the main caller.
|
Quote:
Quote:
Code:
./getabspath ../abc But anyway the suggestion is helpful and I'm still finding a way how to use it to improve the code although I still can't. However also, when I was looking at the original code, I found a solution in which the code will no longer need malloc()s. That's by immediately binding the tokens of the cwd to the tokens array. When I made the changes, I also included some of the tweaks that was recommended in this thread. Here's my current code: Code:
#include <stdio.h> |
Quote:
Quote:
If you want to avoid malloc'ing and checking the length of argv[1], the trick with the slash doesn't work, but here's the code with the other changes. Code:
#include <stdio.h> |
Quote:
Quote:
Quote:
Currently, my plan is to install many versions of gcc to my system especially newer ones like 4.4.1 and 4.5 if possible but I need optimize my system first before it. About the new code, I'll have a copy of it again and look at it later when I have extra time or when I'm studying the compiler. I haven't forgotten the scripts by the way. The internals of the script is already complete as expected and no longer require any revision but sadly the interface (callable functions and options) did not, so I thought maybe I need to rest for a while and try to finish it again later when I'm ready. Also I already have choices for the license or agreement that I'll use. Probably it's CC's attribution 3.0, CC0 1.0 or simply just public domain. I'll choose after I have a complete review of the three. |
Quote:
|
Quote:
Code:
workBufferSize += (cwdLength = strlen(cwd)); Quote:
Quote:
Quote:
Quote:
|
Quote:
Quote:
Quote:
|
Final prototype? Sorry I can't help but use gotos here :)
Code:
/* getabspath.c Thanks for the help everyone. Finally. Now this is what I meant :). |
Quote:
Quote:
Quote:
Quote:
|
Quote:
Quote:
Quote:
|
Quote:
Quote:
|
Quote:
Quote:
Quote:
But regardless of that, I don't think that there's something really wrong that's going to happen if I don't revoke what it was that I have decided. I still think that it's just better to finish what it was that I started with what I have decided. |
Quote:
Code:
$ cat builtins.c |
Quote:
|
Whoa :eek:
There's a lot of code there, I sincerely doubt that replacing the abspath functions will make a noticeable difference, but it's hard to say without some typical input files to test things with. If you are serious about optimizing I would go back to my earlier suggestion: rewrite the program to handle multiple paths per run, perhaps using a fifo or socket. I'm wondering why you have a hashing function in your compiler, aren't all awk arrays hash tables already? |
Quote:
Quote:
I have test scripts but I didn't include it in the package. It's not really necessary to include them I guess. Quote:
Quote:
By the way I finally finished the Simpler version as well. Loader-Simpler includes the extended functions loadx(), includex() and callx(). They acknowledge patterns and expressions. The last will be the full and non-stripped version but it's the most difficult and surely it will not be soon.. though. |
Quote:
|
Quote:
Quote:
|
Quote:
Quote:
|
Quote:
But of course and nevertheless, I can never be sure of that and suggestions like yours (esp. from someone that helped me a lot already) will always be welcomed. Just made the cleaner version of the test scripts and uploaded them. Same site at sourceforge in the extra folder. Good luck and enjoy :D. |
All times are GMT -5. The time now is 08:58 AM. |