LinuxQuestions.org - [SOLVED] need help to further *optimize* my small C program that produces absolute path

- Programming (https://www.linuxquestions.org/questions/programming-9/)

- - need help to further *optimize* my small C program that produces absolute path (https://www.linuxquestions.org/questions/programming-9/need-help-to-further-%2Aoptimize%2A-my-small-c-program-that-produces-absolute-path-744726/)

need help to further *optimize* my small C program that produces absolute path

I need to further optimize this C program of mine that produces the absolute path of any path that is passed as an argument to it. All kinds of path are accepted (including but not only both absolute but indirect and relative paths). If the entered path ends with a slash ('/'), the output will also end in a slash or else it ends with the name of the last directory or "/.".

Here's the code.

Code:

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <unistd.h>



#define CWDBUFFERSIZE  512

#define MAXTOKENS      64

#define CHECKMAXTOKENS



void reportMaxTokensReached (void);



int main (int argc, char const ** argv) {

        register char * p;



        char * workBuffer;

        unsigned int workBufferSize;



        char * token;

        char * tokens [MAXTOKENS];

        unsigned int t;



        char cwd [CWDBUFFERSIZE];

        unsigned int cwdLength;



        char const * sourceArg;

        unsigned int sourceArgSize;



        unsigned int dotsCount;

        unsigned int hasNonDots;

        unsigned int isRelative;



        unsigned int i;



        if

        (

                argc == 1

        ||

                // We place the assignments here because we won't need the parameters assigned

                // if 'argc == 1'.

                //

                // I hope also that there's a way to make sourceArgSize be immediately set to

                // +1 of strlen(...) without changing the value that's comparable to 0 but I

                // can't find a way.  Perhaps C is just C and that's it.

                //

                (sourceArgSize = (strlen(sourceArg = argv[1]))) == 0

        )

        {

                if (getcwd(cwd, CWDBUFFERSIZE)) {

                        puts((char const *) cwd);

                        return 0;

                } else {

                        return 1;

                }

        }



        workBufferSize = ++sourceArgSize;



        if ((isRelative = (*sourceArg != '/'))) {

                if (getcwd(cwd, CWDBUFFERSIZE)) {

                        workBufferSize += (cwdLength = strlen(cwd));

                } else {

                        return 1;

                }

        }



        if ((workBuffer = (char *) malloc(workBufferSize)) == 0x00) {

                fprintf(stderr, "failed to allocate memory.\n");

                return 1;

        }



        if (isRelative) {

                memcpy(workBuffer, cwd, cwdLength);



                p = workBuffer + cwdLength;



                *(p++) = '/';



                memcpy(p, sourceArg, sourceArgSize);

        } else {

                memcpy(workBuffer, sourceArg, sourceArgSize);

        }



        dotsCount = 0;

        hasNonDots = 0;

        t = 0;



        for (p = token = workBuffer; *p; ++p) {

                switch (*p) {

                case '/':

                        if (hasNonDots) {

                                #ifdef CHECKMAXTOKENS

                                if (t == MAXTOKENS) {

                                        reportMaxTokensReached();

                                        return 1;

                                }

                                #endif



                                tokens[t++] = token;

                        } else if (dotsCount) {

                                switch (dotsCount) {

                                case 1:

                                        break;

                                case 2:

                                        if (t)

                                                --t;

                                        break;

                                default:

                                        #ifdef CHECKMAXTOKENS

                                        if (t == MAXTOKENS) {

                                                reportMaxTokensReached();

                                                return 1;

                                        }

                                        #endif



                                        tokens[t++] = token;

                                }

                        }



                        dotsCount  = 0;

                        hasNonDots = 0;

                        *p = '\0';



                        token = p; ++token;        // token = p + 1;

                                            // OR 'token = ++p;' but will increase size of loop block, faster processing?

                        continue;

                case '.':

                        ++dotsCount;

                        continue;

                default:

                        ++hasNonDots; // faster than 'hasNonDots = 1;' ?

                }

        }



        if (hasNonDots) {

                #ifdef CHECKMAXTOKENS

                if (t == MAXTOKENS) {

                        reportMaxTokensReached();

                        return 1;

                }

                #endif



                tokens[t++] = token;

        } else if (dotsCount) {

                switch (dotsCount) {

                case 1:

                        break;

                case 2:

                        if (t)

                                --t;

                        break;

                default:

                        #ifdef CHECKMAXTOKENS

                        if (t == MAXTOKENS) {

                                reportMaxTokensReached();

                                return 1;

                        }

                        #endif



                        tokens[t++] = token;

                }

        }



        if (*(--p)) {

                // last character is not a forward slash ('/')



                if (t) {

                        for (i = 0; i < (const unsigned int) t; i++) {

                                putchar('/');

                                fputs(tokens[i], stdout);

                        }



                        puts("");

                } else {

                        puts("/.");

                }

        } else {

                if (t) {

                        for (i = 0; i < (const unsigned int) t; i++) {

                                putchar('/');

                                fputs(tokens[i], stdout);

                        }

                }



                puts("/");

        }



        return 0;

}



void reportMaxTokensReached (void) {

        printf("path has too many tokens.\n");

}

I need to optimize this without being dependent to the compiler and also not being dependent to asm codes as although I want the code to be currently optimized for x86+ systems, I still want it to be easily portable to other systems if required later.

Thanks in advance for any help.

Why do you need to optimize it? If you are only converting 1 path it seems to me that this will be fast enough (as in, less than an eye blink). If you are converting many paths probably more time is being wasted in process startup than actual runtime of your code, I would suggest changing your program to convert multiple paths per run.

Quote:

Code:

  if

        (

        argc == 1

        ||

        // We place the assignments here because we won't need the parameters assigned

        // if 'argc == 1'.

        //

        // I hope also that there's a way to make sourceArgSize be immediately set to

        // +1 of strlen(...) without changing the value that's comparable to 0 but I

        // can't find a way.  Perhaps C is just C and that's it.

        //

        (sourceArgSize = (strlen(sourceArg = argv[1]))) == 0

        )

Putting complicated expressions into an if is bad style. Are you afraid of using too many variables? An optimizing compiler won't waste any more memory on them.

Quote:

token = p; ++token;// token = p + 1;
// OR 'token = ++p;' but will increase size of loop block, faster processing?

I would expect the first 2 to produce the same code, but I prefer the 2nd since it is simpler. I'm not sure what your comment about the 3rd means, but it looks to me like it would cause the skipping of a character and therefore is simply incorrect.

Quote:

++hasNonDots; // faster than 'hasNonDots = 1;' ?

I would expect the "= 1" to be faster since it doesn't have to read the variable first. Also the adding to the variable every time could theoretically lead to overflow (okay, various system limitations would prevent the string from ever being that long, but I think "= 1" is more correct).

Quote:

Code:

            if (hasNonDots) {

#ifdef CHECKMAXTOKENS

                if (t == MAXTOKENS) {

                    reportMaxTokensReached();

                    return 1;

                }

#endif



                tokens[t++] = token;

            } else if (dotsCount) {

                switch (dotsCount) {

                case 1:

                    break;

                case 2:

                    if (t)

                        --t;

                    break;

                default:

#ifdef CHECKMAXTOKENS

                    if (t == MAXTOKENS) {

                        reportMaxTokensReached();

                        return 1;

                    }

#endif



                    tokens[t++] = token;

                }

            }

This is somewhat long and confusing, how about

Code:

          if (!hasNonDots && dotsCount == 2 && t > 0) {

              --t;            /* ".." means go up one directory */

          } else if (hasNonDots || dotsCount >= 3) {

              CHECK_TOKENS(t);

              tokens[t++] = token;

          }

And since this is duplicated after the loop, you should put it into a function (with inlining that shouldn't slow anything down).
CHECK_TOKENS is #defined as:

Code:

#ifdef CHECKMAXTOKENS

static void reportMaxTokensReached (unsigned int num_tokens) {

    if (num_tokens >= MAXTOKENS) {

        fprintf(stderr, "path has too many tokens.\n");

        exit(1);

    }

}

#  define CHECK_TOKENS(t) reportMaxTokensReached(t)

#else

#  define CHECK_TOKENS(t)

#endif

Quote:

if (*(--p)) {

Why have side effects when you don't use them? I would prefer *(p-1) or p[-1].

Quote:

i < (const unsigned int) t;

I don't see what that cast is good for, t is already an unsigned int, casting an LR-value to const doesn't change anything...

The good question would be: "Why to optimize at all in this case ?". Since code deals with file paths, most likely it will be called between file operations, which are much slower than the code anyway.

Quote:

Originally Posted by ntubski (Post 3630259)

First, thanks for the big reply.

Regarding the conversion of multiple paths at once, it won't be applicable for my purpose since the program will be mostly used in a script that will require only one conversion per call.

Now regarding about the speed, it's probably already fast enough but I just want it be in its fullest. Not really a reason perhaps. Just my taste. After all the shell script I'm creating is time critical and I feel that it's still slow.

I was also thinking about the process startup that probably also includes loading of the code. I think it's really dependent on the kind of system and also, small code doesn't really make much speed gain than a larger code when it comes to loading since io functions mostly occur one and a time and a 512K code won't make much difference with a 1024K code.

Quote:

Putting complicated expressions into an if is bad style. Are you afraid of using too many variables? An optimizing compiler won't waste any more memory on them.

I see no other way to put the code and no, I'm not afraid to sacrifice memory more likely if it's for the sake of speed. After all, in most compilers local variables are allocated by just decrementing the stack pointer to reserve a space in the stack.

It's not about additional memory usage but additional assignment statements.

Quote:

I would expect the first 2 to produce the same code, but I prefer the 2nd since it is simpler.

About why I'd prefer ++ instead of + 1, the thing is just in the expected compilation output. ADD is slower and bigger than INC right?

Quote:

I'm not sure what your comment about the 3rd means, but it looks to me like it would cause the skipping of a character and therefore is simply incorrect.

Of course with that you'll expect to remove the ++p instruction in the for loop header and include it in all the inclusion blocks.

Quote:

Again it's about the use of ADD or INC. Also, as I've said, I don't want to be dependent of the compiler. Not all compilers might recognize + 1 as just an increment and use INC and not ADD in the output form.

Quote:

This is somewhat long and confusing, how about

Code:

          if (!hasNonDots && dotsCount == 2 && t > 0) {

              --t;            /* ".." means go up one directory */

          } else if (hasNonDots || dotsCount >= 3) {

              CHECK_TOKENS(t);

              tokens[t++] = token;

          }

And since this is duplicated after the loop, you should put it into a function (with inlining that shouldn't slow anything down).

Speed and logic first before readability or design but with proper balance with design only if possible. That's my approach to this program. I want to optimize this program in its best as a whole and I think separating optimizations on another "inline" might just make it slower.

Quote:

CHECK_TOKENS is #defined as:

Code:

#ifdef CHECKMAXTOKENS

static void reportMaxTokensReached (unsigned int num_tokens) {

    if (num_tokens >= MAXTOKENS) {

        fprintf(stderr, "path has too many tokens.\n");

        exit(1);

    }

}

#  define CHECK_TOKENS(t) reportMaxTokensReached(t)

#else

#  define CHECK_TOKENS(t)

#endif

No I don't want to do that. Non-terminating function calls in a loop will just make things slower.

Quote:

Why have side effects when you don't use them? I would prefer *(p-1) or p[-1].

Just like ADD and INC, it's only about the difference of SUB and DEC. Also with p[-1], the pointer would probably first be transferred to another variable before deducting it. Probably not true all the time if the system or the compiler can use offsets but the one I use sounds faster for most situations.

Quote:

I don't see what that cast is good for, t is already an unsigned int, casting an L-value to const doesn't change anything...

Just tell the compiler not to check the variable again and again on the loop. More likely tell the compiler to put store the value first to a faster location. Not all compilers are dumb or recognizes this but it might just be helpful.

In any way, thanks again.

Quote:

Originally Posted by Sergei Steshenko (Post 3630304)

I'll be using it in a shell script program that manipulates paths one at a time. The script is still slow and probably there's no other way to make it faster because of dependencies and limitations of shells but just in case that a little additional speed will help...

I have a few comments on how to make it faster I believe:

1) Dont optimize, and fix your code. It really is a mess, and optimizing by guessing usually has the reverse effect. Instead, make your algorithims better and stop worrying about squezing all the CPU out of it.

2) As a side note, using a switch statement usually produces less effiecent code for the CPU, using a switch statement in a loop while trying to make that loop faster is probably bad.

3) Stop making so many library calls. You seem to be making calls to do output left and right, when each call has a overhead. Instead, try to only make one output call when you know what to output, that should make things a bit faster.

Quote:

Originally Posted by SciYro (Post 3630479)

I make clean and readable code when I program non-speed-critical programs like C++ OOPs but not this time. I already decided to chose speed before readability of code logic and to me this is not really mind-squeezing compared to other programs I studied or made so I guess I'll just have to finish this. Besides, I always can create another replica of this in a easier and cleaner form if I like, can't I.

Quote:

2) As a side note, using a switch statement usually produces less effiecent code for the CPU, using a switch statement in a loop while trying to make that loop faster is probably bad.

I haven't yet focused in analyzing the program using gdb and I'll be doing that later. I'll see what really happens then. Nice suggestion btw.

Quote:

3) Stop making so many library calls. You seem to be making calls to do output left and right, when each call has a overhead. Instead, try to only make one output call when you know what to output, that should make things a bit faster.

Probably the real reason why I posted this code. I don't want to make multiple library calls but I still can't tell how I'll be able to manipulate the tokens to make them combine into one whole string and make only one library-system call to print it. I could use similar other routines like memcpy but isn't that a library call as well? This time I don't know what's best to do. It seems that asm is the only thing that can really make it faster. Or are there some things that I don't know. Should I make my own routines? Wouldn't that be dependent to a limited number of systems?

A 50% time saving in part of the system that runs 2% of the time saves 1%; a 25% time saving in part of the system that runs 80% of the time saves 20%. Work smart, not hard! Identify the bottlenecks and optimise those; thg rest is of little consequence. Unless, of course, you are doing it for the pleasure in which case do what you like!

Quote:

Originally Posted by catkin (Post 3630502)

Your suggestion is a little bit confusing but I like it. I think I know what you mean but I just follow the requirement of my program. Not really have to be more than that. Just the requirement :).

Of course there's also a bit of pleasure or else it will all only be nothing but pain. Not all but don't most hackers do take pleasure with their programming? ^^,

The prudent programmer, before optimizing anything, is advised to browse here , which is a very short read, and here, which is a very fun one.

By 'library calls' I ment calls that went to the kernel, such as outputting text. String/array calls should be fine, and probably faster then a equivelent you can come up with.

As for your insistence upon speed, I gave your program a pass thru with 'time', and I see no real diffrence between your code and the command 'readlink -f'. Naturally, with such a small program doing such a simple job, itll probably take longer to start your program then to run it. I dont think optimizing this program is your best option for making things run faster, or that there is much you can do to optimize things as they are.

Quote:

Originally Posted by konsolebox (Post 3630468)

...
After all the shell script I'm creating is time critical and I feel that it's still slow.
...

The rest is irrelevant. Shells ('bash', 'csh', 'tcsh') interpret your script line by line, each time reparsing the line being interpreted, so this is what takes most of the resources.

If your shell script feels slow, either switch to a better language (Perl/Python) or write your whole program in C/C++.

And don't estimate modern compiler optimization strengths - it does really smart things, and it does them even when you use, say, array notation with indexes and pointer one - the former is safer overall.

You are wasting your time optimizing this "C" code.

Quote:

Originally Posted by konsolebox (Post 3630468)

Now regarding about the speed, it's probably already fast enough but I just want it be in its fullest.

I think it is a useful exercise for learning purposes to optimize code that doesn't need it. But you still should have some better perspective about which aspects you're intentionally ignoring vs. which you're misunderstanding.

The load time will far exceed the execution time, so none of optimizations will have visible effect and if you wanted effective optimization you would focus entirely on reducing the size and the external linkage. But I think it is fair to ignore that.

Significant parts of the code are only executed once, so their execution time is dominated by the cache misses of the first execution. So the speed of the instructions doesn't matter. The size matters and to a lessor extent the first pass branch mispredictions matter. Both can be improved by (among other things) doing some tail merges better by hand than the compiler can. Do you want to ignore the performance issues of "one time" code (any code that is executed so rarely that it is dominated by cache misses)? Lots of big programs have their total execution time dominated by such code (not actually one time code, but effectively one time code because it is always dropped from the cache before being executed again). A performance fanatic should learn how to optimize that code differently than inner loop code. For example

Code:

                memcpy(p, sourceArg, sourceArgSize);

        } else {

                memcpy(workBuffer, sourceArg, sourceArgSize);

        }

The compiler can do only a very partial tail merge there. You could do a complete tail merge (probably at the cost of one extra mov instruction in the final executed code). In inner loop code the extra mov might balance the savings of one well predicted branch and your manual tail merge nets to no benefit. In code dominated by cache misses the manual tail merge is a big win and the extra mov is trivial. You should understand the difference between the code that is fetched and the code that is executed and which matters when.

Quote:

About why I'd prefer ++ instead of + 1, the thing is just in the expected compilation output. ADD is slower and bigger than INC right?

Do you know x86 assembler? Have you looked at the generated code in x86 assembler.

ADD is bigger (but not inherently slower) than INC. Sometimes (such as with cache misses) bigger itself means slower. Sometimes it doesn't. The compiler rarely generates INC or DEC instructions, even if they would be better and even if you have distorted your code to bias toward them. Your several distortions toward INC each either make no difference or make the generated code worse. If you insist on controlling the asm code you have to write asm code.
A=B+1; will always generate code at least as good as A=B;++A;. Usually they will be the same. Occasionally the second will be worse. If the compiler would be willing to use INC for the second, it will also use INC for the first.
When you know X was zero (so you have this choice) ++X; in asm code will sometimes be faster than X=1; (less often than you think). But the C compiler will almost never let you achieve that optimization. In C code X=1; will usually be faster, including situations where ++X; would be faster in asm code.

Quote:

You can't micro optimize C code without a particular compiler and architecture in mind. If you want code to be fast in some as yet unknown compiler and architecture you're more likely to get it right by just coding what you mean: X=1; if that is what you mean, A=B+1; if that is what you mean.

Quote:

I want to optimize this program in its best as a whole and I think separating optimizations on another "inline" might just make it slower.

Compilers do a good job of inlining in a way that allows optimization across the boundary of the calling and called code. If defining and using an inline function makes your code more readable, do it. It won't make the code slower.

Quote:

Also with p[-1], the pointer would probably first be transferred to another variable before deducting it.

I didn't search for that spot in your code, but you are betraying a serious lack of understanding of x86 asm code. The addressing modes are very powerful and p[-1] would typically be done without any extra instruction for even the -1, much less transferring p to another register.

Consider x=p-1; ... *x ...; If x is not used after that, the compiler will probably figure out you mean ... p[-1] ...; and generate exactly the same optimal code. If you use x later or involve a -- instead of the -1 you run a significant chance of confusing the compiler and generating worse code.

Sometimes a DEC instruction would be faster than letting the compiler use the advanced addressing mode. I've tried to micro optimize that case. You can't win, short of writing the asm code yourself. Either you confuse the compiler enough that it doesn't use the advance addressing mode but does something else much worse than the DEC, or you fail to confuse it and it does the advanced addressing mode. But I don't think you would even recognize the case where the DEC would be better. It isn't the case you probably expect. Don't try to micro optimize unless you know more than the compiler writer knew.

Wow there's nothing much I can say. Thank you very much for this great info. You've been frank but informative and that's what I really needed. It's kind of not easy for to be online this time but I'll save this page and think about all the things you said and decide what's best for my program for the next following days. I'll also consider the capabalities of modern compilers. I'll make a report once I'm finally done.

Quote:

Originally Posted by Sergei Steshenko (Post 3630576)

No I can't switch because the script is really meant for shell scripting and not on any other language and the only thing left to make it quicker is to create a program like this. I know that it won't really make much difference but I'm just trying.

Quote:

And don't estimate modern compiler optimization strengths - it does really smart things, and it does them even when you use, say, array notation with indexes and pointer one - the former is safer overall.

You are wasting your time optimizing this "C" code.

I know that and I know how compilers optimize code as I've already seen the output. Yes I know it might only make the code less understood by the compiler and make it harder to optimize but I'm still learning anyway and I haven't known what is the true thing that happens. I won't feel comfortable unless I really knew it myself.

Clarity: no I don't actually know how compilers work. I've only the difference seen the outputs that varies depending on the optimization level used. :-). I'm no genius. :D