LinuxQuestions.org

little_wolf_e 12-04-2014 12:13 PM

Strange pointer decrement behaviour
 
I just traced a problem with using the decrement operator in a loop; the behaviour depends on the version of gcc used to compile the code. With the SAFE_VERSION constant set to 1, the loop works as expected with gcc 4.3.0:

Code:

#include <stdio.h>
#include <string.h>

#define DELAY                        3
#define SAFE_VERSION        1

int main(void)
{
        char        *p, str[] = "abcdefghij";
        int                i;

        printf("String before = '%s'.\n", str);
        p = str + strlen(str) - 1;
#        if SAFE_VERSION
        for (i = strlen(str) - DELAY; i; i--, p--)
                *p = *(p - DELAY);
#        else
        for (i = strlen(str) - DELAY; i; i--)
                *p-- = *(p - DELAY);
#        endif
        printf("String after  = '%s'.\n", str);
        return 0;
}

Producing the output:

Code:

String before = 'abcdefghij'.
String after  = 'abcabcdefg'.

I expected the right-hand side expression to be evaluated first and assigned to the location the left-hand pointer refers to, and only then the pointer to be decremented.

The other variant (SAFE_VERSION set to 0) did not work with gcc version 4.8.3, which produces the following output:

Code:

String before = 'abcdefghij'.
String after  = 'abc'.

Is this something that I should have known or is it new weirdness in the compiler?

genss 12-04-2014 12:22 PM

p-- decrements the pointer, *p-- decrements the value it points to
afaik

NevemTeve 12-04-2014 12:45 PM

The problem is in your code. The order in which the side effects of operators such as = and ++ take effect within a single expression isn't defined by any standard; relying on it is called 'Undefined behaviour'. Another (anti-)example:
Code:

int a= 3;
int b = a++ + ++a; // at this point b can be 6 or more, or less. Undefined behaviour


johnsfine 12-04-2014 01:01 PM

Quote:

Originally Posted by genss (Post 5279514)
p-- decrements the pointer, *p-- decrements the value it points to
afaik

Look at any C or C++ reference, such as
http://en.cppreference.com/w/c/langu...tor_precedence

Notice where it says "right to left" associativity for group 2 in that table.

I hope it is obvious what that means.

Quote:

Originally Posted by little_wolf_e (Post 5279510)
I expected the right-hand side expression to be evaluated first and assigned to the location the left-hand pointer refers to, and only then the pointer to be decremented.

That expectation is your mistake. As NevemTeve said, the sequence is not defined.

There are few and specific constructs in C and C++ for which sequence is defined. The ; ending a statement is the most common/important of those. But otherwise sequence is generally undefined, so when you make any "this happens then that happens" assumptions without a defined sequence point, your code is likely to work by accident rather than by design and then likely to bite you later. In other words what you saw as gcc version "new weirdness" is really your own old bug.

Quote:

Originally Posted by NevemTeve (Post 5279528)
Another (anti-)example:
Code:

int a= 3;
int b = a++ + ++a; // at this point b can be 6 or more, or less. Undefined behaviour


I think that is an example of a different and fundamentally more serious kind of undefined behavior. Another language rule is being violated there beyond the violation of assuming sequence without a sequence point. If it were merely a sequence violation, then we would expect b to end up as 7, 8 or 9 and a to end up as 5. But real compilers (not just the theoretical "undefined means anything might explode" kind) may give not only results other than 7, 8 or 9 for b but even results other than 5 for a. Regardless of the sequence, a was incremented twice, so if you don't know the real rules of the language, you would expect it to reach 5 (and b to be 7 to 9, depending on which order the steps occur in).

genss 12-04-2014 02:08 PM

Quote:

Originally Posted by johnsfine (Post 5279546)
Look at any C or C++ reference, such as
http://en.cppreference.com/w/c/langu...tor_precedence

Notice where it says "right to left" associativity for group 2 in that table.

I hope it is obvious what that means.

thx, never have done such a thing

anyway, this is interesting to me so i'll lay it out
an equation is variable = expression
but there is an operation on the variable and the variable is used to calculate the expression
so to deconstruct

variable
--------------
p = p--;
value pointed to by p
--------------

expression
--------------
p - DELAY;
value pointed to by result
--------------

according to the table
-- is done first
- second
= third

but... p-DELAY is in parentheses so math says it has to be done before any other operation on its variables
so (p-DELAY) becomes a different value to what p-DELAY could give

just some thoughts..

rknichols 12-04-2014 02:25 PM

Quote:

Originally Posted by genss (Post 5279571)
according to the table
-- is done first
- second
= third

No, that is the precedence according to which the operators bind. It says absolutely nothing about the sequence in which the operations are performed beyond the obvious requirement that an operation cannot be performed before the values of its operands have been determined. Even for a simple
Code:

a = b++;

it is entirely possible that the first action taken is to increment variable b as long as the original value is remembered for use in the expression. (No, I don't know of a machine architecture for which that would be reasonable, but there is nothing in the language that would preclude such a sequence.)

Further down in that page is a link to an "Order of evaluation" page, with some notes on undefined behavior that are relevant.

johnsfine 12-04-2014 03:00 PM

Quote:

Originally Posted by rknichols (Post 5279579)
No, that is the precedence according to which the operators bind. It says absolutely nothing about the sequence in which the operations are performed beyond the obvious requirement that an operation cannot be performed

Since I was commenting (at that moment) on genss's incorrect post, rather than the original question, and I have been programming for so long, I did not even think of the potential for someone less experienced to interpret an operator precedence table other than as you just said.

But now that you said it so clearly for people who already knew it, we are stuck with the problem that people who didn't already know the above also don't know what "bind" means in the above context.

I don't know how I could both accurately and clearly describe what "bind" means. But the effect is very similar to wrapping () around the operator and its nearest possible operands.

so p = p - p--;
when we "bind" the -- first we have p = p - (p--);
and when we bind the - second we have p = (p - (p--));

In the original *p-- when we bind "right to left" we get *(p--) and then (*(p--))

That is not exactly what "bind" means and you can find cases in C where the extra () implied by a method that crude would mean something else. But to a first level of understanding that is the best definition of "bind" I have.

And as you said more clearly, once you've done all that binding, you still haven't determined much about the execution sequence.

We know from the binding that the -- applies to p and not to *p, but we don't know whether the memory read operation specified by the * happens before or after the subtract or memory write operations specified by the --

Even in *--p we can know that computing the decremented value happens before the memory read specified by the *, but we still don't know that storing that value back into the variable p happens before that read, and we don't know that the decremented value is computed only once. It might be before and after the *.

Code:

char* p;
p = (char*)&p + 1;
*--p = 'B';

The compiled code might well optimize the value of --p by knowing it is &p, but put in a later decrement operation for the side effect of --p. So we replace the first byte of p with 'B' and then decrement p. On a little-endian (LSB-first) machine the net operation replaces the low byte of p with 'A', since 'B' minus one is 'A'. (But of course that is not the most likely behavior.)

This is the kind of "undefined" operation where a theoretical standard compliant compiler could replace the whole line with code to reformat your hard drive, and even a real compiler may generate code well outside the range of possibilities expected by a naive programmer.

genss 12-04-2014 03:09 PM

y my bad, this increment+assign is new to me
i even put it into 2 groups before saying that

why even allow increment before equal sign if the variable is used after the equal sign ?
p++;
*p=*(p-DELAY);
would not be prone to interpretation
maybe a cc warning would be nice ?

little_wolf_e 12-04-2014 03:11 PM

Hi,

Thanks for all the replies. I am happy just to have traced the problem. Not having studied compilers extensively I must say all this talk of sequences has my head spinning.

Quote:

Originally Posted by johnsfine (Post 5279546)
Look at any C or C++ reference, such as

That expectation is your mistake. As NevemTeve said, the sequence is not defined.

In the "Order of evaluation" reference given by rknichols http://en.cppreference.com/w/...age/eval_order rule number 6 states:

Quote:

6) The value computation of the postincrement and postdecrement operators is sequenced before its side-effect. (since C11)

By my limited understanding, does this not correspond with my expectation in my original post?

Once again I appreciate your discussing this with me.

johnsfine 12-04-2014 03:26 PM

Quote:

The value computation of the postincrement and postdecrement operators is sequenced before its side-effect. (since C11)

I'm having trouble wrapping my head around any necessity of even saying the above. I guess the standards authors saw some ambiguity that I don't see.

I think you may be missing the narrowness of the meanings of "value" and "side-effect" in that quote, in thinking that says something significant about sequencing, rather than simply being the meaning that postincrement or postdecrement always had.

Quote:

Originally Posted by little_wolf_e (Post 5279605)
By my limited understanding, does this not correspond with my expectation in my original post?

No, it does not.

Quote:

Originally Posted by little_wolf_e (Post 5279510)
I expected the right-hand side expression to be evaluated first and assigned to the location the left-hand pointer refers to, and only then the pointer to be decremented.

Code:

*p-- = *(p - DELAY);

That rule kind-of says the value of the p (in the p--) on the left of your statement is computed before it is changed by the --.

That rule says nothing about whether another instance of p on the right hand side is computed before or after that --.

SoftSprocket 12-04-2014 03:30 PM

Quote:

Originally Posted by little_wolf_e (Post 5279605)
Hi,

Thanks for all the replies. I am happy just to have traced the problem. Not having studied compilers extensively I must say all this talk of sequences has my head spinning.



In the "Order of evaluation" reference given by rknichols http://en.cppreference.com/w/...age/eval_order rule number 6 states:



By my limited understanding, does this not correspond with my expectation in my original post?

Once again I appreciate your discussing this with me.

Sequence points aren't a compiler issue - it's a C language issue.

Code:

*p-- = *(p - DELAY);

By my reading of this, the only sequence point is the ';' and the DELAY will be subtracted from the address of p, the value there will be dereferenced and assigned as the value of p and the pointer will then be decremented. Since p itself is not being altered in the assignment statement this is well defined - up until the point you begin assigning a value to the locations that precede p. At that point all bets are off - and this is what you're doing.

No points for style either.

genss 12-04-2014 03:45 PM

Quote:

Originally Posted by genss (Post 5279603)
maybe a cc warning would be nice ?

should i file a gcc bugreport for a warning ?
"request for "undefined evaluation order" warning" ?

anyway,
thx to johnsfine and rknichols

SoftSprocket 12-04-2014 03:49 PM

Quote:

Originally Posted by SoftSprocket (Post 5279614)
Sequence points aren't a compiler issue - it's a C language issue.

Code:

*p-- = *(p - DELAY);

By my reading of this, the only sequence point is the ';' and the DELAY will be subtracted from the address of p, the value there will be dereferenced and assigned as the value of p and the pointer will then be decremented. Since p itself is not being altered in the assignment statement this is well defined - up until the point you begin assigning a value to the locations that precede p. At that point all bets are off - and this is what you're doing.

No points for style either.

Actually on more careful inspection the math is right.

The sequence point rule is: between consecutive "sequence points" an object's value can be modified only once by an expression, and its prior value may be read only to determine the value to be stored.

It's the unsequenced access here that is the issue. The value of p is potentially changed by p-- then accessed before hitting the sequence point (the end of the expression).

Apologies for muddying the waters. *

*This is why I never write code remotely like that. Too easy to get something wrong.

---------- Post added 12-04-14 at 01:50 PM ----------

Quote:

Originally Posted by genss (Post 5279629)
should i file a gcc bugreport for a warning ?
"request for "undefined evaluation order" warning" ?

anyway,
thx to johnsfine and rknichols

Did you use -Wall as a switch? You will get a warning.

turtleli 12-04-2014 03:51 PM

Quote:

Originally Posted by genss (Post 5279629)
should i file a gcc bugreport for a warning ?
"request for "undefined evaluation order" warning" ?

Both gcc (-Wall) and clang (-Weverything) would have warned that the operations are unsequenced.

EDIT: Got beat to it.

little_wolf_e 12-04-2014 03:59 PM

Quote:

Originally Posted by SoftSprocket (Post 5279632)
Actually on more careful inspection the math is right.


Did you use -Wall as a switch? You will get a warning.

Thanks for the timely reminder to use the -Wall flag, and now I will know what the warning means.


All times are GMT -5.