Order of evaluation for your C compiler?

hydraMax · 12-29-2011, 05:28 AM

Reading through the c99 standard, I happened to notice in the definitions:

Quote:

unspecified behavior
use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.

So, I was curious and decided to give it a try on gcc-4.5.3:

Code:

$ cat eval-order.c 
#include <stdio.h>

void testfunc(int fst, int snd) {

}

int main() {
  int a = 2;
  testfunc(a = a*3, a = a-2);
  printf("%d\n", a);
}
$ gcc eval-order.c -o eval-order
$ ./eval-order 
0

And then reversed them, to check for consistency:

Code:

$ cat eval-order.c 
#include <stdio.h>

void testfunc(int fst, int snd) {

}

int main() {
  int a = 2;
  testfunc(a = a-2, a = a*3);
  printf("%d\n", a);
}
$ gcc eval-order.c -o eval-order
$ ./eval-order 
4

So, last is evaluated first, in this instance at least.

Those of you who use a different brand of compiler, would you be willing to give the same test a try and report back the results?

Doc CPU · 12-29-2011, 05:46 AM

Hi there,

Quote:

unspecified behavior
use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.

another example of unspecified behavior is having the same variable two or more times in one expression while it is being modified, like this:

Code:

int a = 4;
int b = a++ + 3*a;

What's the value of b after this instruction? Could be 16, if the ++ operation is done after evaluation of the complete expression; could be 19 if the ++ operation is done immediately after reading the value of a. Again - not specified, both are allowed.

Quote:

Originally Posted by hydraMax

So, last is evaluated first, in this instance at least.

I would've expected that, because function arguments are placed on the stack from right to left. It makes sense to evaluate them in this order, but a compiler can also use a different approach.

Quote:

Originally Posted by hydraMax

Those of you who use a different brand of compiler, would you be willing to give the same test a try and report back the results?

Good old Borland C++ 5.0 from the mid-nineties does it the same way. But still, you can't rely on that. Better avoid these ambiguities.

[X] Doc CPU

hydraMax · 12-29-2011, 06:52 AM

Wow... this is a surprising one:

Quote:

A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.

1 byte = ? bits

Well, wadda ya know... the number of bits in a byte is actually defined in limits.h:

Code:

/* Number of bits in a `char'.  */
#  define CHAR_BIT      8

Was there a 9 bit architecture at one time, or what...?

NevemTeve · 12-29-2011, 06:57 AM

No, a byte always meant eight bits, but there are non-byte-based platforms with cell sizes of 9, 12, 16, 32, 60, etc. bits; some people suggest these cells should be called 'platform-dependent bytes'

Doc CPU · 12-29-2011, 07:06 AM

Hi there,

Quote:

Originally Posted by NevemTeve

No, a byte always meant eight bits, but there are non-byte-based platforms with cell sizes of 9, 12, 16, 32, 60, etc. bits; some people suggest these cells should be called 'platform-dependent bytes'

correct, and this platform-specific size is very commonly called a "word", while a byte always has 8 bits.

There is the old DEC PDP-11 that used 14 bit words, and some PIC microcontrollers also have a 14 bit word size. Other than that, most platforms use 16bit or 32bit words.
The Windows API, on the other hand, defines WORD as a 16 bit unsigned quantity, even though the architecture is 32bit (or even 64bit) wide.

[X] Doc CPU

ArthurSittler · 12-31-2011, 09:19 PM

Actually bytes are NOT necessarily 8 bits and a memory address is NOT necessarily the same as a byte pointer. That is why the standard states that byte size is implementation-dependent.

While I was playing schoolboy I spent some of my time programming on a DEC KL10 running DEC System20. That machine had 36-bit words. In general, any address, including an instruction address, was 18 bits. The address referred to a 36-bit word. Very few, if any, KL10s had more memory than the original memory addressing limit of 256k words (2^18 words). 2^18 36-bit words was a lot of ferrite magnetic memory cores.

36-bit integers have a larger range than 32-bit integers, of course. The 36-bit floating point format also differed from what you may think of as float.

Using the KL10 was quite educational in that a byte pointer required an address of the word and the offset to the first bit in the byte. The definition of byte was variable, but a byte was usually 7 bits. A 7-bit byte can represent any ASCII character because ASCII is a 7-bit code. With 7-bit bytes, 5 bytes were packed into each word, with one bit left over. Teletypes originally used 5-bit codes, while some later models used 6-bit codes. One could pack seven five-bit bytes, six six-bit bytes, or 9 BCD or hexadecimal digits (also referred to as 4-bit bytes) into a word. A pointer to byte used 18 bits of one half of the word for memory address and nine bits to point to the first bit of the byte. I do not recall whether the byte pointer used the other nine bits. I think it did, either to point to the next free bit or to specify the length of the byte.

Pointers were all compatible in the sense that any pointer is no larger than a pointer to int. That may be why pointer to int was the default anonymous pointer, and int was the default return type. Any stack pointer or saved stack pointer value could be a 36-bit number. The same applies to frame pointers, if they were used.

That is why the various constants about byte size, maximum and minimum ints, and so forth are available in limits.h. They have probably stabilized to the point that you will not need to worry too much about them. However, they are still defined to be implementation-dependent values, and in some cases they may differ from the values your own compiler reports, as in the inquiry above. Just because your local value from limits.h tells you that a byte is 8 bits does NOT mean that a byte will always be 8 bits. It means that you may inquire about its value and that the local value is available in limits.h. It is actually a good idea to test that supposition in your code at compile time or perhaps even at run time if it really matters. You are probably using a computer that actually organizes memory as 32-bit or 64-bit words. It is possible that the use of 8-bit bytes will be supplanted by 16-bit bytes soon because memory is becoming much cheaper and the 8-bit byte is not adequate to represent a usably complete character set in many languages.

I apologize for my penchant for long-winded replies.

NevemTeve · 01-01-2012, 01:16 AM

Let me repeat: I see no reason to call a non-eight-bit memory cell a 'byte'. (Just as you wouldn't say that people in the UK use shorter meters to measure lengths: they use feet and yards.)

hydraMax · 01-01-2012, 04:28 AM

Quote:

Originally Posted by ArthurSittler

I apologize for my penchant for long-winded replies.

No, I thought it was interesting and informative. The standard quote clearly indicates that the number of bits in a byte is implementation-dependent. Even dictionary.com is careful not to portray a byte as necessarily consisting of eight bits:

Quote:

byte [bahyt] noun
1. adjacent bits, usually eight, processed by a computer as a unit.
2. the combination of bits used to represent a particular letter, number, or special character.

Though, I don't understand why we would ever need to switch to 16-bit bytes to accommodate an encoding: I was under the impression that was what wide characters were invented for. (32 bits per character, I believe.)

hydraMax · 01-01-2012, 11:52 PM

If I'm not boring people too much, I thought this was an interesting little quirk of C: that block level scope doesn't actually begin until the declaration. This is a little odd, but not so much:

Code:

$ cat scope1.c 
#include <stdio.h>

int main () {
  int a = 2;
  {
    printf("%d\n", a);
    int a = 3;
    printf("%d\n", a);
  }
  printf("%d\n", a);
}
$ gcc -O2 -std=c99 scope1.c -o scope1
$ ./scope1 
2
3
2

More strange, though, is a construction like this:

Code:

$ cat scope2.c 
#include <stdio.h>

int main () {
  int a = 2;
  for(int i=0; i<2; i++) {
    printf("%d\n", a);
    int a = 3;
    printf("%d\n", a);
  }
  printf("%d\n", a);
}
$ gcc -O2 -std=c99 scope2.c -o scope2
$ ./scope2 
2
3
2
3
2

While beginning the scope at a declaration is perhaps more intuitive to some people, in my mind it would seem more natural to have the scope extend to the entire block containing the declaration (i.e., it doesn't matter where you declare your variables, as long as you declare them within the block).

Just interesting, I thought.

ArthurSittler · 01-02-2012, 03:38 AM

Regardless of whether we see any reason to call any size other than 8 bits a byte, other people in the past have seen reasons to call other sizes of storage bytes. For example, we waste 12.5% of transfer bandwidth and storage space by handling ASCII characters as 8-bit bytes. There are some applications which still do call other sizes of storage bytes. It depends on the application. I have seen bytes defined to be as small as 4 bits and as large as 12 bits. 8 bits has become very customary as a size for bytes due to the explosion of the computer industry initiated by 8-bit microprocessors, particularly the 8080.

My comment was an answer to the question about there being other byte sizes. This is a side issue and not the main gist of this thread.

Regarding the original question about evaluation order of function arguments, I see issues in the example code. I am pretty sure that the postincrement notation a++ in

Code:

int a = 4;
int b = a++ + 3*a;

explicitly specifies that the increment of a is done after evaluation of the entire statement. If so it is not a good test of the order of evaluation. There are reasons why a compiler may evaluate expressions which are function arguments left to right for simplicity. There are reasons involving code optimizations why the compiler might evaluate them right to left.

My other comment is that writing function parameters with interacting side effects makes it harder to read than explicitly performing the operations separately in multiple statements, using additional variables as necessary. When one writes computer code, one is not writing code for the compiler. One is writing for the next human who will be reading the code (which may well be the original author). If we were only writing for the computer, we could write machine code in hexadecimal. Simplicity and clarity for the human reader is the goal, not cleverness or some notion that you are optimizing the way the compiled code will execute. The goal of clarity for the reader should preclude writing code that creates any avoidable question, such as the order of evaluation of function arguments. Stating that the order of evaluation of arguments is unspecified emphasizes that point.

johnsfine · 01-02-2012, 07:13 AM

Quote:

Originally Posted by ArthurSittler

I am pretty sure that the postincrement notation a++ in

Code:

int a = 4;
int b = a++ + 3*a;

explicitly specifies that the increment of a is done after evaluation of the entire statement.

I'm pretty sure the C standard does not specify that. The compiler is free to compute 3*a before or after incrementing a.

I don't have a copy of the C standard and if I had, I'm not sure how to search for that kind of detail in the standard. So I'm not certain of the above, but pretty sure.

Doc CPU · 01-02-2012, 11:22 AM

Hi there,

Quote:

Originally Posted by ArthurSittler

Regarding the original question about evaluation order of function arguments, I see issues in the example code. I am pretty sure that the postincrement notation a++ in

Code:

int a = 4;
int b = a++ + 3*a;

explicitly specifies that the increment of a is done after evaluation of the entire statement.

this is not what I thought - I expected the increment to be done immediately after reading the variable value inside the parent expression. But we're both wrong. I remember that a few years ago, someone in a programming forum pointed me to a paragraph in the C99 standard which explicitly stated that in an expression that contains a compound assignment (*=, -=, but also the pre- and postincrement operators), a subsequent occurrence of the same variable in that expression produces an undefined result and should be avoided.
Pity I can't find that reference right now.

Quote:

Originally Posted by ArthurSittler

My other comment is that writing function parameters with interacting side effects makes it harder to read than explicitly performing the operations separately in multiple statements, using additional variables as necessary.

I disagree. I'm following the principle: One step in mind == one line of code.
That implies complex statements sometimes, but I find it harder to read when instructions that make up a quasi-atomic operation are torn apart. Extra variables usually obfuscate the connection. If they're just supposed to be a temporary result that is used again in the next statement (and nowhere else), I'd rather nest the two statements and eliminate the extra variable.

Quote:

Originally Posted by ArthurSittler

When one writes computer code, one is not writing code for the compiler. One is writing for the next human who will be reading the code (which may well be the original author).

When I write program code, it's me most of the time that has to read and extend that code later. But on those rare occasions that other people had to (or wanted to) deal with my code, I received positive feedback on my coding style in terms of readability. Generous comments may be a contribution to that.

Quote:

Originally Posted by ArthurSittler

If we were only writing for the computer, we could write machine code in hexadecimal. Simplicity and clarity for the human reader is the goal ...

If we were consistent about that, we'd be writing assembly language. There's nothing simpler and clearer - in my eyes.

Quote:

Originally Posted by ArthurSittler

... not cleverness or some notion that you are optimizing the way the compiled code will execute.

But in my experience, the more a piece of code (we're still talking about C code, aren't we?) is optimized, the easier it is to understand.

[X] Doc CPU

hydraMax · 01-02-2012, 04:27 PM

I noticed this in c99 (5.1.2.3):

Quote:

Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects, which are changes in the state of
the execution environment. Evaluation of an expression may produce side effects. At
certain specified points in the execution sequence called sequence points, all side effects
of previous evaluations shall be complete and no side effects of subsequent evaluations
shall have taken place. (A summary of the sequence points is given in annex C.)
<snip>
EXAMPLE 7 The grouping of an expression does not completely determine its evaluation. In the
following fragment

Code:

#include <stdio.h>
int sum;
char *p;
/* ... */
sum = sum * 10 - '0' + (*p++ = getchar());

the expression statement is grouped as if it were written as

Code:

sum = (((sum * 10) - '0') + ((*(p++)) = (getchar())));

but the actual increment of p can occur at any time between the previous sequence point and the next
sequence point (the ;), and the call to getchar can occur at any point prior to the need of its returned
value.

ArthurSittler · 01-03-2012, 01:47 AM

Apologies -- I stand corrected. The postincrement can be done any time after its value was used in the expression. In this case, any time after the old value of a was fetched.
This makes sense, because in some architectures it is possible to increment or decrement some registers without using the ALU. The CPU uses counters, not just storage latches, for the registers.