Why are arrays in C pointers?

MTK358 · 03-27-2010, 08:46 AM

I wonder why arrays in the C programming language are pointers to the first element of the array, not the first element of the array itself?

H_TeXMeX_H · 03-27-2010, 08:51 AM

Arrays in C are just consecutive allocated memory addresses, so all you need to access the array is a pointer to the first element, then you increment to get to the rest. And that's why it seg faults if you go over the number of allocated elements in the array (at least on Linux, on Window$ it probably crashes the system).

I'm not exactly sure what you are asking, but you must have a pointer (memory address) to the first element for the above stated reason, otherwise if it's just the value at a memory address that won't do you any good, because you can't access the rest of the array using just this.

MTK358 · 03-27-2010, 08:54 AM

Why is it a pointer to the first element?

Why is *(array+1) the 2nd element, not *((&array)+1)?

theNbomr · 03-27-2010, 12:59 PM

Addressing an array by its first (zeroth?) element is merely a convenient shorthand provided by the language. If you want to be completely pedantic, you could always specify an array as

Code:

  &(array[0])

but this seems less readable. It is a great construct of the language, but is a significant stumbling block for those learning the language, especially for someone unacquainted with lower level machine architecture and principles that are learned in the likes of assembly language programming.
---- rod.

nadroj · 03-27-2010, 01:06 PM

Not sure if this will help, but I know it was useful to me (which is why I bookmarked it a while ago): http://c-faq.com/aryptr/index.html.

MTK358 · 03-27-2010, 01:08 PM

You still don't understand my question.

Why is it like this:

Code:

Stack:

int a
int* array --\
char c       |
...          |
             |
Heap:        |
...          |
element 0 <--/
element 1
element 2
...

Not like this:

Code:

Stack:

int a
int array (element 0)
int (element 1 of array)
int (element 2 of array)
char c

And why does the array[n] operator do this:

Code:

*(array+n)

instead of this:

Code:

*((&array)+n)

nadroj · 03-27-2010, 01:11 PM

If you allocate an array dynamically, then it is all on the heap (this is the purpose of the heap). If you allocate the array statically, then it is all on the stack. This is true, of course, unless you find a reliable source saying otherwise. This is how I've always understood it.

nadroj · 03-27-2010, 01:16 PM

Quote:

Originally Posted by MTK358

And why does the array[n] operator do this:

Code:

*(array+n)

instead of this:

Code:

*((&array)+n)

Because "array" is already a pointer, you don't care where it's stored ("&array"), you only care what it points to. Say "array" points to address 5. The second element is then the value pointed to by address 5+2=7, or in other words "*(array+2)".

Note: Some things above are stated for simplicity, versus correctness.

MTK358 · 03-27-2010, 01:20 PM

Quote:

Originally Posted by nadroj

Because "array" is already a pointer, you don't care where it's stored ("&array"), you only care what it points to. Say "array" points to address 5. The second element is then the value pointed to by address 5+2=7, or in other words "*(array+2)".

Note: Some things above are stated for simplicity, versus correctness.

What if it wasn't a pointer? What if arrays in C were plain variables?

I am not talking about C programming techniques, but why is the C language designed this way in the first place?

smeezekitty · 03-27-2010, 01:31 PM

Quote:

Window$ it probably crashes the system

No, it does not.
Instead it spits out an error that does not mention a SEGFAULT: xxx.exe has stopped working.

nadroj · 03-27-2010, 01:31 PM

I don't know the specification details or the reasoning for the design decisions made in the language. I imagine they aren't "plain variables" because then you don't know where the next element is. Array elements are, of course, sequential in memory. This makes working with the elements ("array arithmetic") very easy. If arrays were "plain variables", then basically for an array of size n, you have n different variables, say

Code:

int i0;
int i1;
// ...

If you say "well, no, it would just be one variable", then you're basically saying/understanding why it should be the way it is now, with one variable (i.e., "int array[n];"). It is a single variable, which is just the first element in the array. The added benefit is that you can determine the addresses of the other elements in the array with simple math.

I can't really explain well what I'm trying to say here. What I'm mainly trying to say is that it's like this because it actually simplifies things. I imagine it might also make some underlying memory management stuff relatively simpler. For example, when you request an array of 5 ints, you're requesting, say, 5x4 = 20 bytes. The OS then just has to look for a space of 20 contiguous bytes of free memory, which is basically a constant time calculation (O(1)). This is opposed to a linear time calculation of O(n). Of course there are other good reasons, I imagine.

MTK358 · 03-27-2010, 01:36 PM

I think I know why an array is a pointer now.

It's probably because not all CPU architectures pass function parameters sequentially on the stack, eliminating the ability of passing an array to a function.

But OTOH, why not manually make a pointer of the array and pass it to the function?

(Unless some CPU architectures don't even store stack variables sequentially).

But again, maybe C's design makes it easier to implement dynamic arrays?

nadroj · 03-27-2010, 01:51 PM

I don't understand what the CPU architecture has to do with it. Also, I think all function parameters are always "passed on the stack". When a function is called, the address of the instruction after the call is pushed on the stack. Next, all the parameters to the function are pushed on the stack. Then control enters the function, and the parameters are popped off the stack, and the top of the stack is now the address of the instruction to execute after this function is done. Next the function executes and returns (control). The OS knows where to return control to, because that address is at the top of the stack.

"Entire" arrays are never passed to functions. As stated many times here, an "array" variable is just a variable storing the address of the first element in the array. This is the variable that's passed to any function, not the "entire array" itself, as that is horribly inefficient.

The link I posted above wasn't for show. You certainly could not have read it all between the time I posted it and the time you said I don't understand what you're asking. I posted it because it is very informative, and if you go through it all, you will probably get a better understanding.

MTK358 · 03-27-2010, 01:55 PM

I thought that x86_64 and ARM among others pass some parameters in registers and some on the stack based on certain rules.

nadroj · 03-27-2010, 02:07 PM

I don't know anything about specific architectures. If you are doing ASM programming, I know that system calls require you to manually put variables in certain registers before you call the function. In user-defined functions I think you push them on the stack. But I don't have enough experience in that to give any facts.

But I still don't think that the decision is based solely on CPU architectures, but that it was done for simplicity and ease of use/maintenance. I think there are two major paths to go from here, to find what you're looking to understand. There's the low-level answer/reasons, involving ASM and hardware, and the higher-level answer/reasons, involving languages like C.