LinuxQuestions.org
Old 05-13-2009, 01:49 AM   #1
Alien_Hominid
Senior Member
 
Registered: Oct 2005
Location: Lithuania
Distribution: Hybrid
Posts: 2,247

Rep: Reputation: 53
C string as an array of chars and as a pointer to char


Please look at the comments
Code:
/*
 * TEST CASE TO CHECK DIFFERENCES BETWEEN STRING AS
 * AN ARRAY OF CHARS AND AS A POINTER TO CHAR
 */

#include  <stdio.h>
//#include  <string.h>

int main(void)
{
	char a[] = "foo";
	char *b  = "bar";
	char * const c = "changeme";

	printf("a - %s %c %p\n", a, *a, a);
	printf("b - %s %c %p\n", b, *b, b);
	//printf("c - %s %c %p\n", c, *c, c);

	//c = "ok";   //compiler error
	//*c = 's';   //compiles, but segfaults when executing (why?)
	//c[1] = 't'; //compiles, but segfaults when executing (why?)
	
	b = "qwe"; //b lost previous address

	/*
	 * THESE DO NOT WORK
	 */
	//b[] = "zxc"; //compiler error
	//a = "dfg";   //compiler error
	//a[] = "rty"; //compiler error

	*a = "cvb"; //have no idea, what a hell it does (after xxd it seems it's writing MSB or LSB of &"cvb")

	printf("a - %s %c %p\n", a, *a, a);
	printf("b - %s %c %p\n", b, *b, b);
	
	/*
	 * THESE DO NOT WORK
	 */
	//b[0] = 't';  //compiles, but segfaults when executing (why?)
	//*b = 'b';    //compiles, but segfaults when executing (why?)

	b = a;

	printf("b - %s %c %p\n", b, *b, b);

	b[2] = 't';
	*(b+1) = 'b';
	*(++b) = 'n'; 
	--b;

	printf("b - %s %c %p\n", b, *b, b);

	a[0] = 'z';
	*(a+1) = 'e';
	//*(++a) = 'r'; //compiler error (expected)

	printf("b - %s %c %p\n", a, *a, a);

	return 0;
}
Why do I get such a strange output?
Code:
a - foo f 0xbff204b8
b - bar b 0x80485e0
a - 
oo 
 0xbff204b8
b - qwe q 0x8048609
b - 
oo 
 0xbff204b8
b - 
nt 
 0xbff204b8
b - zet z 0xbff204b8
Tested using gcc (GCC) 4.3.3.
Please elaborate. I would also like links that explain these discrepancies in depth.

Last edited by Alien_Hominid; 05-13-2009 at 01:51 AM.
 
Old 05-13-2009, 05:56 AM   #2
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 43
When you create a string using a literal and assign it to a char *, the actual data typically goes into a read-only section of the binary (often .rodata) rather than the writable data segment, so modifying it is erroneous. However, if you declare it as a char array, it's more like saying:
Code:
char a[] = {'f', 'o', 'o', '\0'};
Where it's perfectly valid to change the array members. Observe this example:
Code:
#include <stdio.h>

int main(int argc, char **argv) {
        char *s = "dragonforce";
        printf("Address(s)   = 0x%08X\n", &s);  /* works on this 32-bit box; the portable form is %p with (void *)&s */
        printf("Value(s)     = %s\n", s);
        s[7] = 'a';
        printf("New Value(s) = %s\n", s);
        return 0;
}
And here's a debugging session:
Code:
(gdb) break 7
Breakpoint 1 at 0x1c000722: file test.c, line 7.
(gdb) run
Starting program: /home/taylor/test 
Address(s)   = 0xCFBF015C
Value(s)     = dragonforce

Breakpoint 1, main (argc=1, argv=0xcfbf01dc) at test.c:7
7               s[7] = 'a';
(gdb) print &s[7]
$1 = 0x3c000008 "orce"
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x1c000728 in main (argc=1, argv=0xcfbf01dc) at test.c:7
7               s[7] = 'a';
(gdb) The program is running.  Exit anyway? (y or n) y
Notice that the string data (around 0x3c000008) is located far away from the address of s (0xCFBF015C): s itself is a local variable on the stack, while the literal it points to sits in read-only memory. This covers both the block where you assign to various parts of c and the block where you assign to various parts of b. Also, check this document out: http://www.lysator.liu.se/c/c-faq/c-2.html
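To make the array case concrete, here is a minimal sketch (modified_copy is a made-up helper name, not something from the original test program). It assumes nothing beyond standard C: the array gets its own writable copy of the literal, while the pointer is declared const so that nobody writes through it by accident.

```c
#include <string.h>

/* Hypothetical helper: fills out[4] with a modified copy of "foo". */
void modified_copy(char out[4])
{
    char a[] = "foo";          /* 4 bytes: 'f', 'o', 'o', '\0', all writable */
    const char *b = "bar";     /* points at the literal itself; read-only */

    a[0] = b[0];               /* reading through b and writing to a are both fine */
    memcpy(out, a, sizeof a);  /* sizeof a is 4, terminator included */
}
```

Calling modified_copy(buf) with a char buf[4] should leave "boo" in buf, and no literal is ever written to.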

In the middle block:
Code:
b[] = "zxc";  /* invalid syntax */
a = "dfg";    /* error: an array is not assignable */
a[] = "rty";  /* invalid syntax */

Last edited by taylor_venable; 05-13-2009 at 05:58 AM. Reason: missing code block
 
Old 05-13-2009, 07:27 AM   #3
Alien_Hominid
Senior Member
 
Registered: Oct 2005
Location: Lithuania
Distribution: Hybrid
Posts: 2,247

Original Poster
Rep: Reputation: 53
Ok, great explanation, thanks (also for the link).
I'm not so worried about the cases where the compiler produces errors (though these are interesting) as about those where the error goes unnoticed:
Code:
        *a = "cvb"; //have no idea, what a hell it does (after xxd it seems it's writing MSB or LSB of &"cvb")
	 /*
	 * THESE DO NOT WORK
	 */
	//b[0] = 't';  //compiles, but segfaults when executing (why?)
	//*b = 'b';    //compiles, but segfaults when executing (why?)

Last edited by Alien_Hominid; 05-13-2009 at 07:29 AM.
 
Old 05-13-2009, 07:52 AM   #4
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by Alien_Hominid
*a = "cvb"; //have no idea, what a hell it does (after xxd it seems it's writing MSB or LSB of &"cvb")
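I don't understand "after xxd". But otherwise, you are correct. That statement overwrites the first character of a (the 'f') with the least significant byte of the address of "cvb", which is why your output shows a control character in place of the 'f'. gcc accepts it, but warns that the assignment makes an integer from a pointer without a cast.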
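If it helps, the same store can be written with the conversion spelled out. This is only a sketch: both function names are made up for illustration, and it assumes a typical compiler where converting a pointer to an integer and narrowing it keeps the low-order byte (the C standard leaves the details implementation-defined).

```c
#include <stdint.h>

/* Hypothetical helper: the low-order byte of a pointer value. */
unsigned char low_byte_of_address(const char *p)
{
    return (unsigned char)((uintptr_t)p & 0xFFu);
}

/* Hypothetical helper: what `*a = "cvb";` effectively does on typical
   compilers, with the pointer-to-integer conversion made explicit. */
void store_low_byte(char *a, const char *p)
{
    *a = (char)low_byte_of_address(p);
}
```

After store_low_byte(a, "cvb"), only a[0] changes; the rest of the array still reads "oo", which matches the output in the first post.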

Quote:
//b[0] = 't'; //compiles, but segfaults when executing (why?)
//*b = 'b'; //compiles, but segfaults when executing (why?)
It is an original design flaw in the C language that you can use a char* to point to quoted text, rather than needing a char const*.

The contents of quoted text must not be modified. There may (or may not) be run time enforcement (the segfault) for the rule that quoted text must not be modified. But either way it is a bug to modify quoted text.

Code:
	char a[] = "foo";
That allocates a char[4] buffer on the stack and copies {'f', 'o', 'o', 0} into that buffer.

Code:
	char *b  = "bar";
Makes a pointer (which can be changed) to text, which cannot be changed, but the compiler is effectively told to ignore the fact that the text cannot be changed.

Code:
	char * const c = "changeme";
Makes a pointer which cannot be changed to text, which the compiler is told to pretend can be changed.

Code:
c = "ok";   //compiler error
Try to change a pointer which you declared cannot be changed.

Code:
*c = 's';   //compiles, but segfaults when executing (why?)
or
Code:
c[1] = 't'; //compiles, but segfaults when executing (why?)
Try to change text that you declared as being changeable but it isn't.

Code:
	b = "qwe"; //b lost previous address
Change a pointer. No problem.

Code:
b[] = "zxc"; //compiler error
C has support for that kind of copy only on the line defining a char array, not as a later executable action.

Code:
a = "dfg";   //compiler error
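a is an array, not a pointer. In most expressions its name evaluates to the address of its first element, which is a fixed value rather than a variable you can assign to. An address relates to a pointer the same way a number relates to an int variable. Consider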
Code:
int x=5;
int u=x;  // Can use x (an int variable) the way we might use a number
int v=7;  // Can use 7 (a number) as a number
x = 4;    // Can change an int variable to have a new value.
//7 = 4;  // Cannot change a number to have a new value (compiler error).
The above is obvious and doesn't confuse any beginners. But the corresponding similarity/difference between an address (such as a in your code) and a pointer (such as b) confuses most beginners.
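A small sketch of that difference, for anyone who wants to poke at it. The struct and function names are made up for illustration; the only assumption is a hosted C compiler.

```c
#include <stddef.h>

/* Hypothetical record of how an array and a pointer differ. */
struct array_vs_pointer {
    size_t array_size;     /* sizeof a: the whole array */
    size_t pointer_size;   /* sizeof b: just the pointer */
    int same_address;      /* does a evaluate to &a[0]? */
    char after_reassign;   /* what *b reads after b = a */
};

struct array_vs_pointer compare(void)
{
    char a[] = "foo";
    const char *b = "bar";
    struct array_vs_pointer r;

    r.array_size = sizeof a;      /* 4: three letters plus '\0' */
    r.pointer_size = sizeof b;    /* platform dependent, commonly 4 or 8 */
    r.same_address = ((void *)a == (void *)&a[0]);
    b = a;                        /* fine: b is a variable holding an address */
    /* a = b; would not compile: a is not such a variable */
    r.after_reassign = *b;        /* 'f' */
    return r;
}
```

The point of the sketch is that b is storage you can overwrite, while a is the storage of the characters themselves.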

Last edited by johnsfine; 05-13-2009 at 08:09 AM.
 
Old 05-13-2009, 10:58 AM   #5
Alien_Hominid
Senior Member
 
Registered: Oct 2005
Location: Lithuania
Distribution: Hybrid
Posts: 2,247

Original Poster
Rep: Reputation: 53
I xxd'ed output to check values of those bytes.
Quote:
Originally Posted by johnsfine
The contents of quoted text must not be modified. There may (or may not) be run time enforcement (the segfault) for the rule that quoted text must not be modified. But either way it is a bug to modify quoted text.
Shouldn't the compiler check for and reject all 3 previous cases, which compile but either segfault later or are, imho, useless (such as placing the LSB of an address into memory pointed to by an array name)?

A pointer holds an address in the same way that an array's name points to its location. The only difference, it seems, is that they point to different memory locations, so one can be changed and the other can't. Consequently, the question arises whether this behaviour is an inherent C problem (not defined in the C standard) or some sort of problem in the compiler, which allows things that shouldn't be allowed.

EDIT: Had removed false assumptions before anyone responded.

Last edited by Alien_Hominid; 05-13-2009 at 12:12 PM.
 
Old 05-13-2009, 11:59 AM   #6
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
First, a caveat: When I first started programming (in the middle of the last century) the only programming language available was assembly. (Well, I did some "programming" by moving wires on "programming boards," but not very often.) So I may be prejudiced by my experience during my formative years.

With that caveat, I think a lot of the "pointer / value" confusion some people seem to have might be reduced if they took the time to learn at least the basics of assembly language.

Anyhow, it should be easy to remember that a "pointer," p, refers to a specific location in your computer's RAM, and the "value," *p, refers to whatever is stored in RAM at that location. (And, of course, the address-of expression, &p, is the address of the RAM where p itself is stored.)

Anyhow, that's my two cents for the above discussion.
 
Old 05-13-2009, 12:04 PM   #7
Alien_Hominid
Senior Member
 
Registered: Oct 2005
Location: Lithuania
Distribution: Hybrid
Posts: 2,247

Original Poster
Rep: Reputation: 53
I have some basics in i386 assembly (therefore I would like to be able to modify all memory made available to the program). The thing that confused me is that one is allowed to modify memory values from the compiler's standpoint (why?) but not in reality (segfault). Anyway, thanks costs nothing.
 
Old 05-13-2009, 12:27 PM   #8
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by Alien_Hominid
Shouldn't compiler check and produce error for all 3 previous cases which compiles but either later segfaults or are, imho, useless
Once a flaw in language design has been in place for many years, it is very hard for the compiler to usefully improve the situation. Consider the following code (from your own example):

Code:
	char a[] = "foo";
	char *b  = "bar";
. . .
	b = a;
. . .
	b[2] = 't';
b starts out pointing to text that must not be changed, with the declaration telling the compiler that b points to text that can be changed.

Later b is changed to point to text that can be changed. Note that it isn't possible to change the declaration of b there, only where it points.

Finally b is used to modify part of the text it points to.

All together, those steps are correct and sequences like that happen in many correct programs. A single pointer variable might be:
1) Used by sections of the code that don't modify the contents
2) Set in some places to text that can't be modified
3) In other places set to text that can be modified and then actually modified.

Sections 2 and 3 must obviously be disjoint enough that they don't trip over each other, but each might be so well connected to 1 that there is no clean place to make a different declaration for the modifiable text vs. non modifiable. Most of us would consider that combination at least unfortunate if not absolutely bad style. But it still happens in enough old C code to be a problem for a compiler rejecting the assignment of a quoted string to a char*.
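To show the legitimate half of that sequence, here is a sketch that replays the writes from the first post on a fresh copy of "foo" (so without the earlier *a = "cvb" store). replay_writes is a made-up helper name; the comments track the array contents after each step.

```c
#include <string.h>

/* Hypothetical helper: replays the writes from the first post and
   stores the resulting string in out[4]. */
void replay_writes(char out[4])
{
    char a[] = "foo";
    char *b = a;        /* b now aliases the writable array */

    b[2] = 't';         /* "fot" */
    *(b + 1) = 'b';     /* "fbt" */
    *(++b) = 'n';       /* b moves to a[1], then writes: "fnt" */
    --b;                /* b points at a[0] again */

    a[0] = 'z';         /* "znt" */
    *(a + 1) = 'e';     /* "zet" */

    memcpy(out, b, sizeof a);
}
```

The result is "zet", the same as the last line of the original output, because every write lands in the array and none touches a literal.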
 
Old 05-13-2009, 03:10 PM   #9
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Quote:
Originally Posted by Alien_Hominid
I have some basics in i386 assembly (therefore I would like to be able to modify all memory made available to the program). The thing that confused me is that one is allowed to modify memory values from the compiler's standpoint (why?) but not in reality (segfault). Anyway, thanks costs nothing.
Ah, well, that gets into the issue of "protected" and "unprotected" memory, and (as johnsfine mentioned) allocation of memory in the stack.

You can, in fact, modify the contents of all unprotected memory allocated by your program. But static strings are allocated in protected (and shared) memory, and, therefore, can't be modified by your program. (The point is that many programs declare the same strings and constants in different places, and the actual physical size of the program can be reduced by reusing those definitions. But this optimisation fails if the constant or string can be changed.) When I programmed in "B" (the precursor to "C"), I needed to be very cautious making assignments since B had no data types, and all RAM was modifiable by any program. (For amusement we liked to write self-modifying programs, where execution of the code resulted in a different program being run. That sort of thing is fine for a single-user system, but not so "cool" when someone else is trying to use the hardware to get some "real work" done.)

So the current use of "segments," some of which are static and some modifiable, is a vast improvement over the "have at it" days of yore.

Bottom line: Some program data (often, most data) is allocated to static (read-only) segments, and an attempt to modify the contents of such a segment causes the "seg fault."

So, to reiterate, if you want be able to change values in RAM, those values must be declared in such a way that they are located in an unprotected memory segment. One way to do that (in C) is to explicitly reserve dynamic (i.e., modifiable) RAM for the value with the malloc - or similar - function. For numeric values, a simple <type> name; suffices, but arrays - especially dynamically sized arrays - need more work.
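As a rough sketch of the malloc route (writable_copy is a made-up helper name, not a library function), assuming only the standard library:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of text that the caller may
   modify and must free(); returns NULL if allocation fails. */
char *writable_copy(const char *text)
{
    size_t n = strlen(text) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}
```

With char *s = writable_copy("dragonforce"); the assignment s[7] = 'a'; is then legal and gives "dragonfarce", since s points at heap memory rather than at the literal. Remember to free(s) afterwards.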
 
Old 05-13-2009, 04:01 PM   #10
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 43
You can't check it in the compiler because the compiler simply doesn't have all the information required. Note the example of johnsfine above. How could the compiler know if the string you're assigning into is declared extern? It can't, thus it doesn't check. Separate compilation FTW.
 
Old 05-13-2009, 04:49 PM   #11
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
Quote:
Originally Posted by PTrenholme
static strings are allocated in protected (and shared) memory,
Quoted strings usually are allocated in protected shareable memory.

Memory protection is managed per page, typically on 4K byte boundaries on x86. I don't think the linker is required to waste memory up to the next 4K byte boundary when the protection requirements change for the next link time allocation. So each 4K byte block must have the least protection of anything allocated in that block. So I think a quoted string might be allocated in the same 4K byte block with a compile-time initialized writable global variable, in which case writing to that text would not seg fault.

Obviously you shouldn't overwrite quoted text and you shouldn't be surprised when doing so seg faults. But unless you have taken more specific control of these link time issues, you shouldn't rely on that seg fault.
 
Old 05-14-2009, 11:59 AM   #12
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
Originally Posted by PTrenholme
With that caveat, I think a lot of the "pointer / value" confusion some people seem to have might be reduced if they took the time to learn at least the basics of assembly language.
Couldn't agree more...

With respect to protected vs. unprotected memory and storage of literal strings there, one should consider also that C can be used to produce code which runs from various forms of Read-Only-Memory. There, the literal strings are electronically immutable, so trying to write to memory that is mapped as ROM/PROM/EPROM/EEPROM may fail (or not, perhaps) in various ways. Thinking about the situation in these terms can help clarify the reasons for the behavior of the compiler and the runtime code.
--- rod.
 
Old 05-14-2009, 05:08 PM   #13
Alien_Hominid
Senior Member
 
Registered: Oct 2005
Location: Lithuania
Distribution: Hybrid
Posts: 2,247

Original Poster
Rep: Reputation: 53
Then there should be some switch in gcc to tell where to place literal strings.
http://www.lysator.liu.se/c/c-faq/c-17.html#17-20
 
Old 05-15-2009, 05:01 PM   #14
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 78
Quote:
Originally Posted by Alien_Hominid
Then there should be some switch in gcc to tell where to place literal strings.
http://www.lysator.liu.se/c/c-faq/c-17.html#17-20
GCC had (until the 4.x branch) a flag -fwritable-strings, which would allow backwards compatibility with K&R C (which didn’t specifically forbid writing to string literals). There is currently the warning flag -Wwrite-strings, which will emit a warning for such uses.

Additionally, there are some architectures which are targets for gcc which don’t have a read-only data segment, and on which funny things can happen.
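For what it's worth, a sketch of code that stays quiet under -Wwrite-strings: pointers to literals are declared const, and anything that needs editing is copied first. The names label and edited_label are made up for illustration.

```c
#include <string.h>

/* With -Wwrite-strings, gcc warns about:  char *p = "bar";
   Declaring the pointer const makes the intent explicit. */
static const char *const label = "bar";

/* Hypothetical helper: copies the literal into out[4], then edits the copy. */
void edited_label(char out[4])
{
    strcpy(out, label);   /* "bar" plus '\0' fits exactly in 4 bytes */
    out[0] = 'c';         /* modifies the copy, never the literal */
    /* label[0] = 'c';    would be rejected at compile time */
}
```

After edited_label(buf), buf holds "car" and the literal is untouched, whatever kind of memory it happens to live in.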
 
Old 05-17-2009, 12:35 PM   #15
Alien_Hominid
Senior Member
 
Registered: Oct 2005
Location: Lithuania
Distribution: Hybrid
Posts: 2,247

Original Poster
Rep: Reputation: 53
Thanks too.
 
  

