LinuxQuestions.org
Old 08-27-2009, 05:46 AM   #1
Completely Clueless
Member
 
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 899

Rep: Reputation: 70
fgetc() clarification


Hi all,

I've been revising my rusty old C knowledge from way back when and everything was going swimmingly until I came across this explanation for the way fgetc() works:

Quote:
The fgetc function returns an integer. What this actually means is that when it reads a normal character in the file, it will convert that character into a value suitable for storing in an unsigned char (basically, a number in the range 0 to 255).
This doesn't make much sense to me. I don't want the function to perform some translation from char to int; I just want it to extract whatever alphanumeric ASCII text character is in the source file and that's all. I don't want to have to later use a chart to find what numbers fgetc() has converted the source file text into! What's going on here?
 
Old 08-27-2009, 06:11 AM   #2
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
You don't have to use charts or whatever. The letter 'A' is internally stored as 0x41 (equals 65 decimal). So a simple cast of an integer to char (see the (char) casts in the code below) will do the trick.

Code:
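#include <stdio.h>

int main(void)
{
    int chi;    /* fgetc() returns an int */
    char chc;

    printf("enter char : ");
    chi = fgetc(stdin);

    /* the same value shown as decimal, hex and character */
    printf("chi: %04d | %04x | %c\n", chi, chi, (char)chi);

    chc = (char)chi;    /* cast the int down to a char */
    printf("chc: %04d | %04x | %c\n", chc, chc, chc);

    return 0;
}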
You can also directly cast
Code:
        chc=(char)fgetc(stdin);
and you probably don't even have to (not tested).

Last edited by Wim Sturkenboom; 08-27-2009 at 06:12 AM.
 
Old 08-27-2009, 06:37 AM   #3
Completely Clueless
Member
 
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 899

Original Poster
Rep: Reputation: 70
I admit I find it tough to think clearly when the temperature climbs towards 90 as it is yet again today, so you'll have to excuse my stupidity.
I don't understand why this function fetches a char, stores it as an int; returns an int that has to be cast back into a char?? It makes no sense. Why doesn't fgetc() just return the face-value char it finds in the file?
 
Old 08-27-2009, 06:41 AM   #4
Completely Clueless
Member
 
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 899

Original Poster
Rep: Reputation: 70
Or is it just for more efficient storage they do it this way? I can't think of any other possible reason.
 
Old 08-27-2009, 06:54 AM   #5
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
fgetc returns EOF (usually -1) at end of file or on error. If it returned a char, that value would be indistinguishable from the character with value 255 (decimal). Returning an int lets it return values outside 0-255.

PS: In which case casting without checking the return value, as in my last piece of code, is a bit dangerous.
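To illustrate the danger of casting before checking, here is a minimal sketch. It assumes EOF is -1 and that converting 255 to signed char yields -1, which holds on common platforms but is not guaranteed by the standard. The helper names (count_with_int, count_with_char, make_sample) are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in fp, keeping fgetc()'s result in an int. */
long count_with_int(FILE *fp)
{
    long n = 0;
    int c;                          /* wide enough for 0..255 and for EOF */

    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Same loop, but the result is squeezed into a signed char first (the bug). */
long count_with_char(FILE *fp)
{
    long n = 0;
    signed char c;                  /* assumption: byte 0xFF becomes -1 here */

    while ((c = (signed char)fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Writes the three bytes 'A', 0xFF, 'B' to a temporary file and rewinds it. */
FILE *make_sample(void)
{
    FILE *fp = tmpfile();

    if (fp != NULL) {
        fputc('A', fp);
        fputc(0xFF, fp);
        fputc('B', fp);
        rewind(fp);
    }
    return fp;
}
```

Under those assumptions, count_with_int should see all three bytes, while count_with_char stops at the 0xFF byte because it can no longer tell that byte from EOF. With a plain char that happens to be unsigned, the comparison with EOF would never succeed and the loop would not end, which is a different symptom of the same mistake.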

Last edited by Wim Sturkenboom; 08-27-2009 at 06:56 AM.
 
Old 08-27-2009, 07:15 AM   #6
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 60
Perhaps I am very confused, but my understanding was that deep down the type char in C is a numeric, integer type. To quote Beginning C by Ivor Horton:

Quote:
A variable of type char can store the code for a single character. Because it stores a character code, which is an integer, it's considered to be an integer type (page 59).
When you program, you don't need to do any conversion. What you see will depend on the format string you use with printf. Consider the following silly script:

Code:
#include <stdio.h>

int main (void) {
    int c;
    
    while ((c = getchar()) != EOF) {
        printf("You said [%c].\n", c);
        printf("But to me [%c] is always [%d].\n", c, c);
    }

    return 0;
}
Note that I use the exact same variable (c), which I declared as an int, both as a character and as a number. No casting or conversion is necessary. If I want the character, I use the format %c. If I want the number, I use %d. Here's a simple file with a few letters and the output of the script (compiled as silly).

Code:
telemachus practice $ cat letters 
a b j Q r

telemachus practice $ ./silly < letters
You said [a].
But to me [a] is always [97].
You said [ ].
But to me [ ] is always [32].
You said [b].
But to me [b] is always [98].
You said [ ].
But to me [ ] is always [32].
You said [j].
But to me [j] is always [106].
You said [ ].
But to me [ ] is always [32].
You said [Q].
But to me [Q] is always [81].
You said [ ].
But to me [ ] is always [32].
You said [r].
But to me [r] is always [114].
You said [
].
But to me [
] is always [10].
Notice that the script shows spaces and newlines in addition to letters. (They are characters too, even if not as obviously as 'a' or 'Q'.) The character code for ' ' (a space) is apparently 32 and a newline is 10. I believe that all these are ASCII standard, but I would need to check that to be sure. I checked on Wikipedia. These are standard.
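The same point as a small sketch, assuming an ASCII-compatible character set (true on essentially every current desktop system, but not required by the C standard). The function names code_of and next_char are hypothetical.

```c
/* Returns the numeric code of a character; no lookup or chart is involved. */
int code_of(char ch)
{
    return ch;              /* a char already is a small integer */
}

/* Returns the character whose code is one higher than ch. */
char next_char(char ch)
{
    return (char)(ch + 1);  /* arithmetic on characters is ordinary integer arithmetic */
}
```

On an ASCII system, code_of('a') should give 97 and next_char('a') should give 'b', matching the output above.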

Last edited by Telemachos; 08-27-2009 at 07:20 AM.
 
Old 08-27-2009, 09:34 AM   #7
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
In my opinion, being able to store an integer value in a variable does not make it of type int. And you need to take the type into account (and cast at occasion) if you want to assign the result of e.g. the multiplication of two chars to an int. C will complain if you don't.

A char is 8 bits and an int depends on the 'size' of the machine (16 bits, 32 bits, 64 bits).
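A note for readers on sizes and on multiplying chars, as a minimal sketch rather than a definitive statement about any particular compiler. As far as the C standard goes, sizeof(char) is 1 by definition, a char has CHAR_BIT bits (at least 8), and char operands are promoted to int before arithmetic, so a plain multiplication assigned to an int normally compiles without a cast. The function names are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Multiplies two chars; both operands are promoted to int before the multiply. */
int multiply_chars(char a, char b)
{
    return a * b;           /* no cast needed: the arithmetic is done in int */
}

/* Number of bits in a char on this platform (at least 8 by the standard). */
int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Size of an int measured in chars; this is the part that varies by platform. */
size_t int_size_in_chars(void)
{
    return sizeof(int);
}
```

For example, multiply_chars(100, 100) should return 10000 even though 10000 does not fit in a char, because the result was never a char in the first place.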
 
Old 08-27-2009, 09:56 AM   #8
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 60
Quote:
Originally Posted by Wim Sturkenboom View Post
In my opinion, being able to store an integer value in a variable does not make it of type int. And you need to take the type into account (and cast at occasion) if you want to assign the result of e.g. the multiplication of two chars to an int. C will complain if you don't.

A char is 8 bits and an int depends on the 'size' of the machine (16 bits, 32 bits, 64 bits).
You are probably right about the type more formally. Certainly a char will be one byte, as opposed to an int or other type. But characters are stored as small integers, so the automatic conversion is handy. I meant simply that for normal reading and printing of characters, you should not ever need to do any manual conversion (which the OP seemed worried about). (I can't think of a circumstance where I would need to multiply two chars. When would you want to do that?)

This is from The C Programming Language by Kernighan and Ritchie:
Quote:
What appears to be a character on the keyboard or screen is of course, like everything else, stored internally just as a bit pattern. The type char is specifically meant for storing such character data, but any integer type can be used. We used int for a subtle but important reason.

The problem is distinguishing the end of the input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file." We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char value. Therefore we use int. (page 16 - They are referring to code that I didn't quote when they say "We used..." and when they mention specific variables.)
If it's good enough for K&R, it's good enough for me. (This is the same explanation that you gave to the OP, I think, for why fgetc works as it does. I'm just adding that you can rely on automatic conversions via format strings (%c vs. %d) for most normal cases. You don't need to do any casting normally.)
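Applied to fgetc and a FILE pointer, the K&R idiom might look like the sketch below. Since EOF is returned both at end of file and on a read error, the sketch uses ferror() afterwards to tell the two apart. The helper name copy_stream and its return convention are assumptions made for this example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copies every byte from in to out.
 * Returns the number of bytes copied, or -1 if a read or write error occurred.
 */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int c;                              /* int, so EOF stays distinguishable */

    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF)       /* fputc also reports failure as EOF */
            return -1;
        copied++;
    }

    /* EOF covers both end of file and error; ferror() tells them apart. */
    if (ferror(in))
        return -1;
    return copied;
}
```

Note that c is passed straight to fputc without any cast, since fputc also takes an int.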

Last edited by Telemachos; 08-27-2009 at 10:00 AM.
 
Old 08-27-2009, 10:36 AM   #9
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
Your quote from K&R is indeed what I said / tried to say in post #5.

As a type char can be anything, I store small integers in there when it suits the application; and those I might have to multiply with a result that doesn't fit in a variable of type char. It's probably better to use another type or typedef but I usually don't care. Only thing I care about will be signed/unsigned.
 
Old 08-27-2009, 12:59 PM   #10
Completely Clueless
Member
 
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 899

Original Poster
Rep: Reputation: 70
Quote:
Originally Posted by Telemachos View Post
For most normal cases. You don't need to do any casting normally.)
Thanks, all.

Telemachos, do you mean to say that with *some* functions one needs to cast-type, yet with *others* it's not necessary?
That would make more sense to me than anything I've read elsewhere on the subject.
So to clarify, what you're saying is that various more advanced stock C library functions automatically handle the datatype for the user 'invisibly' without the user having to worry about it; whereas other, maybe older functions aren't so 'intelligent' and have to be manually 'told' how to deal with each datatype?
 
Old 08-27-2009, 03:06 PM   #11
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 60
Quote:
Originally Posted by Completely Clueless View Post
Telemachos, do you mean to say that with *some* functions one needs to cast-type, yet with *others* it's not necessary?
That would make more sense to me than anything I've read elsewhere on the subject.
So to clarify, what you're saying is that various more advanced stock C library functions automatically handle the datatype for the user 'invisibly' without the user having to worry about it; whereas other, maybe older functions aren't so 'intelligent' and have to be manually 'told' how to deal with each datatype?
Nope, I didn't really mean that. In my experience, you never need to use a cast. However, I am a novice in C myself, and Wim mentioned a few cases that I haven't come across where maybe you would want to cast (though I admit, I still don't know why I would ever want to multiply a character).

I didn't have in mind any contrast between advanced, intelligent functions versus older, less intelligent C functions or libraries. I'm not aware of any such distinction.

The documentation you read told you how fgetc works. You say you don't want it to get a number and then convert it into a character; you want it to get a character directly. My response is that computers don't know characters; they know numbers. That's the bad news, I guess, since you can't have what you want. However, the good news is that it's ok anyhow. You will not have to work from charts or do any laborious translation. Just use the integer as a character, as I showed in my silly script, and you should be fine.
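One more sketch of "just use the integer as a character": the standard <ctype.h> functions such as isalpha() are themselves declared to take an int (holding an unsigned char value or EOF), so the value from fgetc can be handed over unchanged. The helper name count_letters is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts the alphabetic characters in a stream. */
long count_letters(FILE *fp)
{
    long letters = 0;
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (isalpha(c))     /* takes the int from fgetc directly, no cast */
            letters++;
    }
    return letters;
}
```

Run on the letters file from my earlier post, this should count 5 letters and skip the spaces and the newline.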

Last edited by Telemachos; 08-27-2009 at 03:09 PM.
 
Old 08-27-2009, 11:31 PM   #12
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
If you read that Wikipedia link from Telemachos, it explains it pretty clearly, with tables of the various characters and control characters and how they are actually represented internally.
In short, fgetc() will probably do what you want, until you get into more exotic requirements.

'It's numbers all the way down...'
 
Old 08-28-2009, 04:27 AM   #13
Completely Clueless
Member
 
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 899

Original Poster
Rep: Reputation: 70
Quote:
Originally Posted by Telemachos View Post
Nope, I didn't really mean that. In my experience, you never need to use a cast. However, I am a novice in C myself, and Wim mentioned a few cases that I haven't come across where maybe you would want to cast
OK, I think I have the picture now. It's regrettable that I've moved many times in the last 17 years and couldn't take all my C books with me or I'd have figured this out for myself.
As for casting, I'm not sure either why Wim mentions it at all. As far as I'm aware, the only time the issue arises is with generic (void) pointers which can be switched from one datatype to another via casting. The question I posted has nothing to do with pointers of any sort, however.
Thanks to everyone who contributed to this thread, by the way.
 
Old 08-28-2009, 09:38 AM   #14
Completely Clueless
Member
 
Registered: Mar 2008
Location: Marbella, Spain
Distribution: Many and various...
Posts: 899

Original Poster
Rep: Reputation: 70
A bit of serendipity. I came across one of the online C tutorials concerning this very point. Here's what it says:

Quote:
"If getc() and fgetc() return a single character, why are they prototyped to return a type int? The reason is that, when reading files, you need to be able to read in the end-of-file marker, which on some systems isn't a type char but a type int."
Just thought I'd include this explanation in this thread in case some other poor, mystified soul subsequently seeks it out.
 
Old 08-28-2009, 07:50 PM   #15
Telemachos
Member
 
Registered: May 2007
Distribution: Debian
Posts: 754

Rep: Reputation: 60
Quote:
Originally Posted by Completely Clueless View Post
Just thought I'd include this explanation in this thread in case some other poor, mystified soul subsequently seeks it out.
Wim told you exactly this in post 5. I quoted K&R (the authors of The C Programming Language) saying the same thing in post 8.

Which thread have you been reading?
 
  

