[SOLVED] What is so special about 34 or more spaces when reading text files with C code?
If you are still struggling with what astrogeek is saying, perhaps considering this variation may help:
Code:
for ( i = 0; i < ' '; i++ ) {
    printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    if ( content[i] == '#' ) {
        str = false;
        printf("found #\n");
        continue;
    }
}
//We want to see the comparison AFTER the loop has exited - in other words why it ended
printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
Distribution: Currently: OpenMandriva. Previously: openSUSE, PCLinuxOS, CentOS, among others over the years.
Posts: 3,881
Original Poster
Rep:
Thanks again guys!
I agree with astrogeek that it's probably best to try to understand exactly what my for loop is doing, so I'll hold off on any improvements until then.
I ran your example, GazL, and I think I get what you mean. I think I managed to put it in a for loop, and it prints the same output, bar the null.
Here's that code anyways;
Code:
#include <stdlib.h>
#include <stdio.h>

int main()
{
    char string[] = { 'A', 'B', 'C', ' ', '\0' };

    for ( int i =0; string[i] != '\0'; ++i ) {
        printf("string[i] = %1$c = %1$d\n", string[i]);
    }
    return EXIT_SUCCESS;
}
And this is the output of that code;
Code:
james@jamespc: practice> ./ascii_example_for_loop
string[i] = A = 65
string[i] = B = 66
string[i] = C = 67
string[i] = = 32
astrogeek, I changed my printf() to the one you've posted, and hopefully I get what you're saying now, so here goes: it's stopping at 32 because that's the first actual character (being the space) the loop reaches (because the loop condition says "less than" an ASCII char?), and as you say, it's comparing the loop condition with ASCII? If that's not what's happening, I'm buggered if I know. Is that at least somewhere near what's happening, and why the loop is terminating at 32?
Thanks for your help also phil.d.g, but astro's printf() output was a little easier to grasp, although from what I can see, your variation only changes the loop condition to a space, I think...
Quote:
Originally Posted by jsbjsb001
Thanks for your help also phil.d.g, but astro's printf() output was a little easier to grasp, although from what I can see, your variation only changes the loop condition to a space, I think...
You are correct. I was less concerned with what it output; rather, I was trying to make it a little more obvious that the upper bound of the for loop was a char. Which...
Quote:
Originally Posted by jsbjsb001
astrogeek, I changed my printf() to the one you've posted, and hopefully I get what you're saying now, so here goes: it's stopping at 32 because that's the first actual character (being the space) the loop reaches (because the loop condition says "less than" an ASCII char?), and as you say, it's comparing the loop condition with ASCII? If that's not what's happening, I'm buggered if I know. Is that at least somewhere near what's happening, and why the loop is terminating at 32?
You're starting to grasp it.
Now consider that computers only know binary. We assign meaning to binary data by using types, for example char, int, bool, etc. That's only useful for us, not the computer.
As far as the computer is concerned:
' ' is 0010 0000
32 is 0010 0000
Your for loop is checking that i is less than 0010 0000.
There is a compounding part to your problem in that your upper bound for the for loop is a moving target. Think about what might happen if you had the word 'hello' starting at column 30, rather than a line full of space characters.
Quote:
Originally Posted by jsbjsb001
I ran your example, GazL, and I think I get what you mean. I think I managed to put it in a for loop, and it prints the same output, bar the null.
Here's that code anyways;
Code:
#include <stdlib.h>
#include <stdio.h>

int main()
{
    char string[] = { 'A', 'B', 'C', ' ', '\0' };

    for ( int i =0; string[i] != '\0'; ++i ) {
        printf("string[i] = %1$c = %1$d\n", string[i]);
    }
    return EXIT_SUCCESS;
}
Bingo!
The way to iterate over a character string of variable/unknown length -- admittedly my example was a known fixed length -- is to keep going until you find a '\0' (NUL) character.
Though I created the char array 'string' char by char, when you write char string[] = "ABC "; the compiler puts a '\0' at the end for you, so it's the same thing. In your program, fgets() puts a '\0' on the end when it turns your input characters into a string. fgets() also leaves a '\n' in there that will need to be dealt with, but we can deal with that later.
Your explanation is on the right track, though your terminology is a little off. In your program, your condition i < content[i] says compare the value i (the index into the string of the current character) to content[i] (the ascii value of the character at that index position). If content[32] is a space (ascii 32) or any character before it in the ascii table -- there aren't any printable ones -- your loop will stop. If content[65] is an 'A' (ascii 65) or any ascii character before it in the ascii table, your loop will stop -- assuming of course that it even got as far as index 65 -- and so on.
So, now you should understand why it's broken, and how to fix it.
Should you need any help, we can come back to dealing with removing comments and the unwanted '\n' left behind by fgets() once you've fixed what you already have, but see if you can work that out yourself. It'll be good practice.
Thanks GazL! I was hoping I got that right...
Anyhow, I think I have fixed the "32 spaces problem" now, but my code doesn't deal with the "newline problem". And for the life of me, I cannot figure out how to stop it ignoring a string that ISN'T preceded by a hash. For example, "no comment#" should not be ignored, but on the other hand, "#a comment" SHOULD be ignored as a comment. As it stands, it still ignores any line with a hash in it, regardless of where the hash is on the line.
Anyway, this is what my code looks like now;
Code:
// a program to skip to the next line in file if the comment char (#) is encountered
#define CONTENT_LEN 373

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";
    bool str = true;
    int i = 0;
    FILE *testfile;

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
        fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
        return 1;
    }

    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {
        for ( i = 0; content[i] != '\0'; i++ ) {
            // printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");
            if ( content[i] == '#' ) {
                str = false;
                printf("found #\n");
                continue;
            }
        }
        if (str) {
            str = true;
            printf("Uncommented lines in file: %s\n", content);
        }
        str = true;
    }
    // printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    fclose(testfile);
    return 0;
}
I also made my test file a little simpler;
Code:
No comment
#comment 1
No comment
no comment#
As you can see, "comment 1" still gets ignored (as it should), even though it's more than 32 spaces from the start of the line, but "no comment#" still gets ignored as well (which it shouldn't).
Code:
james@jamespc: practice> ./skip_line_if_comment
Uncommented lines in file: No comment
found #
Uncommented lines in file: No comment
found #
Thanks again guys!
No, I do not think so. Replace the conditional in your for loop with this to see what it is seeing and reporting on.
This prints a message when a # is found, then what it found (the #), then the line the # is attached to as well.
$ ./jsbjsb001
Uncommented lines in file: No comment
comment here
#
#comment 1
Uncommented lines in file: No comment
comment here
#
no comment#
Mod>
When you say 'ignored', what do you mean by that?
1. The code ignores the #.
2. The line gets passed over by the code due to the # and is therefore ignored, aka not acted upon, skipped in the first if statement.
Slightly confusing, yes.
The easiest way would be to:
for each line:
Use a for loop to check each character of that line for either a '#' or a '\n'.
Replace either character with a '\0', which will terminate the line at that point.
Then you'll have either a completely empty line (one where content[0] == '\0'), or a line with the commented part removed, or, if there was no comment, a line with just the '\n' removed.
There are string functions that one could use rather than doing it manually with a for loop, but I'd suggest you ignore them for now and do it manually.
Side Note: if you don't find a '\n' at all, then either your input line was too long to fit in the buffer, or it was the last line of a file that doesn't end with a newline. The first case is a common issue with using fgets() and is why people tend to use getline() instead. If you stick with fgets() then you might want to consider adding a check for that once you've got the rest sorted out.
astrogeek, I changed my printf() to the one you've posted, and hopefully I get what you're saying now, so here goes: it's stopping at 32 because that's the first actual character (being the space) the loop reaches (because the loop condition says "less than" an ASCII char?), and as you say, it's comparing the loop condition with ASCII?
Yes, that is what is happening! And it is happening at 32 spaces precisely because the ascii value of the space character is 32.
Now before we leave this topic completely, in your mind or in code, explore more fully what you now understand...
Would this happen with other characters? Create a comment line with ten or more spaces followed by a tab character then zero or more spaces before the #. Where does the loop exit for that one?
In fact, put any character before the # at any position equal to or greater than its ascii value and you will get the same behavior. The reason you saw it with the space first is simply because you used the space as the leading repeated character and its ascii value is lower than other characters you used in the comments.
Look at an ascii table and find a few other convenient characters to test. That will reinforce the knowledge you have gained here as well as allow you to begin exploring the ascii table (which will be important to learn your way around too).
So, now you have understood what is so "special" about 32 or more spaces. Nothing at all! What made it special was simply your loop parameters! Next time you have a loop that seems to behave oddly under certain conditions you know where to look!
Thanks for following through on this exercise! I would also suggest you keep an annotated copy of an example of this version of your code and return to it in a few days or weeks and run it again with a few different characters, following along in your brain, to reinforce knowledge you have gained.
So the next step should be to replace this wrong behavior with something that works the way you really intended. As others have suggested, simply looking for the NUL string terminator ('\0') is probably the best way. fgets() always writes that terminator when it succeeds, but you can also add a second limiting condition such as your buffer length #define as a safeguard, perhaps:
Code:
for ( i = 0; (i < CONTENT_LEN) && (content[i] != '\0'); i++ )
Whatever you choose - verify it works as intended before moving on to other conditions within the loop.
Last edited by astrogeek; 10-11-2019 at 12:33 PM.
Reason: tpoy, topys, tyops
Thanks again guys!
I played around with the "broken version" of my program (and I've saved a copy of the "broken version" like you suggested astro, so I can refer to it later on if need be), and I think I get what you mean astro. Thanks again for all of your help!
I think I've done what you said GazL. I've posted my new code below, it does work as intended as far as I can see, well, close enough anyways. Is the below code what you meant GazL?
Code:
// a program to skip to the next line in file if the comment char (#) is encountered
#define CONTENT_LEN 373

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";
    bool str = true;
    int i = 0;
    FILE *testfile;

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
        fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
        return 1;
    }

    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {
        for ( i = 0; content[i] != '\0'; i++ ) {
            // printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");
            if ( content[i] == '#' ) {
                // printf("found # content[] = %s\n", content);
                content[i] = '\0';
                continue;
                if ( content[i] == '#' ) {
                    str = false;
                    // printf("found #\n");
                    continue;
                }
            }
        }
        if (str) {
            str = true;
            printf("Uncommented lines in file: %s\n", content);
        }
        str = true;
    }
    // printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    fclose(testfile);
    return 0;
}
My "test file" content;
Code:
No comment
#comment 1
No comment
no comment#
And here's the output;
Code:
james@jamespc: practice> ./skip_line_if_comment
Uncommented lines in file: No comment
Uncommented lines in file:
Uncommented lines in file: No comment
Uncommented lines in file: no comment
I changed my code to your example above GazL, but either I've done it wrong, or the code isn't working. I wasn't sure where your second if statement was supposed to go; so if I have it in the for loop and I have the hash after content[0] in the "test file", and even if it's in front of a string I want to comment out, the program doesn't ignore it, and still displays the string. But if I have it outside of the for loop but within the while loop, nothing gets displayed.
Here's the code now;
Code:
// a program to skip to the next line in file if the comment char (#) is encountered
#define CONTENT_LEN 373

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";
    // bool str = true;
    int i = 0;
    FILE *testfile;

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
        fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
        return 1;
    }

    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {
        for ( i = 0; content[i] != '\0'; i++ ) {
            // printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");
            if ( content[i] == '#' || content[i] == '\n' ) {
                // printf("found # content[] = %s\n", content);
                content[i] = '\0';
                break;
                // continue;
                // if ( content[i] == '#' ) {
                //     str = false;
                //     continue;
                // }
            }
            if ( content[i] != '\0' ) {
                printf("Uncommented lines in file: %s\n", content);
            }
        }
        // if (str) {
        //     str = true;
        //     printf("Uncommented lines in file: %s\n", content);
        // }
        // str = true;
    }
    // printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    fclose(testfile);
    return 0;
}
Compare your code to mine closely, and you'll see you put the second if inside the 'for' loop. I put it outside the 'for' loop, but inside the 'while' loop. I didn't include the redundant { } around the body of the 'for' loop which may have thrown you a little.
Sorry GazL, I did think the second if was meant to go outside of the for loop but within the while loop, but I wasn't 100% sure because, like you said, the curly braces weren't there. It still displayed nothing, even though not all of the strings in the "test file" had a hash in front of them. So I put the second if back into the while loop and changed;
Code:
if ( content[i] != '\0' )
to
Code:
if ( content[i] == '\0' )
and that got the uncommented lines to display. Sorry for the confusion GazL!
But the only thing now is that if I had a line like "no# comment", the "no" would be displayed but the "# comment" part wouldn't, even though both strings are on the same line. I know it's because of the '\0' terminating the line, but I can't think of how to get around that. What I'd like is for a string such as "no# comment" to still be displayed as a whole line, provided the hash isn't at the beginning of the string, as it is in "#no comment" for example.