[SOLVED] What is so special about 34 or more spaces when reading text files with C code?

phil.d.g · 10-10-2019, 11:07 PM

If you are still struggling with what astrogeek is saying, perhaps considering this variation may help:

Code:

          for ( i = 0; i < ' '; i++ ) {
              printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");
              if ( content[i] == '#' ) {
                 str = false;
                 printf("found #\n");
                 continue;
              }
          }
          //We want to see the comparison AFTER the loop has exited - in other words why it ended
          printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");

jsbjsb001 · 10-11-2019, 01:49 AM

Thanks again guys!

I agree with astrogeek that it's probably best to try to understand exactly what my for loop is doing, so I'll hold off on any improvements until then.

I ran your example, GazL, and I think I get what you mean. I think I managed to put it in a for loop, and it prints the same output, bar the null.

Here's that code anyways;

Code:

#include <stdlib.h>
#include <stdio.h>

int main()
{
    char string[] = { 'A', 'B', 'C', ' ', '\0' };
    
    for ( int i =0; string[i] != '\0'; ++i ) {
        printf("string[i] = %1$c = %1$d\n", string[i]);
    }

    return EXIT_SUCCESS;
}

And this is the output of that code;

Code:

james@jamespc: practice> ./ascii_example_for_loop 
string[i] = A = 65
string[i] = B = 66
string[i] = C = 67
string[i] =   = 32

astrogeek, I changed my printf() to the one you've posted, and hopefully I get what you're saying now, so here goes: it's stopping at 32 because that's the first actual character (being the space) the loop reaches (because the loop condition says "less than" an ASCII char?), and as you say, it's comparing the loop condition with ASCII? If that's not what's happening, I'm buggered if I know. Is that at least somewhere near what's happening, and why the loop is terminating at 32?

Thanks for your help also phil.d.g, but astro's printf() output was a little easier to grasp, although from what I can see, your variation only changes the loop condition to a space, I think...

phil.d.g · 10-11-2019, 02:30 AM

Quote:

Originally Posted by jsbjsb001

Thanks for your help also phil.d.g, but astro's printf() output was a little easier to grasp, although from what I can see, your variation only changes the loop condition to a space, I think...

You are correct. I was less concerned with what it output; rather, I was trying to make it a little more obvious that the upper bound of the for loop was a char. Which...

Quote:

Originally Posted by jsbjsb001

astrogeek, I changed my printf() to the one you've posted, and hopefully I get what you're saying now, so here goes: it's stopping at 32 because that's the first actual character (being the space) the loop reaches (because the loop condition says "less than" an ASCII char?), and as you say, it's comparing the loop condition with ASCII? If that's not what's happening, I'm buggered if I know. Is that at least somewhere near what's happening, and why the loop is terminating at 32?

You're starting to grasp it.

Now consider that computers only know binary. We assign meaning to binary data by using types, for example char, int, bool, etc. That's only useful for us, not the computer.

As far as the computer is concerned:

' ' is 0010 0000
32 is 0010 0000

Your for loop is checking that i is less than 0010 0000.

There is a compounding part to your problem in that your upper bound for the for loop is a moving target. Think about what might happen if you had the word 'hello' starting at column 30, rather than a line full of space characters.
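To see the "same bits" point in running code, here is a minimal sketch. It assumes an ASCII execution character set (the usual case on Linux), and the helper names same_bits and show are made up for illustration; they are not part of anyone's program in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true when a character and an integer hold the same value. */
bool same_bits(char c, int n)
{
    return c == n;   /* c is promoted to int before the comparison */
}

/* Hypothetical helper: prints a character next to the number that is stored for it. */
void show(char c)
{
    printf("'%c' is stored as %d\n", c, c);
}
```

On an ASCII system, show(' ') prints ' ' is stored as 32, which is why a condition written as i < ' ' behaves exactly like i < 32.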

GazL · 10-11-2019, 05:01 AM

Quote:

Originally Posted by jsbjsb001

I ran your example, GazL, and I think I get what you mean. I think I managed to put it in a for loop, and it prints the same output, bar the null.

Here's that code anyways;

Code:

#include <stdlib.h>
#include <stdio.h>

int main()
{
    char string[] = { 'A', 'B', 'C', ' ', '\0' };
    
    for ( int i =0; string[i] != '\0'; ++i ) {
        printf("string[i] = %1$c = %1$d\n", string[i]);
    }

    return EXIT_SUCCESS;
}

Bingo!

The way to iterate over a character string of variable/unknown length -- admittedly my example was a known fixed length -- is to keep going until you find a '\0' (NUL) character.

Though I created the char array 'string' char by char, when you write char string[] = "ABC "; the compiler puts a '\0' at the end for you, so it's the same thing. In your program, fgets() puts a '\0' on the end when it turns your input characters into a string. fgets() also leaves a '\n' in there that will need to be dealt with, but we can deal with that later.
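A small sketch of the point about the compiler adding the '\0' for you. The function names literal_size and same_as_explicit_array are invented for this illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Returns the number of bytes the literal "ABC " occupies, terminator included. */
size_t literal_size(void)
{
    char string[] = "ABC ";
    return sizeof string;
}

/* Returns 1 when the literal and the char-by-char array are byte-for-byte identical. */
int same_as_explicit_array(void)
{
    char from_literal[] = "ABC ";
    char by_hand[] = { 'A', 'B', 'C', ' ', '\0' };

    return sizeof from_literal == sizeof by_hand
        && memcmp(from_literal, by_hand, sizeof by_hand) == 0;
}
```

literal_size() returns 5, not 4: four visible characters plus the '\0' the compiler appended.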

Your explanation is on the right track, though your terminology is a little off. In your program, your condition i < content[i] says compare the value i (the index into the string of the current character) to content[i] (the ascii value of the character at that index position), and keep looping only while i is the smaller of the two. If content[32] is a space (ascii 32) or any character before it in the ascii table -- there aren't any printable ones -- your loop will stop. If content[65] is an 'A' (ascii 65) or any character before it in the ascii table, your loop will stop -- assuming of course that it even got as far as index 65 -- and so on.
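If it helps to experiment, the broken condition can be isolated in a tiny function. This is only a sketch; buggy_stop_index is a made-up name, and it assumes ASCII character values.

```c
/* Mimics the broken loop: keeps going while i < content[i],
   then reports the index at which the condition first failed. */
int buggy_stop_index(const char *content)
{
    int i;

    for (i = 0; i < content[i]; i++)
        ;   /* empty body: only the stopping index matters */

    /* The loop can never run past the terminator, because '\0' has the
       value 0 and no index i satisfies i < 0. */
    return i;
}
```

Fed a line of 40 spaces it returns 32; fed ten spaces followed by a tab (ascii 9) it returns 10, because at that index 10 < 9 is already false.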

So, now you should understand why it's broken, and how to fix it.

Should you need any help, we can come back to dealing with removing comments and the unwanted '\n' left behind by fgets() once you've fixed what you already have, but see if you can work that out yourself. It'll be good practice.

jsbjsb001 · 10-11-2019, 06:21 AM

Thanks GazL! I was hoping I got that right...

Anyhow, I think I have fixed the "32 spaces problem" now, but my code doesn't deal with the "newline problem". And for the life of me, I cannot figure out how to stop it ignoring a string that ISN'T preceded by a hash. For example, "no comment#" should not be ignored, but on the other hand "#a comment" SHOULD be ignored as a comment - but it still ignores any line with a hash in it, regardless of where the hash is on the line.

Anyway, this is what my code looks like now;

Code:

// a program to skip to the next line in file if the comment char (#) is encountered

#define  CONTENT_LEN 373
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {    
       
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";
    bool str = true;       
    int i = 0;
    
    FILE *testfile; 

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
         fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
         return 1;
    }         
    
    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {        
                for ( i = 0; content[i] != '\0'; i++ ) {                                  
                  //     printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");                                           
                    if ( content[i] == '#' ) {                                                       
                       str = false;
                       printf("found #\n");
                       continue;                           
                    }                   
                }
                if (str) {                    
                   str = true;   
                   printf("Uncommented lines in file: %s\n", content);           
                }
                str = true; 
    }
 //   printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    
    fclose(testfile);
    
    return 0;
}

I also made my test file a little simpler;

Code:

No comment
                                                                                  #comment 1
 No comment
 no comment#

As you can see, "comment 1" still gets ignored (as it should), even though it's more than 32 spaces from the start of the line, but "no comment#" still gets ignored as well (which it shouldn't).

Code:

james@jamespc: practice> ./skip_line_if_comment
Uncommented lines in file: No comment

found #
Uncommented lines in file:  No comment

found #

Thanks again guys!

BW-userx · 10-11-2019, 07:06 AM

Quote:

Originally Posted by jsbjsb001

Thanks GazL! I was hoping I got that right...

Anyhow, I think I have fixed the "32 spaces problem" now, but my code doesn't deal with the "newline problem". And for the life of me, I cannot figure out how to stop it ignoring a string that ISN'T preceded by a hash. For example, "no comment#" should not be ignored, but on the other hand "#a comment" SHOULD be ignored as a comment - but it still ignores any line with a hash in it, regardless of where the hash is on the line.

No, I do not think so. Replace the conditional inside your for loop with this to see what it is seeing and reporting on.
It prints a "comment here" notice, then the character it found (the #), then the whole line that the # belongs to.

Code:

      if ( content[i] == '#' ) {                                                       
         str = false;
          //printf("found #\n");
          printf("comment here\n%c\n%s\n", content[i],content);
         continue;                           
      }

results

Code:

$ ./jsbjsb001
Uncommented lines in file: No comment

comment here
#
                                                                                  #comment 1

Uncommented lines in file:  No comment

comment here
#
 no comment#

Mod>

When you say 'ignored', what do you mean by that?
1. The code ignores the #.
2. The line gets passed over by the code because of the #, and is therefore ignored, i.e. not acted upon, skipped in the first if statement.
Slightly confusing, yes.

GazL · 10-11-2019, 08:13 AM

The easiest way would be to:
for each line:
Use a for loop to check each character of that line for either a '#' or a '\n'
replace either character with a '\0', which will terminate the line at that point.

Then, you'll either have a completely empty line (one where content[0] == '\0' ), or a line with the commented part removed, or, if there is no comment, a line with just the '\n' removed.

There are string functions that one could use rather than doing it manually with a for loop, but I'd suggest you ignore them for now and do it manually.

Side Note: if you don't find a '\n' at all, then your input line was too long to fit in the buffer: it's a common issue with using fgets() and is why people tend to use getline() instead. If you stick with fgets() then you might want to consider adding a check for that once you've got the rest sorted out.
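For the side note, one way such a check could look is sketched below. line_was_truncated is a hypothetical helper, written with a manual loop in the spirit of the advice above rather than with the string functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when buf (filled by fgets() using a buffer of buf_len bytes)
   contains no '\n' and every usable byte is occupied, i.e. the line did not fit. */
bool line_was_truncated(const char *buf, size_t buf_len)
{
    size_t i;

    for (i = 0; i < buf_len && buf[i] != '\0'; i++) {
        if (buf[i] == '\n')
            return false;          /* the whole line arrived */
    }

    /* No newline seen: treat it as truncated only if the buffer is full. */
    return buf_len > 0 && i == buf_len - 1;
}
```

One caveat with this sketch: a final line that has no trailing newline and happens to fill the buffer exactly would also be reported as truncated.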

jsbjsb001 · 10-11-2019, 10:29 AM

I'm sorry GazL, I'm just not clear on what you mean. Do you mean another for loop, or the existing one?

astrogeek · 10-11-2019, 11:52 AM

Quote:

Originally Posted by jsbjsb001

astrogeek, I changed my printf() to the one you've posted, and hopefully I get what you're saying now, so here goes: it's stopping at 32 because that's the first actual character (being the space) the loop reaches (because the loop condition says "less than" an ASCII char?), and as you say, it's comparing the loop condition with ASCII?

Yes, that is what is happening! And it is happening at 32 spaces precisely because the ascii value of the space character is 32.

Now before we leave this topic completely, in your mind or in code, explore more fully what you now understand...

Would this happen with other characters? Create a comment line with ten or more spaces followed by a tab character then zero or more spaces before the #. Where does the loop exit for that one?

In fact, use any character before the # at any position same as or more than its ascii value and you will get the same behavior. The reason you saw it with the space first is simply because you used the space as the leading repeated character and its ascii value is lower than other characters you used in the comments.

Look at an ascii table and find a few other convenient characters to test. That will reinforce the knowledge you have gained here as well as allow you to begin exploring the ascii table (which will be important to learn your way around too).

So, now you have understood what is so "special" about 32 or more spaces. Nothing at all! What made it special was simply your loop parameters! Next time you have a loop that seems to behave oddly under certain conditions you know where to look!

Thanks for following through on this exercise! I would also suggest you keep an annotated copy of an example of this version of your code and return to it in a few days or weeks and run it again with a few different characters, following along in your brain, to reinforce knowledge you have gained.

So the next step should be to replace this wrong behavior with something that works the way you really intended. As others have suggested, simply looking for the NUL ('\0') string terminator is probably the best way. fgets() always writes one when it succeeds, but if you want a safeguard that does not depend on that, add a second limiting condition such as your buffer length #define, perhaps:

Code:

for ( i = 0; (i < CONTENT_LEN) && (content[i] != '\0'); i++ )

Whatever you choose - verify it works as intended before moving on to other conditions within the loop.
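As a way of verifying the two-condition loop on its own, it can be pulled out into a helper. This is a sketch only; bounded_length is an invented name. The index test is placed first so that content[i] is never read once i has reached the limit.

```c
#include <stddef.h>

/* Counts characters up to the terminator, but never reads beyond cap bytes. */
size_t bounded_length(const char *content, size_t cap)
{
    size_t i;

    /* The bounds test comes first, so content[i] is only read when i is valid. */
    for (i = 0; i < cap && content[i] != '\0'; i++)
        ;

    return i;
}
```

With a properly terminated string it returns the string length; with a buffer that has no terminator at all it stops at cap instead of running off the end.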

GazL · 10-11-2019, 12:08 PM

Quote:

Originally Posted by jsbjsb001

I'm sorry GazL, I'm just not clear on what you mean. Do you mean another for loop, or the existing one?

Modify the existing one.

What you're looking to do is turn a string like this:
{ 'A', 'B', 'C','#','D','\n','\0' }
into
{ 'A', 'B', 'C','\0','D','\0','\0' }
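The transformation above, written out as a sketch. blank_hash_and_newline is a made-up name, and this version deliberately visits every byte so the result matches the second array exactly.

```c
#include <stddef.h>

/* Overwrites every '#' and '\n' in the first len bytes of buf with '\0'. */
void blank_hash_and_newline(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '#' || buf[i] == '\n')
            buf[i] = '\0';
    }
}
```

After the call, printing the array with %s shows only ABC: the 'D' is still in memory, but printf() stops at the first '\0' it meets.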

jsbjsb001 · 10-12-2019, 07:18 AM

Thanks again guys!

I played around with the "broken version" of my program (and I've saved a copy of the "broken version" like you suggested astro, so I can refer to it later on if need be), and I think I get what you mean astro. Thanks again for all of your help!

I think I've done what you said GazL. I've posted my new code below, it does work as intended as far as I can see, well, close enough anyways. Is the below code what you meant GazL?

Code:

// a program to skip to the next line in file if the comment char (#) is encountered

#define  CONTENT_LEN 373
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {    
       
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";
    bool str = true;       
    int i = 0;
    
    FILE *testfile; 

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
         fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
         return 1;
    }         
    
    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {        
                for ( i = 0; content[i] != '\0'; i++ ) {                   
                    //   printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");                       
                    if ( content[i] == '#' ) {
                       //     printf("found # content[] = %s\n", content);                            
                       content[i] = '\0';                            
                       continue;
                       if ( content[i] == '#' ) {
                          str = false;
                       //  printf("found #\n");
                          continue;
                       }
                    }                                          
                }
                if (str) {                    
                   str = true;   
                   printf("Uncommented lines in file: %s\n", content);           
                }
                str = true; 
    }
 //   printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    
    fclose(testfile);
    
    return 0;
}

My "test file" content;

Code:

No comment
#comment 1
 No comment
                        no comment#

And here's the output;

Code:

james@jamespc: practice> ./skip_line_if_comment
Uncommented lines in file: No comment

Uncommented lines in file: 
Uncommented lines in file:  No comment

Uncommented lines in file:                         no comment

GazL · 10-12-2019, 08:26 AM

At its most basic, I had something like this in mind:

Code:

    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {
        for ( i = 0 ; content[i] != '\0'; i++ )            
            if ( content[i] == '#' || content[i] == '\n' ) {
                content[i] = '\0';
                break;
            }

        if ( content[0] != '\0' ) /* quick and dirty strlen(content) > 0 */
            printf("line: %s\n", content);
    }

jsbjsb001 · 10-13-2019, 04:43 AM

I changed my code to your example above GazL, but either I've done it wrong, or the code isn't working. I wasn't sure where your second if statement was supposed to go; so if I have it in the for loop and I have the hash after content[0] in the "test file", and even if it's in front of a string I want to comment out, the program doesn't ignore it, and still displays the string. But if I have it outside of the for loop but within the while loop, nothing gets displayed.

Here's the code now;

Code:

// a program to skip to the next line in file if the comment char (#) is encountered

#define  CONTENT_LEN 373
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

int main(void) {    
       
    char content[CONTENT_LEN];
    char filename[10] = "test.txt";
  //  bool str = true;       
    int i = 0;
    
    FILE *testfile; 

    if ( ( testfile = fopen(filename, "r")) == NULL ) {
         fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
         return 1;
    }         
    
    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {        
          for ( i = 0; content[i] != '\0'; i++ ) {                   
                    //   printf("%i < %i (%c) ? %s\n", i, content[i], content[i], i<content[i] ? "True" : "False");                       
              if ( content[i] == '#'  || content[i] == '\n' ) {
                       //     printf("found # content[] = %s\n", content);                            
                 content[i] = '\0';
                 break;
                      //      continue;
                     //       if ( content[i] == '#' ) {
                      //           str = false;                                 
                               //  continue;                             
                        //    }                        
              }
              if ( content[i] != '\0' ) {
                 printf("Uncommented lines in file: %s\n", content);                     
              }
          }         
           //     if (str) {                
            //        str = true;
          //          printf("Uncommented lines in file: %s\n", content);
         //       }
            //    str = true; 
    }
 //   printf("%i < %i (%c) ? %s (loop exited)\n", i, content[i], content[i], i<content[i] ? "True" : "False");
    
    fclose(testfile);
    
    return 0;
}

GazL · 10-13-2019, 05:28 AM

Compare your code to mine closely, and you'll see you put the second if inside the 'for' loop. I put it outside the 'for' loop, but inside the 'while' loop. I didn't include the redundant { } around the body of the 'for' loop which may have thrown you a little.
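For comparison, here is the same logic with the braces written out, wrapped in a function so it can be tried in isolation. This is a sketch under the assumption that the goal is what was described earlier in the thread; clean_line is a hypothetical name.

```c
#include <stdbool.h>

/* Cuts the line at the first '#' or '\n'.
   Returns true if anything is left to print afterwards. */
bool clean_line(char *content)
{
    int i;

    for (i = 0; content[i] != '\0'; i++) {      /* braces made explicit */
        if (content[i] == '#' || content[i] == '\n') {
            content[i] = '\0';
            break;
        }
    }                                           /* the for loop ends here */

    /* This test runs once per line, after the loop, and it looks at
       content[0] (the start of the line), not content[i]. */
    return content[0] != '\0';
}
```

Inside the while loop this would be used as if ( clean_line(content) ) printf("line: %s\n", content); so the printing happens once per line rather than once per character.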

jsbjsb001 · 10-13-2019, 07:22 AM

Sorry GazL, I did think the second if was outside of the for loop but within the while loop, but I wasn't 100% sure because, like you said, the curly braces weren't there. But it still displayed nothing, even though not all of the strings in the "test file" had a hash in front of them. So I put the second if back into the while loop and changed;

Code:

if ( content[i] != '\0' )

to

Code:

if ( content[i] == '\0' )

and that got the uncommented lines to display. Sorry for the confusion GazL!

But the only thing now is that if I have a line like "no# comment", the "no" is displayed but the "# comment" part isn't, even though both are on the same line. I know it's because of the '\0' terminating the line, but I can't think of how to get around that. What I'd like is for a string such as "no# comment" to still display as a whole line, provided the hash isn't at the beginning of the string, as it is in "#no comment" for example.

Is there any way around that?
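One possible reading of that requirement is "a line is a comment only when the # is the first thing on it, ignoring leading blanks". Under that assumption (it is an assumption, not something settled in the thread), a test for it might be sketched as below; is_comment_line is a made-up name.

```c
#include <stdbool.h>

/* Returns true when the first non-blank character of line is '#'. */
bool is_comment_line(const char *line)
{
    int i = 0;

    /* Skip leading spaces and tabs; the terminator stops the loop as well,
       since '\0' is neither of them. */
    while (line[i] == ' ' || line[i] == '\t')
        i++;

    return line[i] == '#';
}
```

With a check like this, "#no comment" would be skipped entirely, while "no# comment" would be left alone and printed in full.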