[SOLVED] What is so special about 34 or more spaces when reading text files with C code?

GazL · 10-25-2019, 08:47 AM

Code:

while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {        
    for ( i = 0; content[i] != '\0'; i++ ) {
        if ( content[i] != ' ' ) {
            if ( content[i] == '\n' ) {
                content[i] = '\0';
            }
            if ( (content[i] != '#') && (content[i] != '\0') ) {                               
                printf("line[%s]\n", content);                                            
            }
            break;
        }                  
    }                
}

... that snippet is your code without all the commented out stuff -- removing the inactive code is always a good idea as it makes it simpler to read.

Replacing it with a '\0' is the right idea, but the problem is that your for loop will stop long before it ever sees the '\n' character at the end. Examine your code, can you see why?

Once you've spotted the issue, think about a way around it. There are a number of approaches one might take. Don't let yourself be limited by the code you've already written. Change it if you need to.
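
As a hint for anyone stuck on that question, here is a minimal sketch that isolates the loop's control flow. The helper name `loop_exit_index` is made up for illustration and is not part of the original program; it assumes the loop is structured exactly as in the snippet above, with the `break` inside the `if ( content[i] != ' ' )` block.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: walks the line the same way the loop above does
 * and returns the index at which that loop stops. */
size_t loop_exit_index(const char *content)
{
    size_t i;

    assert(content != NULL);

    for (i = 0; content[i] != '\0'; i++) {
        if (content[i] != ' ') {
            /* The original tests for '\n' and '#' here, but only at
             * this one position, and then breaks unconditionally. */
            break;
        }
    }
    return i;
}
```

With the input "   hello\n" the function returns 3, the index of the 'h'. The newline sits at index 8, which the loop never reaches, so the '\n' test can only fire on a line where the newline is the first non-space character.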

rtmistler · 10-25-2019, 10:31 AM

Quote:

Originally Posted by GazL

but the problem is that your for loop will stop long before it ever sees the '\n' character at the end. Examine your code, can you see why?

I feel this is the key question. If they do not understand the problem with how they've coded this for() loop statement, then this is a blocking situation.

BW-userx · 10-25-2019, 11:07 AM

It seems to me that a # anywhere within the "code" is the dead giveaway that it is a comment, since # is the character reserved for starting a comment in languages that use it that way.

If OP just wants to spit out (print) the comment, find the #, print it and everything after it, then go to the next line.

Exceptions: in a script, the shebang (#!). If it is not that, then go to the comment handling above.

Or perhaps the sudoers file, where near the bottom there is this line:

Code:

## Read drop-in files from /usr/local/etc/sudoers.d
## (the '#' here does not indicate a comment)
#includedir /usr/local/etc/sudoers.d

Special care then needs to be added to the code if one is planning on reading that file.

Without the exceptions, it is just seek and find.

Code:

// a program to skip to the next line in file if the comment char (#) is encountered

#define  CONTENT_LEN 373
#include <stdio.h>

int main(int argc, char **argv) {

    char content[CONTENT_LEN];
    int i = 0;
    FILE *testfile;

    // argv[1] is only valid if a file name was actually given
    if ( argc < 2 ) {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return 1;
    }

    if ( ( testfile = fopen(argv[1], "r")) == NULL ) {
        fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
        return 1;
    }

    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {

        for ( i = 0; content[i] != '\0'; i++ ) {
            if ( content[i] == '#' ) {
                // print from the first '#' to the end of the line
                for ( int f = i; content[f] != '\0'; f++ ) {
                    printf("%c", content[f]);
                }
                break;  // done with this line, do not rescan the comment
            }
        }
    }

    fclose(testfile);

    return 0;
}

test file

Code:

no comment

#comment

						#comment
     no # here it is comment
                                   my#comment
                                no comment

output

Code:

$ ./nocomment testcommentfile
#comment
#comment
# here it is comment
#comment

anything else OP should be able to build off of that.

GazL · 10-25-2019, 11:52 AM

Quote:

// a program to skip to the next line in file if the comment char (#) is encountered

Where did you get the idea he wants to print them?

Basically what he's looking to do is this:

Code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main()
{
    FILE *infile = stdin;

    char content[BUFSIZ];
    
    while ( fgets(content, BUFSIZ, infile) != NULL ) {        
        size_t offset = strspn(content, " \t");
        char *nl = strrchr(content, '\n');
        if ( nl )
            *nl = '\0';
        else {
            fprintf(stderr, "Error: line too long\n");
            exit(EXIT_FAILURE);
        }

        if ( content[offset] != '#' && content[offset] != '\0')
            printf("line[%s]\n", &content[offset]);                                            
    }
    
    return 0;
}

... but without using all the libc string functions because it's a learning exercise, and those are cheating ( plus a little bit inefficient, but we don't care about that).
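
For comparison, here is a rough sketch of the same steps written by hand, one character at a time, with no libc string functions. The helper names `skip_blanks`, `strip_newline` and `is_printable_line` are invented for this illustration and are not from anyone's posted code; treat it as one possible shape under those assumptions, not the intended answer to the exercise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the first character that is neither
 * a space nor a tab (a hand-written stand-in for strspn(s, " \t")). */
size_t skip_blanks(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (s[i] == ' ' || s[i] == '\t')
        i++;
    return i;
}

/* Hypothetical helper: overwrite the first '\n' with '\0'.
 * Returns 1 if a newline was found, 0 otherwise. */
int strip_newline(char *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '\n') {
            s[i] = '\0';
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: 1 if the line should be printed, i.e. it is
 * neither blank nor a comment once leading blanks are skipped. */
int is_printable_line(const char *s)
{
    size_t offset = skip_blanks(s);

    return s[offset] != '#' && s[offset] != '\0' && s[offset] != '\n';
}
```

The version above uses strrchr() to find the last newline. Since fgets() stops at the first newline it reads, a buffer filled by fgets() holds at most one, so scanning for the first gives the same result. A return value of 0 from `strip_newline` plays the role of the "line too long" check.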

BW-userx · 10-25-2019, 12:00 PM

Quote:

Originally Posted by GazL

Where did you get the idea he wants to print them?

Basically what he's looking to do is this:

[code snipped]

... but without using all the libc string functions because it's a learning exercise, and those are cheating ( plus a little bit inefficient, but we don't care about that).

Foul, you are using outside functions to aid you in your solution.

-----

OK, I did both. After I did the first one, I looked back to refresh my memory as to what OP really wants. I've just been peeking in on this because of the emails and seeing that it is still going.

Then, when I actually read the top post instead of just checking whether the thread was still going, I saw that the goal was to print the non-comment parts.

So, without the aid of outside functions, just straight looking at each character one at a time.
Five (5) pages and no solution that I have seen, or is it only that it is not marked solved yet?

So I just put my mind to it and came up with what I posted in the other one. Now that I see I was applying the logic backwards, well, let's see if this is to the OP's liking, and/or up to the conditions the OP set for this exercise.

Code:

// a program to skip to the next line in file if the comment char (#) is encountered

#define  CONTENT_LEN 373
#include <stdio.h>

int main(int argc, char **argv) {

    char content[CONTENT_LEN];
    int i = 0;
    int f = 0;
    FILE *testfile;

    // argv[1] is only valid if a file name was actually given
    if ( argc < 2 ) {
        fprintf(stderr, "Usage: %s file\n", argv[0]);
        return 1;
    }

    if ( ( testfile = fopen(argv[1], "r")) == NULL ) {
        fprintf(stderr, "Input file cannot be read, aborting.\nDoes it exist?\n");
        return 1;
    }

    printf("-------- print the comment parts ---------------------------\n");
    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {
        for ( i = 0; content[i] != '\0'; i++ ) {
            if ( content[i] == '#' ) {
                for ( f = i; content[f] != '\0'; f++ ) {
                    printf("%c", content[f]);
                }
                break;  // the rest of the line is already printed
            }
        }
    }

    printf("-------- print the no comment parts ---------------------------\n");
    rewind(testfile);
    while ( fgets(content, CONTENT_LEN, testfile) != NULL ) {
        for ( i = 0; content[i] != '\0'; i++ ) {
            if ( content[i] == '#' ) {
                for ( f = i; content[f] != '\0'; f++ ) {
                    ;  // just run it to the end
                }
                i = f;
                break;  // leave the for loop, the rest of the line is comment
            }
            // not a '#', so it is part of the text before any comment
            printf("%c", content[i]);
        }
        // before going back to the while loop for the next line,
        // print a new line.
        printf("\n");
    }

    fclose(testfile);

    return 0;
}

test file

Code:

no comment
#comment
#comment
				no # here it is comment
     my#comment
          no comment

results

Code:

$ ./nocomment testcommentfile
-------- print the comment parts ---------------------------
#comment
#comment
# here it is comment
#comment
-------- print the no comment parts ---------------------------
no comment



				no 
     my
          no comment

I just built off of the first code block.
How about that one??

rtmistler · 10-25-2019, 01:26 PM

I think there are a few fundamental problems here:

resistance to redesign
failure to recognize why a certain practice may be inadvisable

That for() line was a primary original problem.

As coded now, one could argue that technically it would work. For this specific case, not for every possible case, such as those for a binary protocol which can contain 0x00 as part of the protocol.

What I would argue now is that "as coded now", the ASM which it generates may be inefficient and unnecessary because it continually re-evaluates the loop exit criteria. Potentially the compiler would interpret that code as while(current_character is != \0). Q.E.D. The compiler likely would capture the real intentions without any fuss.

I'd also point out that the intention of that line of code is somewhat unclear. Less readable and therefore difficult to maintain.

But whatever. Been in software for years, reviewed others' code for years, as well as the same in reverse. When a person is dead set about following a specific practice no matter what others say to them, and solely wishes to base their argument on "Well, it works ...", then you're never going to change their mind unless you have the power to override them.

BW-userx · 10-25-2019, 01:41 PM

Quote:

Originally Posted by rtmistler

I think there are a few fundamental problems here:

resistance to redesign
failure to recognize why a certain practice may be inadvisable

That for() line was a primary original problem.

As coded now, one could argue that technically it would work. For this specific case, not for every possible case, such as those for a binary protocol which can contain 0x00 as part of the protocol.

What I would argue now is that "as coded now", the ASM which it generates may be inefficient and unnecessary because it continually re-evaluates the loop exit criteria. Potentially the compiler would interpret that code as while(current_character is != \0). Q.E.D. The compiler likely would capture the real intentions without any fuss.

I'd also point out that the intention of that line of code is somewhat unclear. Less readable and therefore difficult to maintain.

But whatever. Been in software for years, reviewed others' code for years, as well as the same in reverse. When a person is dead set about following a specific practice no matter what others say to them, and solely wishes to base their argument on "Well, it works ...", then you're never going to change their mind unless you have the power to override them.

I completely agree, and I am not at the moment feeling like figuring out all of the "what ifs" for someone else's self-made exercise.

I basically got tired of seeing it popping up in my email, and saying to myself, this has not been figured out yet?

I just took what was there and ran with it. Using that little run until it hits the '\0'.

If the OP wants to expand on that, let him/her. It is now open source, and no restrictions are set on it for anyone wanting to do anything to it whatsoever. I hold no rights to it whatsoever in any form whatsoever.

GazL · 10-25-2019, 02:13 PM

Nev's post #33 really should have ended the thread. It was a much cleaner solution. I tried to bring attention to it a while back when I noticed it getting ignored, but it was still ignored.

BW-userx · 10-25-2019, 02:21 PM

yes using pointer math

Code:

ispace (*p)

I remember I was told no outside functions are to be used. Without putting much thought into it, other than the logic I posted about # being reserved for comments, I just wrote it to print them, and then to not print them and print the rest instead. Line of thought: use what was there.

Now rtmistler is tossing another fish in the barrel to try and catch.... OP hello?

rnturn · 10-25-2019, 03:14 PM

Quote:

Originally Posted by BW-userx

I was going to comment that this is overkill but held my tongue. Plus, what usage is this for: a C program to find # as a comment, which as far as I know is mostly, if not only, used in bash?

Also: Perl, Python, probably others.

phil.d.g · 10-25-2019, 03:31 PM

Quote:

Originally Posted by rtmistler

As coded now, one could argue that technically it would work. For this specific case, not for every possible case, such as those for a binary protocol which can contain 0x00 as part of the protocol.

But it isn't. And, if it was, then:

I would be raising an eyebrow at the thought of a file containing binary data being annotated with human readable comments.
I would also be asking for a spec for the binary data format before even thinking about starting to code.

Quote:

Originally Posted by GazL

Nev's post #33 really should have ended the thread. It was a much cleaner solution. I tried to bring attention to it a while back when I noticed it getting ignored, but it was still ignored.

There are bugs with it.

It's isspace, not ispace.
If the full line doesn't fit in the buffer the code is going to treat a single line as multiple ones. I acknowledge I've led jsbjsb001 to a solution that suffers the same problem, but I made that clear at the start.

That's OK, bugs are allowed, but let's stop putting broken code on a pedestal. I don't disagree that, when fixed, it's going to be a cleaner solution. However, how is that making jsbjsb001 understand what problems he had in his own code?
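
To make the buffer point concrete, here is a small sketch of how fgets() hands back an over-long line in pieces. The helper `count_fgets_chunks` is a made-up name used only for this demonstration, and it assumes tmpfile() is usable on the system; it is not taken from any code in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts how many fgets() calls it takes to read
 * `text` back from a temporary file with a buffer of `bufsize` bytes.
 * Returns -1 if the temporary file cannot be set up. */
int count_fgets_chunks(const char *text, int bufsize)
{
    char buf[64];
    FILE *fp;
    int chunks = 0;

    assert(text != NULL);
    assert(bufsize >= 2 && bufsize <= (int)sizeof buf);

    fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    /* fgets() stores at most bufsize - 1 characters per call, so a
     * longer line comes back as several separate strings. */
    while (fgets(buf, bufsize, fp) != NULL)
        chunks++;

    fclose(fp);
    return chunks;
}
```

A single 13-character line such as "abcdefghijkl\n" comes back as two strings with an 8-byte buffer ("abcdefg", then "hijkl\n") and as one string with a 64-byte buffer. A loop that treats every fgets() result as a whole line would therefore test "hijkl\n" for a leading '#' as if it were a new line.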

With respect to the immediate problem at hand:

jsbjsb001's code prints out two lines for every non-comment line it reads from the input file. The second line is blank. Consider:

Code:

printf("%s\n", content);

The documentation for fgets() says:

Quote:

Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer.

Why do we need to add our own?
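
A quick way to see this for yourself is sketched below. The helper `read_first_line` is a hypothetical name for this illustration only; it assumes tmpfile() works on the system and simply reads back the first line of some known text so the stored bytes can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes `text` to a temporary file, then reads the
 * first line back with fgets() so the stored result can be inspected.
 * Returns 1 on success, 0 on any failure. */
int read_first_line(const char *text, char *buf, int size)
{
    FILE *fp;
    int ok;

    assert(text != NULL && buf != NULL && size > 0);

    fp = tmpfile();
    if (fp == NULL)
        return 0;

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return 0;
    }
    rewind(fp);

    /* fgets() keeps the '\n' it stopped on, if there was room for it. */
    ok = fgets(buf, size, fp) != NULL;
    fclose(fp);
    return ok;
}
```

After `read_first_line("hello\nworld\n", buf, sizeof buf)` the buffer holds the six characters "hello\n", so printing it with a format of "%s\n" sends two newlines to the output.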

rtmistler · 10-25-2019, 03:38 PM

The point is that the original code design is poor.

One cannot just continue to ad hoc more and more complicated code.

Sure the OP can do that all they wish, but they're going to hear from us when their designs are poorly made.

It's not just about getting syntax correct.

GazL · 10-25-2019, 07:05 PM

Quote:

Originally Posted by phil.d.g

Why do we need to add our own?

Normally, if you read a string in with fgets() you're going to use it for something, and you typically want to strip the trailing \n from it before you do. While James is currently just printing the string out, he could very well decide to do something else with the strings later on down the line, so IMO it's better to remove the '\n' from the string as a matter of course and include it in the printf format string rather than the other way around.

BTW, IMO a typo does not deserve the term 'broken code'. I usually reserve that for code that does the wrong thing, or has some weakness that can blow up in your face.

Also, the incomplete line thing you bring up with fgets() has been mentioned a number of times in this thread. I usually avoid the issue by using getline(3) instead.
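
A rough sketch of that approach, for reference: it assumes a POSIX system where getline(3) is available, and the function name `print_active_lines` is invented here for illustration. The newline is stripped once, right after reading, and added back only in the output format string.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline(); must precede the includes */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: copies every line of `in` that is neither blank
 * nor a '#' comment to `out` as "line[...]". Returns the number of
 * lines written. */
long print_active_lines(FILE *in, FILE *out)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;
    long printed = 0;

    assert(in != NULL && out != NULL);

    while ((len = getline(&line, &cap, in)) != -1) {
        size_t start = 0;

        /* Strip the trailing newline once, right after reading. */
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';

        /* Skip leading spaces and tabs. */
        while (line[start] == ' ' || line[start] == '\t')
            start++;

        if (line[start] != '#' && line[start] != '\0') {
            fprintf(out, "line[%s]\n", &line[start]);
            printed++;
        }
    }
    free(line);
    return printed;
}
```

Because getline() grows its buffer as needed, a long line is never split, and a final line without a trailing newline is still handled. Calling it as `print_active_lines(stdin, stdout)` would mirror the stdin-based program posted earlier.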

phil.d.g · 10-25-2019, 07:46 PM

Quote:

Originally Posted by GazL

Also, the incomplete line thing you bring up with fgets() has been mentioned a number of times in this thread. I usually avoid the issue by using getline(3) instead.

Fair.

jsbjsb001 · 10-26-2019, 04:37 AM

Quote:

Originally Posted by GazL

[snipped]
...
Replacing it with a '\0' is the right idea, but the problem is that your for loop will stop long before it ever sees the '\n' character at the end. Examine your code, can you see why?

Once you've spotted the issue, think about a way around it. There are a number of approaches one might take. Don't let yourself be limited by the code you've already written. Change it if you need to.

No, I tried the printf() astrogeek gave me before, I've tried looking at the code, but I really don't know why it'll stop before it gets to the newline character.

Quote:

Originally Posted by rtmistler

I think there are a few fundamental problems here:

resistance to redesign
failure to recognize why a certain practice may be inadvisable

That for() line was a primary original problem.
...

Well RT, this is exactly what I've been trying to say; you seem to think that because I'm trying to take things one step at a time, rather than redesigning the code how you would, that I'm not interested in "learning the art of C". This is simply not true. You can't reasonably expect someone who doesn't yet fully understand the concepts to be able to design one of the best coded programs you've ever seen. You can't possibly expect that same person to understand why a certain practice may be "inadvisable" if they don't understand the problem to begin with.

Right, wrong or indifferently, I get the impression you seem to think I can solve a problem without understanding what the problem with my code was in the first place. And/or I should be able to just use gdb to find out, almost as if it is going to say something like "your loop condition is totally wrong, and you need to change it to XYZ, if you want the program to do X, Y or Z". Well, I wish it did tell me that, I really do, but it doesn't. I'm not trying to be disrespectful to you or anything, but I really do think that you either think I understand more than what I really do, and/or I should "just understand" if I have read a book about it, or something not too dissimilar. I can't change the way I learn, even if you have a different way of learning yourself.

Quote:

Originally Posted by BW-userx

...
Now rtmistler is tossing another fish in the barrel to try and catch.... OP hello?

Well hello BW... this is exactly what I'm talking about above, it's hard enough just trying to understand one problem, let alone when, to use your words, "someone throws yet another fish in the barrel". In other words; how on earth am I supposed to understand the example given in post #33 if I can't understand why what I'm trying to do without pointers isn't working the way I've intended it to? On top of the fact that I don't fully understand how that code works, this is why I'm trying to take it one step at a time. That way, I can SEE the different versions of the program, and hopefully be able to get a better understanding of WHY the code in post #33 is "so much better".

Quote:

Originally Posted by rnturn

Also: Perl, Python, probably others.

Can I ask that we just focus on the problem at hand, and keep the discussion concerning the C programming language - things are hard enough as it is, thanks. And also, I'm not currently trying to learn Perl or Python, nor trying to write a Perl or Python program.