gibberish from putchar/fprintf

neelpatel · 02-17-2010, 04:29 AM

I'm asked to modify the code for a cat program below so that it handles whitespace and avoids segfaulting when presented with lines longer than the buffer size. The program as written below produces correct output (albeit without whitespace). However, when I modify it by changing the format of the fprintf call from %s to %c, the program outputs:
�ֿ,��̷ֿz �ֿ,��̷ֿz �ֿ,��̷ֿz

This occurs on my two home Ubuntu builds. On the version of Debian that I am supposed to get this to run on, the output is correct. Interestingly, removing the fprintf(".....%s....",....argv[i]) from an if-branch that is never taken somehow fixes the problem on my Ubuntu builds. Any thoughts?

Code:

#include <stdio.h>
#include <stdlib.h>

#define SUCCESS  0
#define E_PARAM  1

int main(int argc, char **argv)
{
    int i, numread;
    FILE *in;
    char buf[100];
    if(argc==0)
    {
        fprintf(stderr, "ERROR: Not enough parameters.\n");
        fprintf(stderr, "Syntax: %s [file1] [file2] ... [fileN]\n", argv[0]);
        exit(E_PARAM);
    }
    for(i=1;i<argc;i++)
    {
        in = fopen(argv[i], "rt");
        if(in==NULL)
            fprintf(stderr, "\n%s: %s: No such file or directory\n", argv[0], argv[i]); 

/* Comment the line above and instead use to make it work:
            fprintf(stderr, "\n%s: Invalid file or directory\n", argv[0]);
*/

        else while(!feof(in))
        {
            numread=fscanf(in, "%s", buf); // CHANGE %s --> %c to BREAK
            if(numread>0 && numread != EOF)
                    fprintf(stdout, buf);
        }
    }
    exit(SUCCESS);
}

Worse still, the program above, once tweaked as explained so that it runs on my Ubuntu builds, produces gibberish when I try to run it on Debian.

I have attached the second part of this project (an attempt at optimizing the program by using lower level I/O functions). This runs fine on the Ubuntu builds, but produces gibberish on the Debian build that I will be evaluated on. I also find that my processing times are incredibly variable and generally higher than that of the high-level scanf/printf variation. Is this due to the fact that my test machine is virtualized? Is there an optimal buffer size or a way to avoid looping over the buffer to zero the values or to be more efficient in general?

Code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

#define SUCCESS  0
#define ERR  1
#define BUFSIZE 32

int main(int argc, char **argv)
{
    int i,c;
    char buf[BUFSIZE] = "";

    if(argc==1)
    {
        fprintf(stderr, "ERROR: Not enough parameters.\n");
        fprintf(stderr, "Syntax: %s [file1] [file2] ... [fileN]\n", argv[0]);
        exit(ERR);
    }

    for(i=1;i<argc;i++)
    {
        int filedes = open(argv[i],O_RDONLY,0);
        if (filedes == -1) {
            fprintf(stderr, "\n%s: %s: No such file or directory\n", argv[0], argv[i]);
	    exit(ERR);
	}  
	struct stat statbuf;

	fstat(filedes,&statbuf);

	size_t offset = 0;	

	while (1) {
		int x = pread(filedes, buf, sizeof(buf), offset);
		if ( x == 0 ) break; 
		offset += x;
		x = write(1,buf,sizeof(buf));
		for (c = 0 ; c < BUFSIZE ; c++) {
			buf[c] = 0;
		}
	}
	close(filedes);
    }
    exit(SUCCESS);
}

Thanks in advance,
Neel

**EDIT**
The second program also produces gibberish when I use putchar instead of write. Why does this depend on the Linux build? I've tried both running the same binaries on each system and compiling them natively. Either way, I'm in programming Babel.

Sergei Steshenko · 02-17-2010, 04:44 AM

Quote:

Originally Posted by neelpatel

I'm asked to modify the code for a cat program
...

By whom, and what for? Are you sure the standard Linux 'cat' can't cope with the characters you think it needs to cope with?

Here is a screen session on my SUSE 11.1 box:

Code:

sergei@amdam2:~/junk> echo "Привет, neelpatel" | cat -n
     1  Привет, neelpatel
sergei@amdam2:~/junk>

- the "Привет" word is in Russian and can be translated as "Hello" in this case.

irmin · 02-17-2010, 04:47 AM

To your first program:

Code:

   if(argc==0)

argc is never zero in a normal invocation. If no arguments are given, argc is 1. For the error handling in the case that fopen fails, see below. Furthermore, the streams are never closed. Use fclose(3) for this.

Code:

       else while(!feof(in))
        {
            numread=fscanf(in, "%s", buf); // CHANGE %s --> %c to BREAK
            if(numread>0 && numread != EOF)
                    fprintf(stdout, buf);
        }

This code is strange in itself. fscanf(in,"%s",buf) is very dangerous, because you cannot know how long a word will be! Changing to %c reads exactly one character into buf[0] (whitespace included, since %c does not skip it). But in that case the string is not terminated with 0, so fprintf does not know where it ends and will output whatever follows in memory until it encounters a 0. That is the gibberish you observe.
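One way to sidestep both problems described here (the unbounded %s and the unterminated buffer) is to copy one character at a time and never treat the data as a string. A minimal sketch, assuming a helper named copy_stream (a hypothetical name, not part of the lab code):

```c
#include <assert.h>
#include <stdio.h>

/* Copy every byte from `in` to `out`, whitespace included.
 * Returns the number of bytes copied, or -1 on a read or write error.
 * copy_stream is a hypothetical helper name, not part of the lab code. */
long copy_stream(FILE *in, FILE *out)
{
    long copied = 0;
    int ch; /* int rather than char, so EOF stays distinguishable from data */

    assert(in != NULL && out != NULL);

    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF)
            return -1;
        copied++;
    }
    return ferror(in) ? -1 : copied;
}
```

In the first program this would be called as copy_stream(in, stdout) for each successfully opened file, followed by fclose(in). Because no buffer is involved, line length no longer matters, and the while(!feof(in)) test is not needed.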

To your second program:

Code:

x = write(1,buf,sizeof(buf));

This will write sizeof(buf) bytes to stdout. But you cannot be sure that the buffer holds that many bytes. pread returns the number of bytes read. You should use that return value as the size argument to write. Otherwise whatever lies beyond the bytes actually read is written to stdout too, which is the gibberish you observe.

Furthermore, there is no need to call pread. Just use read on the file, because you do not need to seek explicitly (read advances the file offset automatically). Also handle the case where read/pread fails (x<0)!
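The advice about using the return value of read can be sketched as follows. The names copy_fd and CHUNK are hypothetical, and 4096 is an arbitrary buffer size chosen for illustration, not a measured optimum:

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define CHUNK 4096 /* assumed buffer size, not tuned */

/* Copy everything from in_fd to out_fd.
 * Returns 0 on success, -1 on a read or write error (errno is set). */
int copy_fd(int in_fd, int out_fd)
{
    char buf[CHUNK];
    ssize_t nread;

    assert(in_fd >= 0 && out_fd >= 0);

    while ((nread = read(in_fd, buf, sizeof buf)) != 0) {
        if (nread < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry the read */
            return -1;
        }
        /* Write exactly the bytes that were read; write may be partial. */
        ssize_t done = 0;
        while (done < nread) {
            ssize_t nw = write(out_fd, buf + done, (size_t)(nread - done));
            if (nw < 0) {
                if (errno == EINTR)
                    continue;   /* retry the same write */
                return -1;
            }
            done += nw;
        }
    }
    return 0;
}
```

With this shape the buffer never needs to be zeroed, since stale bytes past nread are never written. In the second program it would be called as copy_fd(filedes, STDOUT_FILENO).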

Code:

	for (c = 0 ; c < BUFSIZE ; c++) {
			buf[c] = 0;
		}

These lines are totally unnecessary.

Code:

     if (filedes == -1) {
            fprintf(stderr, "\n%s: %s: No such file or directory\n", argv[0], argv[i]);
	    exit(ERR);
	}

If open fails, why do you assume that the file does not exist? Better use perror(3) or strerror(3) to get the real reason for the failure.
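A minimal sketch of reporting the real reason with strerror(3), assuming a wrapper named open_or_report (a hypothetical name); the message layout "program: file: reason" is only one possible choice:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Open `path` read-only. On failure, print the actual reason reported by
 * the system and return -1 with errno preserved.
 * open_or_report is a hypothetical helper name. */
int open_or_report(const char *prog, const char *path)
{
    assert(prog != NULL && path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int saved = errno;  /* fprintf may change errno, so keep a copy */
        fprintf(stderr, "%s: %s: %s\n", prog, path, strerror(saved));
        errno = saved;
    }
    return fd;
}
```

For a missing file this prints the system's own text for ENOENT, while a permissions problem or a path through a non-directory produces a different message instead of the hard-coded one.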

irmin · 02-17-2010, 04:51 AM

Quote:

Originally Posted by Sergei Steshenko

By whom/what for ?

I think I know who asked him:

Quote:

08048000-08049000 r-xp 00000000 08:01 153265 /home/neel/cs/lab05/exercise1/a.out
08049000-0804a000 r--p 00000000 08:01 153265 /home/neel/cs/lab05/exercise1/a.out
0804a000-0804b000 rw-p 00001000 08:01 153265 /home/neel/cs/lab05/exercise1/a.out
0

taken from https://www.linuxquestions.org/quest...y-dump-789651/.

So I think these are computer science laboratory exercises ...
But at least he tried it himself ...

Sergei Steshenko · 02-17-2010, 04:56 AM

Quote:

Originally Posted by irmin

I think I know who asked him:

taken from https://www.linuxquestions.org/quest...y-dump-789651/.

So I think these are computer science laboratory exercises ...
But at least he tried it himself ...

I have no problem with the question being homework; still, the source code of 'cat' is freely available, and one can even choose the GNU or *BSD flavor.

neelpatel · 02-17-2010, 05:00 AM

Quote:

Originally Posted by irmin

To your first program:

Code:

       else while(!feof(in))
        {
            numread=fscanf(in, "%s", buf); // CHANGE %s --> %c to BREAK
            if(numread>0 && numread != EOF)
                    fprintf(stdout, buf);
        }

This code is strange in itself. fscanf(in,"%s",buf) is very dangerous, because you cannot know how long a word will be! Changing to %c reads exactly one character into buf[0] (whitespace included, since %c does not skip it). But in that case the string is not terminated with 0, so fprintf does not know where it ends and will output whatever follows in memory until it encounters a 0. That is the gibberish you observe.

The first program is the one that I am asked to improve. It runs correctly on my instructor's/university computers. Moreover, it runs correctly on mine if I remove the error messages in the preceding if-branch. I tried printf-ing only buf[0], but it threw a segfault.

Quote:

Originally Posted by irmin

To your second program:

Code:

x = write(1,buf,sizeof(buf));

This will write sizeof(buf) bytes to stdout. But you cannot be sure that the buffer holds that many bytes. pread returns the number of bytes read. You should use that return value as the size argument to write. Otherwise whatever lies beyond the bytes actually read is written to stdout too, which is the gibberish you observe.

Furthermore, there is no need to call pread. Just use read on the file, because you do not need to seek explicitly (read advances the file offset automatically). Also handle the case where read/pread fails (x<0)!

Code:

	for (c = 0 ; c < BUFSIZE ; c++) {
			buf[c] = 0;
		}

These lines are totally unnecessary.

This code runs correctly on my computers (whereas the first program didn't if I modified %s to %c). The unnecessary for loop is used to zero out the buffer, so that if it is not filled by pread, then it does not print unnecessary garbage. As such, it outputs correctly on my computers.

However, each program outputs only gibberish when run on the other set of computers (mine versus my university's). The only consistent difference between the two is that mine are running Ubuntu and the school's are running Debian.

The most inexplicable phenomenon is that when I remove "%s",argv[i] from the "No such file or directory" error, the first program runs as expected on my personal computers. It runs regardless on the Debians.

The second program as written runs correctly on the Ubuntus, but outputs complete gibberish on the Debians.

In all cases, gibberish means that no characters are output other than the odd diamonds (which an octal dump shows as EOT, end-of-transmission markers).

Thank you for your quick reply and numerous corrections to my code. The ones I did not mention make sense to me and I will make the appropriate changes.

neelpatel · 02-17-2010, 05:06 AM

Sorry for not specifying. This is a lab for my computer science course. The task was to find errors in the first program (explain why it can't deal with white space and why it segfaults), then modify the program to do these things (replacing %s with %c works in the CS laboratory), finally write a lower-level implementation to be more efficient than the original. Cat does work for my purposes, but the exercise is to give us a better understanding of C.

Some more testing revealed that a buffer size of 16 or 32 characters works best. I'm sure that number will be significantly higher once I remove the for-loop that zeros out the array. But at the moment, if I don't empty the array the last buffer is doubled in the output. I will try to replace pread by read as soon as I can catch a bit of sleep.

irmin · 02-17-2010, 05:08 AM

Quote:

The unnecessary for loop is used to zero out the buffer, so that if it is not filled by pread, then it does not print unnecessary garbage. As such, it outputs correctly on my computers.

So you think that zeroing out the buffer keeps those bytes off the screen? The zeros are still written to the terminal. Under Ubuntu the terminal seems to ignore the zeros, but under Debian it seems to interpret them as EOT. I think the problem is solved by changing the number of bytes you write to the number of bytes actually read.
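The point that zero bytes are still written can be checked directly. A small sketch, assuming a hypothetical helper write_chunk that places a short text in a zeroed 32-byte buffer (the same BUFSIZE as the second program) and writes either the whole buffer or only the valid part:

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define BUFSIZE 32 /* same size as the buffer in the second program */

/* Put `len` bytes of `text` into a zeroed buffer and write it to `fd`.
 * If whole_buffer is nonzero, all BUFSIZE bytes are written (as the original
 * code does); otherwise only the valid bytes are. Returns what write returns.
 * write_chunk is a hypothetical helper used only for this demonstration. */
ssize_t write_chunk(int fd, const char *text, size_t len, int whole_buffer)
{
    char buf[BUFSIZE] = {0};

    assert(text != NULL);
    if (len > sizeof buf)
        len = sizeof buf;
    memcpy(buf, text, len);

    return write(fd, buf, whole_buffer ? sizeof buf : len);
}
```

Writing "abc" with whole_buffer set sends 32 bytes, 29 of them zeros, and piping the output through od -c makes them visible. How a terminal displays those zeros is up to the terminal, which would fit the difference seen between the two systems.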

Sergei Steshenko · 02-17-2010, 05:12 AM

Quote:

Originally Posted by neelpatel

Sorry for not specifying. This is a lab for my computer science course. The task was to find errors
...

Have you tried to compile the program with 'gcc' using

Code:

-Wall -Wextra

command-line switches? 'gcc' often catches errors statically, especially in *printf functions.

neelpatel · 02-17-2010, 05:22 AM

Quote:

Originally Posted by irmin

So you think that zeroing out the buffer keeps those bytes off the screen? The zeros are still written to the terminal. Under Ubuntu the terminal seems to ignore the zeros, but under Debian it seems to interpret them as EOT. I think the problem is solved by changing the number of bytes you write to the number of bytes actually read.

That makes sense and is easy enough to test. Since zeroing it out worked on Ubuntu, I assumed that write/printf/etc interpreted 0 as /0. I'll do as you suggested.

Compiling in g++ works on both systems.
The -Wall -Wextra flags did not work on the Ubuntu system for getting Program #1 to compile.

Somehow recompiling got the programs to work on the remote box (Debian). I made no changes to the programs; they just work. I'm not going to question it. Maybe I did something stupid while tired.

So the final question stands:

Why does commenting out the printf(stderr,"%s",argv[i]) allow me to fprintf(stdout,"%c",buff) in Ubuntu, while leaving it in (even though it is in an irrelevant for-loop that exits afterwards) causes fprintf to produce gibberish?

Again, thanks for the quick replies. Somehow, with no real changes, my lab is in an acceptable state to be turned in. Your suggestions were very instructive. I'm just really puzzled as to why leaving that argv[i] string in is so destructive on both my Ubuntu builds.

Neel

irmin · 02-17-2010, 05:39 AM

Quote:

That makes sense and is easy enough to test. Since zeroing it out worked on Ubuntu, I assumed that write/printf/etc interpreted 0 as /0. I'll do as you suggested.

If write treated 0 as void, then you could never write correct binary data to a file.

Quote:

Why does commenting out the printf(stderr,"%s",argv[i]) allow me to fprintf(stdout,"%c",buff) in Ubuntu, while leaving it in (even though it is in an irrelevant for-loop that exits afterwards) causes fprintf to produce gibberish?

If you mean fprintf(stderr,"%s",argv[i]) instead of printf(stderr,"%s",argv[i]), then it should have no side effects.

I cannot find a variable named buff in your source code. But if you mean buf, then fprintf(stdout,"%c",buff) is the wrong call. The correct way would be:
fprintf(stdout,"%c",*buf) or fprintf(stdout,"%c",buf[0]). Otherwise fprintf receives the address of buf where it expects a character, which is undefined behavior; typically one byte of that address is printed, which is garbage.
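Put together, the %c variant of the first program's loop could look like the sketch below. The function name echo_chars is hypothetical, and the loop tests the return value of fscanf instead of feof, which is a substitution and not what the original code does:

```c
#include <assert.h>
#include <stdio.h>

/* Read single characters with fscanf("%c") and echo each with fprintf("%c").
 * Returns the number of characters echoed.
 * echo_chars is a hypothetical helper name. */
long echo_chars(FILE *in, FILE *out)
{
    char c;
    long count = 0;

    assert(in != NULL && out != NULL);

    /* "%c" does not skip whitespace and stores no terminating 0, so the
     * result is handled as one char, never as a string. fscanf returns 1
     * on success and EOF at end of input, which ends the loop. */
    while (fscanf(in, "%c", &c) == 1) {
        fprintf(out, "%c", c); /* a char value, not a pointer to the buffer */
        count++;
    }
    return count;
}
```

Passing the character through an explicit "%c" format also avoids fprintf(stdout, buf), where the file contents themselves would be interpreted as a format string.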

Sergei Steshenko · 02-17-2010, 11:20 AM

Quote:

Originally Posted by neelpatel

...
The -Wall -Wextra flags did not work on the Ubuntu system for getting Program #1 to compile.
...

What do you mean by that? What were the compiler messages?