HTML source obtained through sockets in C showing unreadable characters
I wrote the following code to retrieve HTML source and write it to a file and to the terminal for viewing. But the output in both the terminal and the file shows a few unreadable characters.
int main(int argc, char **argv)
{
    struct addrinfo hints;
    struct addrinfo *results;
    int ret, sockfd;
    char buffer[512], resource[512];
    FILE *outfile;

    if (argc != 3) {
        fprintf(stderr, "Not enough arguments to go forward!!\n");
        return 1;
    }
But even when I write directly into a file and open it with gedit, it still shows these unreadable characters. I also tried copying the HTML source directly to the terminal (to check whether these characters are supported), and they displayed fine. The problem seems to appear only when I use sockets to get the source data.
Please consider editing your first post to place your code within [code][/code] tags. There's a link in my signature which shows information on how to do that if you aren't sure. It makes the code more readable and helps people to assist you.
As for what these characters are, one thing I'd do is compile with debugging on, enter the debugger, set a breakpoint at an appropriate place, and examine the characters. There you can see the hex or binary data as it is stored in memory, and then you can use something like this Ascii Table Reference to determine what the characters are and whether they're control characters or something else.
Assuming you're using C and GCC, here's a very brief example of my point:
What this code does is start with an uninitialized array, clear it completely, copy the first static string "01234567" into the array and print it, then copy the next static string, which is a few carriage returns and line feeds, into the array and print that, then exit. I compile it with the -ggdb flag and then enter gdb to debug and examine it. You can use these same commands/techniques to debug your own program.
Code:
$ gcc -ggdb -o sample sample.c
$ gdb sample
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/testcode/sample...done.
(gdb) b sample.c:13
Breakpoint 1 at 0x8048461: file sample.c, line 13.
(gdb) r
Starting program: /home/testcode/sample
Breakpoint 1, main (argc=1, argv=0xbffff3a4) at sample.c:13
13 memcpy(my_array, data_1, sizeof(my_array));
(gdb) p my_array
$1 = "\000\000\000\000\000\000\000"
(gdb) x/8b my_array
0xbffff2f4: 0 0 0 0 0 0 0 0
(gdb) s
15 printf("First iteration: %s\n", my_array);
(gdb) s
First iteration: 01234567
17 memcpy(my_array, data_2, sizeof(my_array));
(gdb) p my_array
$2 = "01234567"
(gdb) x/8b my_array
0xbffff2f4: 48 49 50 51 52 53 54 55
(gdb) s
19 printf("Second iteration: %s\n", my_array);
(gdb) p my_array
$3 = "\r\n\r\n\r\n\r\n"
(gdb) x/8b my_array
0xbffff2f4: 13 10 13 10 13 10 13 10
(gdb) s
Second iteration:
21 return -1;
(gdb)
22 }
(gdb) quit
So you can see that in what I call the second iteration, the characters are non-visible but are still printed, and the data is there when you examine memory. Similarly, while the array was memset to all zeros, the string was empty, so printing it would show nothing. And if the data started with a 0x00 but continued further, printing it as a string would still show nothing, because the NUL terminator sits at the very start of the string.
Another technique, if you don't like the debugger or are running a system with a variety of co-dependent programs where inline debugging is difficult, is to have a logger or console output and anticipate that the array of data may not all be visible/printable. Instead, have a log/output utility which converts everything to HEX-ASCII. For instance "A" is 0x41, but it's also capital A, so what's the big deal. Carriage return, however, is 0x0d and you don't "see" that: you see a newline, or worse, you see printing return to the start of the line and start obliterating stuff already output. But if you just say "for zero to N-1, print each character in hex", then you'll see what the entire buffer contains and can debug what you're seeing from this file.
doesn't make any sense. You keep looping and writing into your buffer until the connection is closed, so there could be anything in that buffer. Fix that first.