LinuxQuestions.org
Old 01-22-2015, 07:16 AM   #1
TheChronicScribbler
Member
 
Registered: Nov 2013
Posts: 31

Rep: Reputation: Disabled
HTML source obtained through sockets in C showing unreadable characters


I wrote the following code to retrieve HTML source into a file and the terminal for viewing. But the output on the terminal and in the file shows a few unreadable characters.



int main(int argc, char **argv)
{
struct addrinfo hints;
struct addrinfo *results;
int ret, sockfd;
char buffer[512], resource[512];
FILE *outfile;

if (argc != 3) {
fprintf(stderr, "Not enough arguments to go forward!!\n");
return 1;
}


hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
hints.ai_flags = AI_PASSIVE;


if ((ret = getaddrinfo(argv[1], argv[2], &hints, &results)) != 0) {
fprintf(stderr, "getaddrinfo() error\n");
return 2;
}

if ((sockfd = socket(results->ai_family, results->ai_socktype, results->ai_protocol)) < 0) {
fprintf(stderr, "Socket not made\n");
return 4;
}

if(connect(sockfd, results->ai_addr, results->ai_addrlen) < 0) {
fprintf(stderr, "Connection not established\n");
return 5;
}

printf("The News Feed URL : ");
scanf("%s", resource);
sprintf(buffer, "GET %s HTTP/1.1\nHost:%s\n\n", resource, argv[1]);

printf("%s", buffer);

if ((ret = write(sockfd, buffer, strlen(buffer))) < 0) {
fprintf(stderr, "Write Failed\n");
return 3;
}

printf("Request Sent\n");

outfile = fopen("rssout.txt", "w");

while(1) {
if((ret = read(sockfd, buffer, sizeof(buffer))) <= 0) {
printf("Read Error OR Connection Closed\n");
break;
}

fprintf(outfile, "%s", buffer);

}

fclose(outfile);
close(sockfd);
printf("\n\nALL IS WELL THAT ENDS WELL\n\n");
freeaddrinfo(results);


rssfeed();

return 0;
}


eg : </authhá{¹or>

What is the problem? How do I correct it?
 
Old 01-22-2015, 07:33 AM   #2
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,659
Blog Entries: 4

Rep: Reputation: 3941
The characters are probably Unicode (multi-byte characters), and maybe your terminal-window settings are not set to display those characters properly.

See for example http://earthwithsun.com/questions/55...rtual-terminal ...
 
Old 01-22-2015, 07:45 AM   #3
TheChronicScribbler
Member
 
Registered: Nov 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
But even when I write directly into a file and open it with gedit, it still shows these unreadable characters. Also, I tried copying the HTML source directly to the terminal (to check whether these characters are supported), and they displayed fine. The problem seems to appear only when I use sockets to get the source data.
 
Old 01-22-2015, 08:11 AM   #4
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930
Please consider editing your first post to place your code within [code][/code] tags. There's a link in my signature which shows information on how to do that if you aren't sure. It makes the code more readable and helps people to assist you.

As for what these characters are, I would compile with debugging enabled, run the program in the debugger, set a breakpoint at an appropriate place, and examine the buffer there. The debugger shows the hex or binary data exactly as it is stored in memory, and you can then use something like this Ascii Table Reference to work out what each character is and whether it is a control character or something else.

Assuming you're using C and GCC, here's a very brief example of my point:

A general sample program
Code:
#include <stdio.h>
#include <string.h>

static char data_1[8] = "01234567";         /* exactly 8 bytes: the terminating '\0' is dropped */
static char data_2[8] = "\r\n\r\n\r\n\r\n"; /* likewise 8 bytes with no terminator */

int main(int argc, char **argv)
{
    char my_array[8];

    memset(my_array, 0, sizeof(my_array));

    memcpy(my_array, data_1, sizeof(my_array));

    printf("First iteration: %s\n", my_array);   /* %s relies on a zero byte happening to follow my_array */

    memcpy(my_array, data_2, sizeof(my_array));

    printf("Second iteration: %s\n", my_array);

    return -1;
}
What this code will do is start with an uninitialized array, clear it completely, then copy the first static string "01234567" into the array, print that, then copy the next static string which is a few carriage returns and line feeds, into the array and print that, then exit. I compile it using the -ggdb flag and then enter gdb to debug and examine it. You can use these same commands/techniques to debug your own program.
Code:
$ gcc -ggdb -o sample sample.c
$ gdb sample
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/testcode/sample...done.
(gdb) b sample.c:13
Breakpoint 1 at 0x8048461: file sample.c, line 13.
(gdb) r
Starting program: /home/testcode/sample 

Breakpoint 1, main (argc=1, argv=0xbffff3a4) at sample.c:13
13	    memcpy(my_array, data_1, sizeof(my_array));
(gdb) p my_array
$1 = "\000\000\000\000\000\000\000"
(gdb) x/8b my_array
0xbffff2f4:	0	0	0	0	0	0	0	0
(gdb) s
15	    printf("First iteration: %s\n", my_array);
(gdb) s
First iteration: 01234567
17	    memcpy(my_array, data_2, sizeof(my_array));
(gdb) p my_array
$2 = "01234567"
(gdb) x/8b my_array
0xbffff2f4:	48	49	50	51	52	53	54	55
(gdb) s
19	    printf("Second iteration: %s\n", my_array);
(gdb) p my_array
$3 = "\r\n\r\n\r\n\r\n"
(gdb) x/8b my_array
0xbffff2f4:	13	10	13	10	13	10	13	10
(gdb) s
Second iteration: 




21	    return -1;
(gdb) 
22	}
(gdb) quit
So you can see that for what I call the second iteration, the output looks empty because the characters are printed but not visible; the data is still there when you examine memory. Similarly, right after the array was memset to all zeros, it held an empty string, so printing it would show nothing. And if the data started with 0x00 but continued further, printing it as a string would still show nothing, because a NUL terminator at the start ends the string immediately.

Another technique, if you don't like the debugger or are running a system with a variety of co-dependent programs where inline debugging is difficult, is to use a logger or console output. Anticipate that the data may not all be visible or printable, and have a log/output utility that converts everything to hex-ASCII. For instance, "A" is 0x41, but it also displays as a capital A, so that is no big deal. Carriage return, however, is 0x0d and you don't "see" it: you see a newline, or worse, the output returns to the start of the line and starts overwriting what was already printed. If you instead say "for zero to N-1, print each character in hex", you'll see everything the buffer contains and can debug from that output.
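The "for zero to N-1, print each character in hex" idea can be sketched roughly as below. This is only a minimal illustration: the name hex_dump and the 16-bytes-per-line layout are my own choices, not something from the posts above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes of buf to out as two-digit hex values,
 * 16 per line. Every byte is shown, including NUL and control characters. */
static void hex_dump(FILE *out, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    assert(out != NULL && (buf != NULL || len == 0));

    for (i = 0; i < len; i++) {
        /* Separate bytes with a space; end the line after every 16th byte
         * and after the last byte. */
        fprintf(out, "%02x%s", p[i],
                (i % 16 == 15 || i + 1 == len) ? "\n" : " ");
    }
}
```

In the program from post #1, a call such as hex_dump(stderr, buffer, (size_t)ret) placed right after a successful read() would show exactly which bytes arrived in that call, and nothing beyond them.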
 
Old 01-22-2015, 09:08 AM   #5
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,863
Blog Entries: 1

Rep: Reputation: 1869
Just download the same file with wget, and compare the two files.
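If you want the exact position where the two files part ways, a byte-by-byte comparison along these lines may help. It is a rough sketch; first_difference is a made-up name, and the function assumes both files were opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return the offset of the first byte at which the two
 * streams differ, or -1 if they are identical to the end. A stream that ends
 * early counts as differing at the offset where it ends. */
static long first_difference(FILE *a, FILE *b)
{
    long offset = 0;
    int ca, cb;

    assert(a != NULL && b != NULL);

    for (;;) {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            return offset;      /* includes the case where only one hit EOF */
        if (ca == EOF)
            return -1;          /* both ended together: no difference */
        offset++;
    }
}
```

Keep in mind that the file written by the program in post #1 starts with the HTTP status line and headers, which wget does not save by default, so the comparison is only meaningful from the start of the body onwards.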
 
Old 01-22-2015, 09:45 AM   #6
SoftSprocket
Member
 
Registered: Nov 2014
Posts: 399

Rep: Reputation: Disabled
This:
Code:
while(1) {
if((ret = read(sockfd, buffer, sizeof(buffer))) <= 0) {
printf("Read Error OR Connection Closed\n");
break;	
}
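doesn't make sense as written. You keep looping and overwriting your buffer until the connection is closed, and the number of bytes each read() returned is never used when the buffer is written out. There could be anything in that buffer, so fix that first.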

Off the top of my head:
Code:
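/* needs <errno.h>, <stdio.h> and <unistd.h> */
char buffer[512];

ssize_t num_read = 0;                 /* signed: read() returns -1 on error */
size_t num_to_read = sizeof buffer;
char* pbuf = buffer;

/* Keep reading until the buffer is full, EOF is reached, or an error occurs. */
while (num_to_read > 0) {
    num_read = read (sockfd, pbuf, num_to_read);

    if (num_read == 0) {
         printf ("EOF\n");
         break;
    }

    if (num_read < 0) {
         if (errno == EINTR) {
             continue;                /* interrupted by a signal: retry */
         }

         perror ("read");
         // error handling here
         break;
    }

    num_to_read -= num_read;
    pbuf += num_read;
}
/* buffer now holds (sizeof buffer - num_to_read) valid bytes, with no '\0' added */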
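The likely cause of the stray characters in post #1 is the fprintf(outfile, "%s", buffer) call: read() does not append a terminating '\0', so %s carries on past the bytes actually received and prints whatever follows them (leftovers from an earlier, longer read, or memory beyond the array). A minimal sketch of a loop that writes only what was read is below; copy_fd_to_stream is a hypothetical name, and the error handling is deliberately bare.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: copy everything readable from fd to out.
 * Returns the total number of bytes copied, or -1 on a read or write error. */
static long copy_fd_to_stream(int fd, FILE *out)
{
    char buffer[512];
    long total = 0;
    ssize_t n;

    assert(out != NULL);

    for (;;) {
        n = read(fd, buffer, sizeof buffer);
        if (n == 0)
            break;                      /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        /* Write exactly the n bytes read() delivered. Unlike "%s", fwrite
         * needs no terminating '\0' and does not stop at an embedded one. */
        if (fwrite(buffer, 1, (size_t)n, out) != (size_t)n)
            return -1;
        total += n;
    }
    return total;
}
```

In the original program, the while(1) loop could then be replaced by a single call such as copy_fd_to_stream(sockfd, outfile), with a check for a -1 return.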
 
  

