HTML source obtained through sockets in C showing unreadable characters
I wrote the following code to retrieve HTML source into a file and to the terminal for viewing, but the output in both the terminal and the file shows a few unreadable characters.
int main(int argc, char **argv)
{
    struct addrinfo hints;
    struct addrinfo *results;
    int ret, sockfd;
    char buffer[512], resource[512];
    FILE *outfile;

    if (argc != 3) {
        fprintf(stderr, "Not enough arguments to go forward!!\n");
        return 1;
    }

    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;

    if ((ret = getaddrinfo(argv[1], argv[2], &hints, &results)) != 0) {
        fprintf(stderr, "getaddrinfo() error\n");
        return 2;
    }

    if ((sockfd = socket(results->ai_family, results->ai_socktype, results->ai_protocol)) < 0) {
        fprintf(stderr, "Socket not made\n");
        return 4;
    }

    if (connect(sockfd, results->ai_addr, results->ai_addrlen) < 0) {
        fprintf(stderr, "Connection not established\n");
        return 5;
    }

    printf("The News Feed URL : ");
    scanf("%s", resource);

    sprintf(buffer, "GET %s HTTP/1.1\nHost:%s\n\n", resource, argv[1]);
    printf("%s", buffer);

    if ((ret = write(sockfd, buffer, strlen(buffer))) < 0) {
        fprintf(stderr, "Write Failed\n");
        return 3;
    }
    printf("Request Sent\n");

    outfile = fopen("rssout.txt", "w");
    while (1) {
        if ((ret = read(sockfd, buffer, sizeof(buffer))) <= 0) {
            printf("Read Error OR Connection Closed\n");
            break;
        }
        fprintf(outfile, "%s", buffer);
    }

    fclose(outfile);
    close(sockfd);
    printf("\n\nALL IS WELL THAT ENDS WELL\n\n");
    freeaddrinfo(results);
    rssfeed();
    return 0;
}
eg : </authhá{¹or>
What is the problem? How do I correct it? |
The characters are probably Unicode (multi-byte characters), and maybe your terminal-window settings are not set to display those characters properly.
See for example http://earthwithsun.com/questions/55...rtual-terminal ... |
But even when I write directly into a file and open it with gedit, it still shows these unreadable characters. Also, I tried copying the HTML source directly to the terminal (to check whether these characters are supported), and they displayed fine. The problem seems to appear only when I use sockets to get the source data.
|
Please consider editing your first post to place your code within [code][/code] tags. There's a link in my signature which shows information on how to do that if you aren't sure. It makes the code more readable and helps people to assist you.
As for what these characters are, one thing I'd do is compile with debugging on, enter the debugger, set a breakpoint at an appropriate place, and examine the characters. You can see the hex or binary data as it is stored in memory, and then use something like this Ascii Table Reference to determine what the characters are and whether they're control characters or something else. Assuming you're using C and GCC, here's a very brief example of my point:
A general sample program
Code:
#include <stdio.h>
Code:
$ gcc -ggdb -o sample sample.c
Another technique, if you don't like the debugger or are running a system with a variety of co-dependent programs where inline debugging is difficult, is to have a logger or console output and anticipate that the array of data may not all be visible/printable. Instead, have a log/output utility which converts everything to HEX-ASCII. For instance, "A" is 0x41, but it's also capital A, so that one is no big deal. Carriage return, however, is 0x0d, and you don't "see" that: you see a newline, or worse, you see printing return to the start of the line and start overwriting what was already output. If you just say "for zero to N-1, print each character in hex", you'll see what the entire buffer contains and can debug what you're seeing in this file. |
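The "for zero to N-1, print each character in hex" idea from the post above could look roughly like the sketch below. The helper name `hex_dump` and the 16-bytes-per-line layout are assumptions made for illustration, not something taken from the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the thread): write n bytes of buf to out
 * as hex, 16 per line, followed by a printable-ASCII column in which
 * every non-printable byte is shown as '.'. */
static void hex_dump(FILE *out, const unsigned char *buf, size_t n)
{
    assert(out != NULL && (buf != NULL || n == 0));

    for (size_t off = 0; off < n; off += 16) {
        size_t len = (n - off < 16) ? n - off : 16;

        fprintf(out, "%04zx ", off);            /* offset of this row */
        for (size_t i = 0; i < 16; i++) {
            if (i < len)
                fprintf(out, " %02x", buf[off + i]);
            else
                fputs("   ", out);              /* pad a short last row */
        }
        fputs("  ", out);
        for (size_t i = 0; i < len; i++) {
            unsigned char c = buf[off + i];
            fputc(isprint(c) ? c : '.', out);
        }
        fputc('\n', out);
    }
}
```

In the original program this would be called right after a successful `read()`, for example `hex_dump(stderr, (const unsigned char *)buffer, (size_t)ret);`, so that only the bytes actually received are shown.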
Just download the same file with wget, and compare the two files.
|
This:
Code:
while(1) {
Off the top of my head:
Code:
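The code from this reply did not survive, so the following is only a guess at the kind of fix it may have pointed to. One thing that is certain about the original loop: `read()` does not NUL-terminate the buffer, so `fprintf(outfile, "%s", buffer)` can run past the bytes actually received and emit stale or uninitialized data. A minimal sketch that writes exactly `ret` bytes instead, with `copy_fd_to_file` being a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): copy everything readable
 * from fd into out. Returns the total number of bytes copied, or -1 on
 * a read or write error. */
static long copy_fd_to_file(int fd, FILE *out)
{
    char buffer[512];
    long total = 0;
    ssize_t ret;

    assert(out != NULL);

    while ((ret = read(fd, buffer, sizeof(buffer))) > 0) {
        /* Write exactly the ret bytes that read() returned; the buffer
         * is raw data, not a C string, so "%s" must not be used. */
        if (fwrite(buffer, 1, (size_t)ret, out) != (size_t)ret)
            return -1;
        total += ret;
    }
    return (ret < 0) ? -1 : total;
}
```

In the original program the `while(1)` loop would become a single call such as `copy_fd_to_file(sockfd, outfile)`. Separately, HTTP expects request lines to end in `\r\n` rather than a bare `\n`, which may also be worth changing in the `sprintf` call.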
|