Distribution: Ubuntu 7.04, LFS 6.3 rc1 (living dangerously ;), Windows XP
Posts: 75
C sockets losing data?
Hey all.
I want to read data from a website accessed through a proxy and send it to another socket. My problem is that I seem to be losing a few bytes of data.
The code below sends an HTTP request to a proxy server running on localhost port 3128 for the file http://localhost/bigfile.txt, which is about 80 KB of the same line copy-pasted.
When I run it and pipe stdout to a file, the md5sum of the created file doesn't match that of the original bigfile.txt about 25% of the time. Inspection of the created file shows that every so often a handful of characters are missing from one of the lines.
I managed to boil my problem down into 80 lines of C.
If you'd rather view it as a separate file: http://home.iprimus.com.au/sa014/test.c
Code:
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

int proxy_connect(char* proxyaddr, int proxyport);

int main()
{
    //the purpose is to connect to a proxy server and use read() and write() to get a particular file.
    //the file download process is prone to corruption, even when using localhost.
    int sock;
    char* buff;
    int rv;
    int bytesleft;

    //first, connect to the proxy server (squid)
    sock = proxy_connect("localhost", 3128);
    buff = malloc(1024);

    //send an HTTP request asking for http://localhost/bigfile.txt to the proxy server
    write(sock, "GET http://localhost/bigfile.txt HTTP/1.0\r\n"
                "Host: localhost\r\n\r\n", strlen("GET "
                "http://localhost/bigfile.txt HTTP/1.0\r\n"
                "Host: localhost\r\n\r\n"));

    //now we have to read the headers off our socket.
    //we'll know we have the end of headers when we have \r followed by \n
    read(sock, buff, 1);
    while (buff[0] != '\r') //this would be \r after a \n, meaning end of headers
    {
        while (buff[0] != '\r')
        {
            read(sock, buff, 1);
        }
        read(sock, buff, 1); //this will yield \n
        read(sock, buff, 1); //this will be the first character of the next line.
                             //if it's \r, end of headers
    }
    read(sock, buff, 1); //this is the last \n before content starts.

    //this is my transfer algorithm. The idea is to do io in 1kb chunks.
    //the reason for this is that this is intended to be used to write to another socket
    //and sending an entire tcp packet for a single byte is somewhat inefficient.
    //this algorithm will also be used to send data from a CGI program with write(STDOUT_FILENO)
    bytesleft = 82194; //this is the length of bigfile.txt
    for (; bytesleft > 1024; bytesleft = bytesleft - 1024) //loop until bytesleft is no more than 1024
                                                           //and decrease bytesleft by 1024 each time
    {
        //read and write 1024 bytes
        rv = read(sock, buff, 1024);
        if (rv < 1){perror("read"); exit(1);}
        write(STDOUT_FILENO, buff, 1024);
    }

    //now we must read and write the remaining (<=1024) bytes left on the socket.
    read(sock, buff, bytesleft);
    write(STDOUT_FILENO, buff, bytesleft);

    free(buff);
    close(sock);
    return 0;
}

int proxy_connect(char* proxyaddr, int proxyport)
{
    //this function accepts a host and a port, uses gethostbyname() to fill out hostent,
    //then connects to that host with connect(). It returns a socket that can be read from/written to
    struct hostent* proxy;
    struct sockaddr_in proxy_addr;
    int rval;
    long int nhostaddr;

    proxy = gethostbyname(proxyaddr);
    if (proxy == NULL){
        fprintf(stderr, "gethostbyname() failed\n");
        exit(1);}
    memcpy(&nhostaddr, proxy->h_addr, proxy->h_length);

    rval = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    bzero((char*)&proxy_addr, sizeof(proxy_addr));
    proxy_addr.sin_family = AF_INET;
    proxy_addr.sin_port = htons(proxyport);
    proxy_addr.sin_addr.s_addr = nhostaddr;

    if (connect(rval, (struct sockaddr*)&proxy_addr, sizeof(proxy_addr)) == -1){
        fprintf(stderr, "connect() failed\n");
        exit(1);}
    return rval;
}
You need to check the return values of your system calls. If they return "-1", you need to check the value of the global "errno". You can get a string version of the error by calling "strerror(errno)".
There is a reason functions return error codes. If you do not check them, you are like a driver who does not look out the car window, but wonders why he keeps crashing into trees.
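As a rough illustration of the advice above, here is a minimal sketch of checking one read() call and reporting the error text. The helper name checked_read is made up for this example and is not part of the original program; it only wraps the standard read(), strerror() and errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): call read() once and,
 * if it fails, print a readable message built from errno.
 * Returns whatever read() returned: a byte count, 0 at EOF, or -1. */
ssize_t checked_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1) {
        /* errno is only meaningful right after the failing call,
         * so save it before calling anything else. */
        int saved = errno;
        fprintf(stderr, "read(fd=%d) failed: %s\n", fd, strerror(saved));
        errno = saved; /* leave errno intact for the caller */
    }
    return n;
}
```

Calling it on a descriptor that is not open, for example checked_read(-1, buf, sizeof buf), returns -1 and prints the message for EBADF. The same pattern applies to write(), connect() and socket().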
In addition to checking for error returns from your read and write, you also need to check how many of the requested bytes were actually read and written. When working with sockets, you cannot assume that just because the function didn't return an error that it read / wrote all the bytes you asked it to.
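One common way to deal with short counts is to loop until the whole request has been transferred. The sketch below is only an illustration under that assumption; read_full and write_all are hypothetical helper names, not functions from the original program or from libc.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until len bytes have arrived,
 * end-of-file is reached, or a real error occurs.
 * Returns the bytes read (less than len only at EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n == 0)
            break;          /* EOF: the peer closed the connection */
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            return -1;
        }
        done += (size_t)n;  /* short read: ask again for the rest */
    }
    return (ssize_t)done;
}

/* Hypothetical helper: keep calling write() until all len bytes are sent.
 * Returns len on success, or -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, (const char *)buf + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal, try again */
            return -1;
        }
        done += (size_t)n;  /* short write: send the remainder */
    }
    return (ssize_t)done;
}
```

With helpers like these, a copy loop can request a 1024-byte chunk with read_full(), pass the returned count to write_all(), and subtract that count (not a fixed 1024) from the bytes remaining.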
Original Poster
Ah, OK...
But since it's a blocking socket, wouldn't it block until it was able to read all 1024 bytes?
Anyway, I'll add that in and see if it fixes it.
Original Poster
Ah, much better. I modified the program like so and got the same md5sum 200 times in a row, so I'm guessing it's fixed. Thanks guys for pointing out the blindingly obvious, as always...
As an aside, why doesn't read() block if it can't read all 1024 bytes I asked it for?
Code:
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

int proxy_connect(char* proxyaddr, int proxyport);
static inline void fail(char* msg);

int main()
{
    //the purpose is to connect to a proxy server and use read() and write() to get a particular file.
    //the file download process is prone to corruption, even when using localhost.
    int sock;
    char* buff;
    int rv;
    int rv2;
    int bytesleft;

    //first, connect to the proxy server (squid)
    sock = proxy_connect("localhost", 3128);
    buff = malloc(1024);

    //send an HTTP request asking for http://localhost/bigfile.txt to the proxy server
    write(sock, "GET http://localhost/bigfile.txt HTTP/1.0\r\n"
                "Host: localhost\r\n\r\n", strlen("GET "
                "http://localhost/bigfile.txt HTTP/1.0\r\n"
                "Host: localhost\r\n\r\n"));

    //now we have to read the headers off our socket.
    //we'll know we have the end of headers when we have \r followed by \n
    rv = read(sock, buff, 1);
    if (rv < 1)
        fail("read");
    while (buff[0] != '\r') //this would be \r after a \n, meaning end of headers
    {
        while (buff[0] != '\r')
        {
            rv = read(sock, buff, 1);
            if (rv < 1)
                fail("read");
        }
        rv = read(sock, buff, 1); //this will yield \n
        if (rv < 1)
            fail("read");
        //this will be the first character of the next line. if it's \r, end of headers
        rv = read(sock, buff, 1);
        if (rv < 1)
            fail("read");
    }
    rv = read(sock, buff, 1); //this is the last \n before content starts.
    if (rv < 1)
        fail("read");

    //this is my transfer algorithm. The idea is to do io in 1kb chunks.
    //the reason for this is that this is intended to be used to write to another socket
    //and sending an entire tcp packet for a single byte is somewhat inefficient.
    //this algorithm will also be used to send data from a CGI program with write(STDOUT_FILENO)
    bytesleft = 82194; //this is the length of bigfile.txt
    for (; bytesleft > 1024; bytesleft = bytesleft - rv) //loop until bytesleft is no more than 1024
                                                         //and decrease bytesleft by the bytes actually read
    {
        //read up to 1024 bytes and write however many arrived
        rv = read(sock, buff, 1024);
        if (rv < 1)
            fail("read");
        rv2 = write(STDOUT_FILENO, buff, rv);
        if (rv2 < rv)
            fail("write");
    }

    //now we must read and write the remaining (<=1024) bytes left on the socket.
    while (bytesleft > 0)
    {
        rv = read(sock, buff, bytesleft);
        if (rv < 1)
            fail("read");
        rv2 = write(STDOUT_FILENO, buff, rv);
        if (rv2 < rv)
            fail("write");
        bytesleft = bytesleft - rv;
    }

    free(buff);
    close(sock);
    return 0;
}

int proxy_connect(char* proxyaddr, int proxyport)
{
    //this function accepts a host and a port, uses gethostbyname() to fill out hostent,
    //then connects to that host with connect(). It returns a socket that can be read from/written to
    struct hostent* proxy;
    struct sockaddr_in proxy_addr;
    int rval;
    long int nhostaddr;

    proxy = gethostbyname(proxyaddr);
    if (proxy == NULL){
        fprintf(stderr, "gethostbyname() failed\n");
        exit(1);}
    memcpy(&nhostaddr, proxy->h_addr, proxy->h_length);

    rval = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    bzero((char*)&proxy_addr, sizeof(proxy_addr));
    proxy_addr.sin_family = AF_INET;
    proxy_addr.sin_port = htons(proxyport);
    proxy_addr.sin_addr.s_addr = nhostaddr;

    if (connect(rval, (struct sockaddr*)&proxy_addr, sizeof(proxy_addr)) == -1){
        fprintf(stderr, "connect() failed\n");
        exit(1);}
    return rval;
}

static inline void fail(char* msg)
{
    perror(msg);
    exit(1);
}
Last edited by greeklegend; 01-09-2008 at 06:55 PM.
As an aside, why doesn't read() block if it can't read all 1024 bytes I asked it for?
From the man page for READ(2) from the "Linux Programmer's Manual":
Code:
RETURN VALUE
On success, the number of bytes read is returned (zero
indicates end of file), and the file position is advanced
by this number. It is not an error if this number is
smaller than the number of bytes requested; this may
happen for example because fewer bytes are actually available
right now (maybe because we were close to end-of-file, or
because we are reading from a pipe, or from a terminal), or
because read() was interrupted by a signal.
Please read your man pages. They will save you a lot of time.
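To see the behaviour the man page describes without a network, here is a small sketch that uses a pipe instead of a socket. The function name short_read_demo is invented for this example. It puts 10 bytes in a pipe and then asks a blocking read() for 1024.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: write 10 bytes into a pipe, then request 1024 bytes
 * with a blocking read(). Returns the count read() reported, or -1 if
 * setting up the pipe or filling it failed. */
ssize_t short_read_demo(void)
{
    int fds[2];
    char buf[1024];
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;
    if (write(fds[1], "0123456789", 10) != 10) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Blocks only until *some* data is available, then returns it. */
    n = read(fds[0], buf, sizeof buf);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The call returns 10 rather than waiting for 1024 bytes: a blocking read() waits until at least one byte is available and then hands back what is there. On stream sockets, recv() with the MSG_WAITALL flag asks the kernel to wait for the full amount, but it can still return less on a signal, error or disconnect, so the return value has to be checked either way.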