LinuxQuestions.org
Old 01-09-2008, 05:48 AM   #1
greeklegend
Member
 
Registered: Feb 2006
Location: At a computer
Distribution: Ubuntu 7.04, LFS 6.3 rc1 (living dangerously ;), Windows XP
Posts: 75

Rep: Reputation: 15
C sockets losing data?


Hey all.
I want to read data from a website accessed through a proxy and send it to another socket. My problem is that I seem to be losing a few bytes of data.
The code below sends an HTTP request to a proxy server running on localhost port 3128 for the file http://localhost/bigfile.txt, which is about 80 KB of the same line copy-pasted.
When I run it and pipe stdout to a file, the md5sum of the created file doesn't match the original bigfile.txt about 25% of the time. Inspection of the created file shows that every so often a handful of characters are missing from one of the lines.
I managed to boil my problem down to 80 lines of C.
If you'd rather view it as a separate file: http://home.iprimus.com.au/sa014/test.c
Code:
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

int proxy_connect(char* proxyaddr, int proxyport);

int main()
{
	//the purpose is to connect to a proxy server and use read() and write() to get a particular file.
	//the file download process is prone to corruption, even when using localhost.
	int sock;
	char* buff;
	int rv;
	int bytesleft;

	//first, connect to the proxy server (squid)
	sock = proxy_connect("localhost", 3128);
	buff = malloc(1024);
	//send a http request asking for http://localhost/bigfile.txt to the proxy server
	write(sock, "GET http://localhost/bigfile.txt HTTP/1.0\r\n"
			"Host: localhost\r\n\r\n", strlen("GET "
			"http://localhost/bigfile.txt HTTP/1.0\r\n"
			"Host: localhost\r\n\r\n"));
	//now we have to read the headers off our socket.
	//we'll know we've reached the end of the headers when a line starts with \r (a blank \r\n line)
	read(sock, buff, 1);
	while (buff[0] != '\r') //this would be \r after a \n, meaning end of headers
	{
		while (buff[0] != '\r')
		{
			read(sock, buff, 1);
		}
		read(sock, buff, 1); //this will yield \n
		read(sock, buff, 1); //this will be the first character of the next line.
				//if it's \r, end of headers
	}
	read(sock, buff, 1); //this is the last \n before content starts.

	//this is my transfer algorithm. The idea is to do io in 1kb chunks.
	//the reason for this is that this is intended to be used to write to another socket
	//and sending an entire tcp packet for a single byte is somewhat inefficient.
	//this algorithm will also be used to send data from a CGI program with write(STDOUT_FILENO)
	bytesleft = 82194; //this is the length of bigfile.txt
	for (; bytesleft > 1024; bytesleft = bytesleft - 1024) //loop until bytesleft is less than 1024
								//and decrease bytesleft by 1024 each time
	{
		//read and write 1024 bytes
		rv = read(sock, buff, 1024);
		if (rv < 1){perror("read"); exit(1);}
		write(STDOUT_FILENO, buff, 1024);
	}
	//now we must read and write the remaining (<1024) bytes left on the socket.
	read(sock, buff, bytesleft);
	write(STDOUT_FILENO, buff, bytesleft);

	free(buff);
	close(sock);
	return 0;
}

int proxy_connect(char* proxyaddr, int proxyport)
{
	//this function accepts a host and a port, uses gethostbyname() to fill out hostent,
	//then connects to that host with connect(). It returns a socket that can be read from/written to
	struct hostent* proxy;
	struct sockaddr_in proxy_addr;
	int rval;
	in_addr_t nhostaddr; //32-bit IPv4 address; a long int may be 64 bits wide

	proxy = gethostbyname(proxyaddr);

	if (proxy == NULL){
		fprintf(stderr, "gethostbyname() failed\n");
		exit(1);}
	memcpy(&nhostaddr, proxy->h_addr, proxy->h_length);
	rval = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	bzero((char*)&proxy_addr, sizeof(proxy_addr));
	proxy_addr.sin_family = AF_INET;
	proxy_addr.sin_port = htons(proxyport);
	proxy_addr.sin_addr.s_addr = nhostaddr;
	if (connect(rval, (struct sockaddr*)&proxy_addr, sizeof(proxy_addr)) == -1){
		fprintf(stderr, "connect() failed\n");
		exit(1);}
	return rval;
}
Any ideas?
 
Old 01-09-2008, 07:48 AM   #2
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,302
Blog Entries: 1

Rep: Reputation: 107Reputation: 107
Quote:
Originally Posted by greeklegend
Any ideas?
You need to check the return values of your system calls. If they return "-1", you need to check the value of the global "errno". You can get a string version of the error by calling "strerror(errno)".
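
As a rough illustration of that advice, here is a minimal sketch of a read() call whose result is checked. The name checked_read is made up for this example and is not part of the original program.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read() with its return value checked.
 * Returns the byte count (0 means end of file), or -1 after
 * reporting why the call failed. */
ssize_t checked_read(int fd, void *buf, size_t len)
{
	ssize_t n = read(fd, buf, len);

	if (n == -1) {
		int saved = errno;   /* fprintf() may overwrite errno */
		fprintf(stderr, "read(fd=%d) failed: %s\n", fd, strerror(saved));
		errno = saved;       /* leave errno intact for the caller */
	}
	return n;
}
```

A caller would still need to compare the returned count with the amount it asked for; this sketch only covers the error case.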

There is a reason functions return error codes. If you do not check them, you are like a driver who does not look out the car window, but wonders why he keeps crashing into trees.
 
Old 01-09-2008, 08:30 AM   #3
JudyL
LQ Newbie
 
Registered: Aug 2007
Location: Florida, USA
Distribution: Ubuntu
Posts: 29

Rep: Reputation: 15
In addition to checking for error returns from your read and write, you also need to check how many of the requested bytes were actually read and written. When working with sockets, you cannot assume that just because the function didn't return an error that it read / wrote all the bytes you asked it to.
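
For the write side, one common way to handle this is to loop until everything has been accepted. The following is only a sketch; write_all is a name invented for this example, and it does not retry on signals.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are out.
 * Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n == -1)
			return -1;
		p += n;              /* skip past the bytes that were accepted */
		len -= (size_t)n;    /* only the remainder is still pending */
	}
	return 0;
}
```

In the program above, a call such as write_all(STDOUT_FILENO, buff, rv) could stand in for a bare write().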

Judy
 
Old 01-09-2008, 09:37 AM   #4
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239Reputation: 239Reputation: 239
read and write return the number of bytes actually transferred,
you need to check them
 
Old 01-09-2008, 06:32 PM   #5
greeklegend
Member
 
Registered: Feb 2006
Location: At a computer
Distribution: Ubuntu 7.04, LFS 6.3 rc1 (living dangerously ;), Windows XP
Posts: 75

Original Poster
Rep: Reputation: 15
Ah, ok...
But since it's a blocking socket, wouldn't it block until it was able to read all 1024 bytes?
Anyway, I'll add that in and see if it fixes it.
 
Old 01-09-2008, 06:54 PM   #6
greeklegend
Member
 
Registered: Feb 2006
Location: At a computer
Distribution: Ubuntu 7.04, LFS 6.3 rc1 (living dangerously ;), Windows XP
Posts: 75

Original Poster
Rep: Reputation: 15
Ah, much better. I modified the program like so and got the same md5sum 200 times in a row, so I'm guessing it's fixed. Thanks guys for pointing out the blindingly obvious, as always...
As an aside, why doesn't read() block if it can't read all 1024 bytes I asked it for?

Code:
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>


int proxy_connect(char* proxyaddr, int proxyport);
static inline void fail(char* msg);

int main()
{
	//the purpose is to connect to a proxy server and use read() and write() to get a particular file.
	//the file download process is prone to corruption, even when using localhost.
	int sock;
	char* buff;
	int rv;
	int rv2;
	int bytesleft;

	//first, connect to the proxy server (squid)
	sock = proxy_connect("localhost", 3128);
	buff = malloc(1024);
	//send a http request asking for http://localhost/bigfile.txt to the proxy server
	write(sock, "GET http://localhost/bigfile.txt HTTP/1.0\r\n"
			"Host: localhost\r\n\r\n", strlen("GET "
			"http://localhost/bigfile.txt HTTP/1.0\r\n"
			"Host: localhost\r\n\r\n"));
	//now we have to read the headers off our socket.
	//we'll know we've reached the end of the headers when a line starts with \r (a blank \r\n line)
	rv = read(sock, buff, 1);
	if (rv < 1)
		fail("read");
	while (buff[0] != '\r') //this would be \r after a \n, meaning end of headers
	{
		while (buff[0] != '\r')
		{
			rv = read(sock, buff, 1);
			if (rv < 1)
				fail("read");
		}
		rv = read(sock, buff, 1); //this will yield \n
		if (rv < 1)
			fail("read");
		//this will be the first character of the next line. if it's \r, end of headers
		rv = read(sock, buff, 1);
		if (rv < 1)
			fail("read");
	}
	rv = read(sock, buff, 1); //this is the last \n before content starts.
	if (rv < 1)
		fail("read");

	//this is my transfer algorithm. The idea is to do io in 1kb chunks.
	//the reason for this is that this is intended to be used to write to another socket
	//and sending an entire tcp packet for a single byte is somewhat inefficient.
	//this algorithm will also be used to send data from a CGI program with write(STDOUT_FILENO)
	bytesleft = 82194; //this is the length of bigfile.txt
	for (; bytesleft > 1024; bytesleft = bytesleft - rv) //loop until bytesleft is at most 1024,
								//decreasing it by the number of bytes actually read
	{
		//read up to 1024 bytes and write out however many arrived
		rv = read(sock, buff, 1024);
		if (rv < 1)
			fail("read");
		rv2 = write(STDOUT_FILENO, buff, rv);
		if (rv2 < rv)
			fail("write");
	}
	//now we must read and write the remaining (<1024) bytes left on the socket.
	while (bytesleft > 0)
	{
		rv = read(sock, buff, bytesleft);
		if (rv < 1)
			fail("read");
		rv2 = write(STDOUT_FILENO, buff, rv);
		if (rv2 < rv)
			fail("write");
		bytesleft = bytesleft - rv;
	}
	free(buff);
	close(sock);
	return 0;
}

int proxy_connect(char* proxyaddr, int proxyport)
{
	//this function accepts a host and a port, uses gethostbyname() to fill out hostent,
	//then connects to that host with connect(). It returns a socket that can be read from/written to
	struct hostent* proxy;
	struct sockaddr_in proxy_addr;
	int rval;
	in_addr_t nhostaddr; //32-bit IPv4 address; a long int may be 64 bits wide

	proxy = gethostbyname(proxyaddr);

	if (proxy == NULL){
		fprintf(stderr, "gethostbyname() failed\n");
		exit(1);}
	memcpy(&nhostaddr, proxy->h_addr, proxy->h_length);
	rval = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	bzero((char*)&proxy_addr, sizeof(proxy_addr));
	proxy_addr.sin_family = AF_INET;
	proxy_addr.sin_port = htons(proxyport);
	proxy_addr.sin_addr.s_addr = nhostaddr;
	if (connect(rval, (struct sockaddr*)&proxy_addr, sizeof(proxy_addr)) == -1){
		fprintf(stderr, "connect() failed\n");
		exit(1);}
	return rval;
}

static inline void fail(char* msg)
{
	perror(msg);
	exit(1);
}

Last edited by greeklegend; 01-09-2008 at 06:55 PM.
 
Old 01-10-2008, 01:12 PM   #7
David1357
Senior Member
 
Registered: Aug 2007
Location: South Carolina, U.S.A.
Distribution: Ubuntu, Fedora Core, Red Hat, SUSE, Gentoo, DSL, coLinux, uClinux
Posts: 1,302
Blog Entries: 1

Rep: Reputation: 107Reputation: 107
Quote:
Originally Posted by greeklegend
As an aside, why doesn't read() block if it can't read all 1024 bytes I asked it for?
From the man page for READ(2) from the "Linux Programmer's Manual":

Code:
RETURN VALUE
       On success, the number of bytes read is returned (zero
       indicates end of file), and the file position is advanced
       by this number.  It is not an error if this number is
       smaller than the number of bytes requested; this may
       happen for example because fewer bytes are actually available
       right now (maybe because we were close to end-of-file, or
       because we are reading from a pipe, or from a terminal), or
       because read() was interrupted by a signal.
Please read your man pages. They will save you a lot of time.
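
The behaviour described in that excerpt is easy to observe with a pipe. This is a small sketch, with short_read_demo being a name invented for the example: it puts 5 bytes into a pipe and then asks read() for 1024.

```c
#include <unistd.h>

/* Hypothetical demo: write 5 bytes into a pipe, then request 1024.
 * Returns whatever read() returned, or -1 if the setup failed. */
ssize_t short_read_demo(void)
{
	int fds[2];
	char buf[1024];
	ssize_t n;

	if (pipe(fds) == -1)
		return -1;
	if (write(fds[1], "hello", 5) != 5) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	n = read(fds[0], buf, sizeof buf);   /* asks for 1024, only 5 are waiting */
	close(fds[0]);
	close(fds[1]);
	return n;
}
```

The call returns 5 rather than waiting for the other 1019 bytes, which is the same thing that happens on a socket when only part of the data has arrived.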
 
Old 01-10-2008, 01:39 PM   #8
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 78
Quote:
Originally Posted by greeklegend
As an aside, why doesn't read() block if it can't read all 1024 bytes I asked it for?
Blocking means that read will wait until data is available, but not necessarily until the amount of data you requested is available.
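
If you really do want "wait until I have this many bytes", you have to loop yourself. A minimal sketch follows, assuming a blocking descriptor; read_full is a made-up name, not a standard function.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: loop until len bytes have arrived or the stream ends.
 * Returns the number of bytes stored in buf (less than len only at end of
 * file), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t got = 0;

	while (got < len) {
		ssize_t n = read(fd, p + got, len - got);
		if (n == -1)
			return -1;
		if (n == 0)          /* end of file: the peer closed the connection */
			break;
		got += (size_t)n;    /* a short read just means "ask again" */
	}
	return (ssize_t)got;
}
```

With something like this, read_full(sock, buff, 1024) would return fewer than 1024 bytes only when the proxy has closed the connection.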
 
Old 01-10-2008, 04:40 PM   #9
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239Reputation: 239Reputation: 239
Quote:

Ah, ok...
But since it's a blocking socket, wouldn't it block until it was able to read all 1024 bytes?
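no, no, no
read returns as soon as any data has arrived, it doesn't wait for the whole 1024.
the socket itself isn't line buffered, but if the sender sends a line at a time, each read will usually return just that line.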
try with a netcat service like so:
(I put a puts(buff) in there to observe)

Code:
$ nc -lp 3128        
GET http://localhost/bigfile.txt HTTP/1.0
Host: localhost

hello   # I typed these lines
there   # the read returns after each line
also, you should check whether your reads or writes were interrupted by a signal
phew!
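
On the signal point, a common pattern is to retry when the call fails with EINTR. Here is a sketch for read(); read_retry is a hypothetical name, and the same loop shape would apply to write().

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper: retry read() if a signal interrupted it before
 * any data was transferred. All other results are passed through as-is. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
	ssize_t n;

	do {
		n = read(fd, buf, len);
	} while (n == -1 && errno == EINTR);   /* interrupted: just call again */
	return n;
}
```

If the signal arrives after some bytes were already transferred, read() reports a short count instead of -1, so the byte-count check is still needed on top of this.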

if you are serious about *nix programming you should buy this:
stevens
and
this

they will save you about 10 years of trial and error and make you look clever on forums

Last edited by bigearsbilly; 01-10-2008 at 04:43 PM.
 
  

