LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 09-19-2007, 07:11 PM   #1
sublyme718
LQ Newbie
 
Registered: Nov 2005
Location: Brooklyn, New York
Distribution: ubuntu
Posts: 25

Rep: Reputation: 15
internet page grab in c


Hello,

I am new to networks and C on Linux. I am running Ubuntu and wanted to know if anyone knows of a tutorial I could use to learn how to write a program in C that fetches a web page from the command line. Thanks in advance.
 
Old 09-19-2007, 08:02 PM   #2
95se
Member
 
Registered: Apr 2002
Location: Windsor, ON, CA
Distribution: Ubuntu
Posts: 740

Rep: Reputation: 32
I take it you want to make a program that lets you get web pages from the command line. Is this correct? First, you do not need to do this: there already exist a number of programs that do exactly this (man wget or man curl). If you're still intent on doing it, then check out the libcurl page ( http://curl.haxx.se/libcurl/c/ ). There are other options than libcurl, though.
 
Old 09-20-2007, 05:57 AM   #3
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
Do you have netcat installed? It's indispensable for playing with TCP networks.

Try this (netcat = nc) in an xterm:

nc -lp 50123

which will listen on port 50123. Then point your browser at

localhost:50123

and you can see what transpires.
 
Old 09-20-2007, 09:55 AM   #4
sublyme718
LQ Newbie
 
Registered: Nov 2005
Location: Brooklyn, New York
Distribution: ubuntu
Posts: 25

Original Poster
Rep: Reputation: 15
I have some of the code but it just hangs when I compile and run it.

Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
	struct hostent *hostinfo;
	char buf[1024];
	FILE * fsock;
	struct sockaddr_in name;
	int sock;

	hostinfo = gethostbyname("www.google.com");
	name.sin_family = AF_INET;
	name.sin_port = htons(80);
	name.sin_addr = *((struct in_addr*)hostinfo->h_addr);

	sock=socket(PF_INET,SOCK_STREAM,0);
	connect(sock,(struct sockaddr *)&name,sizeof(name));
	
	fsock = fdopen(sock,"r+");

	fprintf(fsock,"GET / HTTP/1.0\n\n");

	while(fgets(buf, 1024, fsock)){
		/*fputs(buf,stdout)*/puts(buf);
	}
	fclose(fsock);
}

Last edited by sublyme718; 09-20-2007 at 12:39 PM.
 
Old 09-20-2007, 07:42 PM   #5
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
You should listen to bigearsbilly's very good advice. Use netcat (nc) as your web server on localhost until you can see that you are issuing the correct HTTP request. In fact, use your own web server once you have accomplished the very basics. It will allow you to control the responses to cover all sorts of circumstances, and also provide you with access to log files, which can be revealing.
--- rod.
 
Old 09-20-2007, 08:13 PM   #6
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Okay, so taking my own and bigearsbilly's advice, I built and ran your code against netcat and against an Apache web server on localhost, and it worked fine for me. Running seamonkey against netcat on localhost, I see it adding several additional HTTP headers, which I would speculate Google requires before it will reply.
Code:
GET / HTTP/1.1
Host: localhost:50123
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
--- rod.
 
Old 09-21-2007, 05:20 AM   #7
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
You may have to be careful. I haven't grabbed web pages, but I have done SOAP stuff over HTTP with Perl.

fgets expects a line with a newline at the end, and it may block waiting for input that doesn't come, which can give the appearance of hanging: the server will squirt, say, 100 bytes and keep the socket open, while fgets waits for the rest of its input or a newline that may never arrive.

It's all quite awkward to get right.

What you may need to do is read the
Content-Length: header and fread the correct number of bytes.

I can't do any checks here because of our firewall and proxy server.

If you are just playing about learning about networks,
maybe use your favourite socket-enabled scripting language
(Perl, Python, Tcl)?
It makes life easier till you get the hang of it.
 
Old 09-21-2007, 02:23 PM   #8
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 31
Quote:
Originally Posted by theNbomr View Post
You should listen to bigearsbilly's very good advice.
What? This?
Quote:
iv) set fire to it.
 
Old 09-21-2007, 08:54 PM   #9
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Okay, I guess one needs to be a bit selective about taking advice....
 
Old 09-23-2007, 05:08 PM   #10
jdiggitydogg
Member
 
Registered: Sep 2007
Posts: 42

Rep: Reputation: 15
Look up the HTTP specification, or look carefully at the terminating characters in a sniffer. In a character-based protocol like HTTP, a specific character sequence indicates end of transmission. I would guess your code appears to hang because the web server doesn't send anything back to you (because your request is not properly formatted).

For debugging, watch your program in a sniffer to determine where in your code you are having problems (i.e. the writing functions or the reading functions).

HTTP uses the characters \r\n to indicate end of line and end of request. (Your code shows \n\n, which might work?)

I think the only required portion of an HTTP GET request is the GET command; I think the other fields are optional. For your fprintf call, try the following string:

Code:
fprintf(fsock,"GET / HTTP/1.0\r\n\r\n");
Does that work?

Last edited by jdiggitydogg; 09-23-2007 at 05:09 PM.
 
Old 09-24-2007, 02:41 AM   #11
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
I think the Host field is mandatory.
 
Old 09-25-2007, 06:16 AM   #12
wjevans_7d1@yahoo.co
Member
 
Registered: Jun 2006
Location: Mariposa
Distribution: Slackware 9.1
Posts: 938

Rep: Reputation: 31
Quoth the highly esteemed bigearsbilly:
Quote:
I think the Host field is mandatory.
It is on HTTP/1.1, but not HTTP/1.0, according to this.
 
Old 09-25-2007, 07:04 AM   #13
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239
blimey,
I was wrong!

there's a first time for everything!

 
  


