LinuxQuestions.org
Linux - Networking This forum is for any issue related to networks or networking.
Routing, network cards, OSI, etc. Anything is fair game.

Old 04-24-2003, 06:47 AM   #1
doublefailure
Member
 
Registered: Mar 2002
Location: ma
Distribution: slackware
Posts: 747

Rep: Reputation: 30
How hard would it be to write a socket-based web page retriever?


Hello.

I've used urllib2 to fetch some pages, but it's incredibly slow.
I've looked at some open source crawlers too, but it would take a while to figure them out well enough to modify them.

Ultimately I want to have my own crawler.

Before that, if I can handle downloading just one page, it will be easier to turn that into a crawler.

send request
receive header
if it's text/html
store data in a file
when socket disconnects
close file
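
The steps above could be sketched roughly as follows in Python 3 with the standard `socket` module. This is only a minimal sketch under some assumptions: plain HTTP on port 80, an HTTP/1.0 request so the server closes the connection when the body is done, and no handling of redirects or errors. The function names (`build_request`, `split_response`, `fetch`, `save_if_html`) are made up for illustration. Unlike the outline, it reads the whole response before checking the header, which keeps the code short.

```python
import socket


def build_request(host, path="/"):
    """Build a minimal HTTP/1.0 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split raw response bytes into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    header_lines = head.decode("iso-8859-1").split("\r\n")
    status_line = header_lines[0]
    headers = {}
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def fetch(host, path="/", port=80, timeout=10.0):
    """Send the request and read until the server closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        while True:
            data = sock.recv(4096)
            if not data:  # empty read means the peer disconnected
                break
            chunks.append(data)
    return split_response(b"".join(chunks))


def save_if_html(host, path, filename):
    """Store the body in a file only when the response is text/html."""
    status_line, headers, body = fetch(host, path)
    if headers.get("content-type", "").lower().startswith("text/html"):
        with open(filename, "wb") as f:
            f.write(body)
        return True
    return False
```

Calling something like `save_if_html("example.com", "/", "page.html")` would then fetch one page and write it to disk if the server reports it as HTML; the host and filename here are placeholders.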

It looks simple, but I guess there are tons of small things to take care of.


So my question is: how hard would it be (how long would it take) to handle the HTTP protocol reasonably well?

Thank you.
 
Old 04-24-2003, 08:35 PM   #2
nakkaya
LQ Guru
 
Registered: Jan 2003
Location: Turkey&USA
Distribution: Emacs and linux is its device driver(Slackware,redhat)
Posts: 1,398

Rep: Reputation: 45
I am working on the same thing, a crawler. I can recommend Beej's guide to socket programming.

send request: easy
receive header: you don't have to do anything
if text/html: a little more tricky, and that's the hard part, because you are working from a web page
storing: easier
close: no problem
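
For the text/html check, one possible approach is sketched below in Python 3. It assumes the response headers have already been parsed into a dict keyed by lower-cased header name (that shape is an assumption, as are the helper names `media_type` and `is_html`). The fiddly part is that the `Content-Type` value can carry parameters such as a charset and can vary in case, so a plain string comparison is not enough.

```python
def media_type(content_type):
    """Return the lower-cased media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


def is_html(headers):
    """Report whether the headers describe an HTML body.

    `headers` is assumed to map lower-cased header names to their values.
    A missing Content-Type header is treated as "not HTML".
    """
    return media_type(headers.get("content-type", "")) == "text/html"
```

With this, a header value like `Text/HTML; charset=ISO-8859-1` is still recognised as HTML, while `text/plain` or a missing header is not.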
 
Old 04-24-2003, 08:36 PM   #3
nakkaya
LQ Guru
 
Registered: Jan 2003
Location: Turkey&USA
Distribution: Emacs and linux is its device driver(Slackware,redhat)
Posts: 1,398

Rep: Reputation: 45
By the way, you do not need a library for that; just use plain socket.h.
 
  


All times are GMT -5. The time now is 09:01 AM.
