LinuxQuestions.org
Old 11-05-2009, 10:50 PM   #1
Kunsheng
Member
 
Registered: Mar 2009
Posts: 82
Thanked: 1
Looking for open source library better than Libcurl


I am using libcurl to fetch the HTTP status code for pages (only the status code in the header, no body), but the performance is not satisfying: some links take around 1 second, especially those hosted abroad. The best case I have seen is 14 links in just 2 seconds (on Google).

Is there some other library that could do better than libcurl in this case? Or is it reasonable for those remote links to take 1 second each just to return the HTTP status code?


I really need some more information to show my boss whether there is a better solution or not.

Any ideas are much appreciated,



Thanks,

-Kun
Old 11-06-2009, 03:41 AM   #2
timepasser
LQ Newbie
 
Registered: Sep 2009
Posts: 5
Thanked: 1
Well, if you consider how fetching a page works, you will see that there are not many tricks (except parallelism) that will make it faster. When your browser or your libcurl program fetches a page, it has to:

1. Ask your DNS server for the IP address corresponding to the name of the site you requested
2. Use the server's reply to open a socket to that IP, port 80
3. Send a small HTTP message describing what you want
4. Receive the response (the status line and headers, plus the HTML body if you asked for it)

It is not a cheap process, since each step costs at least one network round trip, and I think the performance you are getting is fine. If you really want things to be quicker, you will have to use more than one computer, each downloading part of the URL list.

Edit: Well, I just realized that you don't necessarily need another computer. Just more threads
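
To see where the time actually goes, the four steps above can be timed one by one. This is only a rough sketch in Python using plain sockets, not libcurl; the function name `head_status` and the timing keys are made up for illustration. It sends a HEAD request, so the server returns the status line and headers without a body.

```python
import socket
import time


def head_status(host, port=80, path="/", timeout=5.0):
    """Return (status_code, timings) for one HEAD request, timing each stage."""
    timings = {}

    # Step 1: DNS lookup (name -> IP address).
    t0 = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    timings["dns"] = time.perf_counter() - t0

    # Step 2: open a TCP socket to that address.
    t0 = time.perf_counter()
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
        timings["connect"] = time.perf_counter() - t0

        # Step 3: send a small HTTP message. HEAD asks for headers only.
        t0 = time.perf_counter()
        request = (f"HEAD {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))

        # Step 4: read until the first line (the status line) is complete.
        data = b""
        while b"\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
        timings["request"] = time.perf_counter() - t0
    finally:
        sock.close()

    # The status line looks like "HTTP/1.1 200 OK"; the code is the second field.
    parts = data.split(b"\r\n", 1)[0].decode("ascii", "replace").split()
    if len(parts) < 2:
        raise ValueError("no HTTP status line received")
    return int(parts[1]), timings
```

Calling `head_status("example.com")` returns the status code together with the seconds spent in the `dns`, `connect` and `request` stages. For a server abroad, each stage includes at least one round trip, which is why a total near 1 second is not unusual.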

Last edited by timepasser; 11-06-2009 at 03:22 PM.. Reason: addition
Old 11-06-2009, 06:18 AM   #3
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: Slackware 12.1
Posts: 449
Thanked: 51
To speed things up a bit, you can cache the DNS responses.
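
As a rough illustration of the idea (in Python, not libcurl code), a DNS cache can be as small as a memoized lookup. The name `resolve` and the cache size are assumptions made for this sketch, and it ignores record expiry (TTL), which a real cache would need to honor.

```python
import functools
import socket


@functools.lru_cache(maxsize=1024)
def resolve(host, port=80):
    """Resolve host once; repeated calls with the same arguments hit the cache."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Keep only the socket addresses, as a tuple so the cached value is immutable.
    return tuple(info[4] for info in infos)
```

When many links on a page point at the same host, only the first `resolve` call pays for the lookup; `resolve.cache_info()` shows how many calls were answered from the cache.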
Old 11-07-2009, 05:26 PM   #4
Kunsheng
Member
 
Registered: Mar 2009
Posts: 82
Thanked: 1

Original Poster
Thanks a lot. I think I understand now, even if the answer is not satisfying.

I did try a multi-threaded version: it was less than 5% faster for pages containing many links (>150) and even slower for pages with few links (<20).

It looks like this is the best libcurl can do.



Old 11-08-2009, 10:20 AM   #5
ntubski
Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 698
Thanked: 50
Have you tried libcurl's multi interface that I mentioned in your other thread? It allows concurrent transfers, probably with less overhead than threads (running a lot of threads at once can slow things down considerably).
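
To show the general shape of "many transfers, one thread", here is a small sketch using Python's asyncio. It is an analogy to the idea behind the multi interface, not libcurl's API; the names `head_status_async` and `check_all` and the default limit of 50 connections are assumptions made for the example.

```python
import asyncio


async def head_status_async(host, port=80, path="/", timeout=5.0):
    """Send one HEAD request and return the numeric status code."""
    reader, writer = await asyncio.wait_for(
        asyncio.open_connection(host, port), timeout)
    try:
        writer.write((f"HEAD {path} HTTP/1.1\r\n"
                      f"Host: {host}\r\n"
                      "Connection: close\r\n\r\n").encode("ascii"))
        await writer.drain()
        # The first line of the response is the status line.
        status_line = await asyncio.wait_for(reader.readline(), timeout)
    finally:
        writer.close()
    return int(status_line.split()[1])


async def check_all(targets, limit=50):
    """Check many (host, port, path) targets concurrently from a single thread."""
    gate = asyncio.Semaphore(limit)  # cap the number of open connections

    async def one(target):
        async with gate:
            try:
                return await head_status_async(*target)
            except (OSError, asyncio.TimeoutError, ValueError, IndexError):
                return None  # unreachable host or malformed response

    # gather() keeps the results in the same order as the targets.
    return await asyncio.gather(*(one(t) for t in targets))
```

With this layout, `asyncio.run(check_all([("example.com", 80, "/")]))` waits on all sockets at once, so the slow remote hosts overlap instead of adding up, and there is no per-thread stack or context-switch cost.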

Quote:
To speed things up a bit, you can cache the DNS responses.
I think libcurl already does this: http://curl.haxx.se/libcurl/c/libcur...ml#Persistence
Old 11-09-2009, 08:57 AM   #6
Kunsheng
Member
 
Registered: Mar 2009
Posts: 82
Thanked: 1

Original Poster
Thanks. I gave up on using the multi interface since it seems it would change my whole program structure.

Could you give me a rough estimate of the multi interface's performance? For example, 200 threads vs. the multi interface (HTTP status code requests only)? I will try it if it really makes a big difference.




Old 11-09-2009, 11:42 AM   #7
ntubski
Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 698
Thanked: 50
Quote:
Originally Posted by Kunsheng View Post
Thanks. I gave up on using the multi interface since it seems it would change my whole program structure.
You changed from single threaded to multi threaded without changing your program structure?

Quote:
Could you give me a rough estimate of the multi interface's performance? For example, 200 threads vs. the multi interface (HTTP status code requests only)? I will try it if it really makes a big difference.
Sorry, I haven't really written that many downloading programs. 200 threads does sound like a lot, though. Are you getting more than 90% CPU usage? It occurs to me that using threads with libcurl might prevent a lot of the optimizations mentioned in http://curl.haxx.se/libcurl/c/libcur...ml#Persistence, since each thread needs a separate handle.
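
To illustrate what is lost when every request starts from scratch, here is a minimal Python sketch (standard-library `http.client`, not libcurl) that checks several paths on one host over a single connection. The function name `head_statuses` is made up for the example, and whether the connection is really kept open depends on the server allowing keep-alive.

```python
import http.client


def head_statuses(host, paths, port=80, timeout=5.0):
    """Return the status code for each path, reusing one connection if possible."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    statuses = []
    try:
        for path in paths:
            conn.request("HEAD", path)
            response = conn.getresponse()
            response.read()  # finish this response before sending the next request
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

When the server keeps the connection open, only the first request pays for the DNS lookup and the TCP handshake; the later ones cost a single round trip each. If the server closes the connection, `http.client` reconnects on the next request, so the result is the same but the saving is gone.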