LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux, and any language is fair game.

Old 11-05-2009, 09:50 PM   #1
Kunsheng
Member
 
Registered: Mar 2009
Posts: 82

Rep: Reputation: 16
Looking for an open source library better than libcurl


I am using libcurl to fetch the HTTP status code for pages (only the status code from the header, no body), but the performance is not satisfying: some links take around 1 second each, especially those hosted abroad. At best it manages 14 links in just 2 seconds (for Google links).
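For reference, here is roughly how each link is checked. This is a minimal sketch rather than my exact code: the URL is a placeholder, and CURLOPT_NOBODY is what makes libcurl issue a HEAD request so that only the headers come back.

Code:
/* Minimal sketch: fetch only the HTTP status code of one URL.
   Compile with: gcc check.c -lcurl */
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    long code = 0;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://www.google.com/");
        curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);   /* HEAD: headers only */
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L); /* give up after 10 s */
        if (curl_easy_perform(curl) == CURLE_OK) {
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
            printf("status: %ld\n", code);
        }
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}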

I am wondering if there is another library that could do better than libcurl in this case. Or is it reasonable for a remote link to take a full second just to return its status code?


I really need more information to show my boss whether or not a better solution exists.

Any ideas are appreciated.



Thanks,

-Kun
 
Old 11-06-2009, 02:41 AM   #2
timepasser
LQ Newbie
 
Registered: Sep 2009
Posts: 5

Rep: Reputation: 1
Well, if you consider how fetching a page works, you will see that there are not many tricks (apart from parallelism) that will make it faster. When your browser or your libcurl program tries to fetch a page, it must:

1. Ask your DNS server for the IP address corresponding to the site's host name
2. Use the server's reply to open a socket to that IP, port 80
3. Send a small HTTP message describing what you want
4. Receive the HTML code

It is not a cheap process, and I think the performance you are getting is fine. If you really want things to be quicker, you will have to use more than one computer, each downloading part of the URL list. A bare-bones sketch of those four steps follows.
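Here is that sketch with plain POSIX sockets, just to show where the time goes; the host name is a placeholder and error handling is minimal:

Code:
/* Resolve a name, connect to port 80, send a HEAD request,
   print the status line. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    const char *host = "www.example.com";   /* placeholder */
    struct addrinfo hints, *res;
    char buf[512];
    int fd, n;

    /* 1. ask the DNS server for the IP address */
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "80", &hints, &res) != 0)
        return 1;

    /* 2. open a socket to that IP, port 80 */
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
        return 1;

    /* 3. send a small HTTP message describing what we want */
    snprintf(buf, sizeof(buf),
             "HEAD / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
             host);
    write(fd, buf, strlen(buf));

    /* 4. receive the reply; the status code is on the first line */
    n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("%.*s\n", (int)strcspn(buf, "\r\n"), buf);
    }

    close(fd);
    freeaddrinfo(res);
    return 0;
}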

Edit: Well, I just realized that you don't necessarily need another computer, just more threads.

Last edited by timepasser; 11-06-2009 at 02:22 PM. Reason: addition
 
Old 11-06-2009, 05:18 AM   #3
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
To speed things up a bit, you can cache the DNS responses.
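With libcurl, one knob for this is the per-handle DNS cache lifetime; a sketch (the one-hour value is just an example):

Code:
#include <curl/curl.h>

/* Sketch: when one easy handle is reused for many URLs, keep
   resolved addresses for an hour instead of the default 60
   seconds; -1 would cache them forever. Only helps when the
   same hosts come up repeatedly. */
void tune_dns_cache(CURL *curl)
{
    curl_easy_setopt(curl, CURLOPT_DNS_CACHE_TIMEOUT, 3600L);
}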
 
Old 11-07-2009, 04:26 PM   #4
Kunsheng
Member
 
Registered: Mar 2009
Posts: 82

Original Poster
Rep: Reputation: 16
Thanks a lot. I think I have it working correctly, even if the performance is not satisfying.

I did try a multi-threaded version, and it was less than 5% faster for pages containing many links (>150) and even slower for pages with fewer (<20).

It looks like this is the best libcurl can do. For reference, the threaded attempt is roughly the sketch below.
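(Simplified: one thread per URL, each with its own easy handle since handles must not be shared between threads, and placeholder URLs instead of the real list. Compile with -lcurl -lpthread.)

Code:
#include <stdio.h>
#include <pthread.h>
#include <curl/curl.h>

/* Each thread checks one URL with its own easy handle. */
static void *check_url(void *arg)
{
    const char *url = arg;
    CURL *curl = curl_easy_init();
    long code = 0;

    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);  /* status code only */
        if (curl_easy_perform(curl) == CURLE_OK)
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
        printf("%ld  %s\n", code, url);
        curl_easy_cleanup(curl);
    }
    return NULL;
}

int main(void)
{
    const char *urls[] = { "http://www.google.com/",
                           "http://www.example.com/" };  /* placeholders */
    enum { N = sizeof(urls) / sizeof(urls[0]) };
    pthread_t tid[N];
    int i;

    curl_global_init(CURL_GLOBAL_ALL);  /* must run before any threads */
    for (i = 0; i < N; i++)
        pthread_create(&tid[i], NULL, check_url, (void *)urls[i]);
    for (i = 0; i < N; i++)
        pthread_join(tid[i], NULL);
    curl_global_cleanup();
    return 0;
}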



Quote:
Originally Posted by timepasser
Well, if you consider how fetching a page works, you will see that there are not many tricks (apart from parallelism) that will make it faster. When your browser or your libcurl program tries to fetch a page, it must:

1. Ask your DNS server for the IP address corresponding to the site's host name
2. Use the server's reply to open a socket to that IP, port 80
3. Send a small HTTP message describing what you want
4. Receive the HTML code

It is not a cheap process, and I think the performance you are getting is fine. If you really want things to be quicker, you will have to use more than one computer, each downloading part of the URL list.

Edit: Well, I just realized that you don't necessarily need another computer, just more threads.
 
Old 11-08-2009, 09:20 AM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,823

Rep: Reputation: 2105
Have you tried libcurl's multi interface that I mentioned in your other thread? It allows concurrent transfers, probably with less overhead than threads (running a lot of threads at once can slow things down considerably).
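In outline it looks like the sketch below: all easy handles are added to one multi handle and driven from a single thread (placeholder URLs; the select()-based loop is the simplest variant):

Code:
#include <stdio.h>
#include <sys/select.h>
#include <curl/curl.h>

int main(void)
{
    const char *urls[] = { "http://www.google.com/",
                           "http://www.example.com/" };  /* placeholders */
    int n = sizeof(urls) / sizeof(urls[0]);
    CURLM *multi;
    CURLMsg *msg;
    int running, left, i;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    multi = curl_multi_init();

    /* one easy handle per URL, all owned by the multi handle */
    for (i = 0; i < n; i++) {
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, urls[i]);
        curl_easy_setopt(easy, CURLOPT_NOBODY, 1L);  /* headers only */
        curl_multi_add_handle(multi, easy);
    }

    /* pump all transfers concurrently from this one thread */
    do {
        fd_set rd, wr, ex;
        int maxfd = -1;
        struct timeval tv = { 1, 0 };

        while (curl_multi_perform(multi, &running) ==
               CURLM_CALL_MULTI_PERFORM)
            ;
        if (!running)
            break;
        FD_ZERO(&rd); FD_ZERO(&wr); FD_ZERO(&ex);
        curl_multi_fdset(multi, &rd, &wr, &ex, &maxfd);
        select(maxfd + 1, &rd, &wr, &ex, &tv);  /* wait for activity */
    } while (running);

    /* collect the status codes */
    while ((msg = curl_multi_info_read(multi, &left))) {
        if (msg->msg == CURLMSG_DONE) {
            long code = 0;
            char *url = NULL;
            curl_easy_getinfo(msg->easy_handle, CURLINFO_RESPONSE_CODE, &code);
            curl_easy_getinfo(msg->easy_handle, CURLINFO_EFFECTIVE_URL, &url);
            printf("%ld  %s\n", code, url);
            curl_multi_remove_handle(multi, msg->easy_handle);
            curl_easy_cleanup(msg->easy_handle);
        }
    }

    curl_multi_cleanup(multi);
    curl_global_cleanup();
    return 0;
}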

Quote:
To speed things up a bit, you can cache the DNS responses.
I think libcurl already does this: http://curl.haxx.se/libcurl/c/libcur...ml#Persistence
 
Old 11-09-2009, 07:57 AM   #6
Kunsheng
Member
 
Registered: Mar 2009
Posts: 82

Original Poster
Rep: Reputation: 16
Thanks. I gave up on the multi interface since it seems it would require changing my whole program structure.

Could you give me some rough numbers on multi-interface performance? For example, 200 threads vs. the multi interface (status-code requests only)? I will try it if it really makes a big difference.




Quote:
Originally Posted by ntubski
Have you tried libcurl's multi interface that I mentioned in your other thread? It allows concurrent transfers, probably with less overhead than threads (running a lot of threads at once can slow things down considerably).


I think libcurl already does this: http://curl.haxx.se/libcurl/c/libcur...ml#Persistence
 
Old 11-09-2009, 10:42 AM   #7
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,823

Rep: Reputation: 2105
Quote:
Originally Posted by Kunsheng
Thanks. I gave up on the multi interface since it seems it would require changing my whole program structure.
You changed from single-threaded to multi-threaded without changing your program structure?

Quote:
Could you give me some rough numbers on multi-interface performance? For example, 200 threads vs. the multi interface (status-code requests only)? I will try it if it really makes a big difference.
Sorry, I haven't really written that many downloading programs. 200 threads does sound like a lot, though; are you getting more than 90% CPU usage? It occurs to me that using threads with libcurl might prevent many of the optimizations mentioned in http://curl.haxx.se/libcurl/c/libcur...ml#Persistence, since each thread needs its own handle.
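If you do keep the threads, one partial workaround is libcurl's share interface, which lets separate easy handles share a DNS cache. A sketch (the mutex callbacks are needed once the handles live in different threads; each thread would then set CURLOPT_SHARE on its own handle):

Code:
#include <pthread.h>
#include <curl/curl.h>

static pthread_mutex_t share_lock = PTHREAD_MUTEX_INITIALIZER;

/* libcurl calls these around every access to the shared data */
static void lock_cb(CURL *h, curl_lock_data d, curl_lock_access a, void *p)
{
    (void)h; (void)d; (void)a; (void)p;
    pthread_mutex_lock(&share_lock);
}

static void unlock_cb(CURL *h, curl_lock_data d, void *p)
{
    (void)h; (void)d; (void)p;
    pthread_mutex_unlock(&share_lock);
}

/* build one shared object; every thread's handle then does:
   curl_easy_setopt(curl, CURLOPT_SHARE, share);               */
CURLSH *make_dns_share(void)
{
    CURLSH *share = curl_share_init();
    curl_share_setopt(share, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);
    curl_share_setopt(share, CURLSHOPT_LOCKFUNC, lock_cb);
    curl_share_setopt(share, CURLSHOPT_UNLOCKFUNC, unlock_cb);
    return share;
}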
 
  

