11-05-2009, 09:50 PM | #1
Kunsheng, Member (Registered: Mar 2009; Posts: 82)
Looking for an open-source library better than libcurl
I am using libcurl to fetch the HTTP status code for pages (only the status code from the header, no body), but the performance is not satisfying: some links take around 1 second each, especially those hosted abroad. The best case I have seen is 14 links in about 2 seconds (against Google).
I am wondering whether some other library could do better than libcurl here, or whether around 1 second per status code is simply what remote links cost?
I really need some solid information to show my boss whether or not a better solution exists.
Any idea is well appreciated.
Thanks,
-Kun
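For reference, a minimal sketch of the kind of check described above, using libcurl with CURLOPT_NOBODY so only the headers (and thus the status code) are transferred; the URL and the 10-second timeout are arbitrary examples:
Code:
/* Build with: gcc check.c -lcurl */
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    CURL *curl;
    CURLcode res;
    long status = 0;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://www.google.com/");
        curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);   /* HEAD request: headers only, no body */
        curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L); /* give up after 10 seconds */
        res = curl_easy_perform(curl);
        if (res == CURLE_OK) {
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
            printf("HTTP status: %ld\n", status);
        } else {
            fprintf(stderr, "curl error: %s\n", curl_easy_strerror(res));
        }
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}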
11-06-2009, 02:41 AM | #2
timepasser, LQ Newbie (Registered: Sep 2009; Posts: 5)
Well, if you consider how fetching a page works, you will see that there are not many tricks (apart from parallelism) that will make it faster. When your browser or your libcurl program fetches a page, it will:
1. Ask your DNS server for the IP address corresponding to the site's host name
2. Use the server's reply to open a socket to that IP on port 80
3. Send a small HTTP request describing what it wants
4. Receive the HTML response
(The four steps are sketched in code below.) It is not a cheap process, and I think the performance you are getting is fine. If you really want things to be quicker, you will have to use more than one computer, each downloading part of the URL list.
Edit: Well, I just realized that you don't necessarily need another computer. Just more threads.
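To make the four steps concrete, here is a hedged sketch using plain POSIX sockets; example.com is just a placeholder host and error handling is minimal:
Code:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

int main(void)
{
    struct addrinfo hints, *res;
    char buf[512];
    int fd;

    /* 1. DNS: resolve the host name to an IP address */
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo("example.com", "80", &hints, &res) != 0)
        return 1;

    /* 2. Open a TCP socket to that IP, port 80 */
    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0)
        return 1;

    /* 3. Send a small HTTP request (HEAD: status line and headers only) */
    const char *req = "HEAD / HTTP/1.1\r\nHost: example.com\r\n"
                      "Connection: close\r\n\r\n";
    write(fd, req, strlen(req));

    /* 4. Receive the response; the status code is in the first line */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("%.*s\n", (int)strcspn(buf, "\r\n"), buf); /* e.g. HTTP/1.1 200 OK */
    }

    close(fd);
    freeaddrinfo(res);
    return 0;
}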
Last edited by timepasser; 11-06-2009 at 02:22 PM.
Reason: addition
11-06-2009, 05:18 AM | #3
Member (Registered: Sep 2007; Location: Mariposa; Distribution: FreeBSD, Debian wheezy; Posts: 811)
To speed things up a bit, you can cache the DNS responses.
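With libcurl specifically, one way to lean on this (a sketch, assuming a single easy handle reused across requests; the 600-second value is arbitrary) is to raise the handle's DNS cache timeout from its 60-second default:
Code:
#include <curl/curl.h>

/* After curl_easy_init(): keep resolved host names cached for 10 minutes
 * instead of the default 60 seconds. The cache lives in the handle, so
 * the same handle must be reused across requests for this to help. */
curl_easy_setopt(curl, CURLOPT_DNS_CACHE_TIMEOUT, 600L);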
11-07-2009, 04:26 PM | #4
Kunsheng, Member, Original Poster (Registered: Mar 2009; Posts: 82)
Thanks a lot. I think I have it working correctly now, even though the performance is still not satisfying.
I did try a multi-threaded version, and it was less than 5% faster for pages containing many links (>150) and even slower for pages with fewer (<20).
It looks like this is the best libcurl can do.
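For reference, a minimal sketch of the threaded approach described here, assuming one easy handle per thread (easy handles must never be shared between threads); check_url and the two-URL list are placeholders:
Code:
/* Build with: gcc threaded.c -lcurl -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <curl/curl.h>

static void *check_url(void *arg)
{
    const char *url = arg;
    CURL *curl = curl_easy_init(); /* each thread gets its own handle */
    if (curl) {
        long status = 0;
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);
        if (curl_easy_perform(curl) == CURLE_OK) {
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
            printf("%ld %s\n", status, url);
        }
        curl_easy_cleanup(curl);
    }
    return NULL;
}

int main(void)
{
    const char *urls[] = { "http://example.com/", "http://example.org/" };
    pthread_t tid[2];

    curl_global_init(CURL_GLOBAL_DEFAULT); /* must run before any threads start */
    for (int i = 0; i < 2; i++)
        pthread_create(&tid[i], NULL, check_url, (void *)urls[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    curl_global_cleanup();
    return 0;
}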
11-08-2009, 09:20 AM | #5
ntubski, Senior Member (Registered: Nov 2005; Distribution: Debian, Arch; Posts: 3,823)
Have you tried libcurl's multi interface that I mentioned in your other thread? It allows concurrent transfers, probably with less overhead than threads (running a lot of threads at once can slow things down considerably).
Quote:
To speed things up a bit, you can cache the DNS responses.
I think libcurl already does this: http://curl.haxx.se/libcurl/c/libcur...ml#Persistence
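A hedged sketch of what the multi interface looks like for concurrent status checks from a single thread; it assumes a libcurl recent enough to have curl_multi_wait, and the URL list is a placeholder:
Code:
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    const char *urls[] = { "http://example.com/", "http://example.org/" };
    int n = 2, running = 0;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURLM *multi = curl_multi_init();

    /* One easy handle per URL, all driven by the same multi handle. */
    for (int i = 0; i < n; i++) {
        CURL *easy = curl_easy_init();
        curl_easy_setopt(easy, CURLOPT_URL, urls[i]);
        curl_easy_setopt(easy, CURLOPT_NOBODY, 1L); /* status code only */
        curl_multi_add_handle(multi, easy);
    }

    /* Drive all transfers until none are still running. */
    do {
        curl_multi_perform(multi, &running);
        if (running)
            curl_multi_wait(multi, NULL, 0, 1000, NULL); /* wait for activity */
    } while (running);

    /* Collect results as they complete. */
    CURLMsg *msg;
    int msgs_left;
    while ((msg = curl_multi_info_read(multi, &msgs_left))) {
        if (msg->msg == CURLMSG_DONE) {
            long status = 0;
            char *url = NULL;
            curl_easy_getinfo(msg->easy_handle, CURLINFO_RESPONSE_CODE, &status);
            curl_easy_getinfo(msg->easy_handle, CURLINFO_EFFECTIVE_URL, &url);
            printf("%ld %s\n", status, url);
            curl_multi_remove_handle(multi, msg->easy_handle);
            curl_easy_cleanup(msg->easy_handle);
        }
    }

    curl_multi_cleanup(multi);
    curl_global_cleanup();
    return 0;
}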
11-09-2009, 07:57 AM | #6
Kunsheng, Member, Original Poster (Registered: Mar 2009; Posts: 82)
Thanks. I gave up on the multi interface since it seems it would change my whole program structure.
Could you give me some rough performance numbers for the multi interface? For example, 200 threads vs. the multi interface (status-code requests only)? I will try it if there really is a big difference.
11-09-2009, 10:42 AM | #7
ntubski, Senior Member (Registered: Nov 2005; Distribution: Debian, Arch; Posts: 3,823)
Quote:
Originally Posted by Kunsheng
Thanks. I gave up on the multi interface since it seems it would change my whole program structure.
You changed from single-threaded to multi-threaded without changing your program structure?
Quote:
Could you give me some rough performance numbers for the multi interface? For example, 200 threads vs. the multi interface (status-code requests only)? I will try it if there really is a big difference.
Sorry, I haven't really written that many downloading programs. 200 threads does sound like a lot, though; are you getting more than 90% CPU usage? It occurs to me that using threads with libcurl might prevent many of the optimizations mentioned in http://curl.haxx.se/libcurl/c/libcur...ml#Persistence, since each thread needs a separate handle.
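To illustrate the persistence point, a sketch under that assumption: a single easy handle reused serially lets libcurl keep connections and cached DNS entries alive between requests to the same host, which separate per-thread handles cannot share (the URLs are placeholders):
Code:
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
    const char *urls[] = { "http://example.com/a", "http://example.com/b" };
    CURL *curl;
    long status;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    curl = curl_easy_init(); /* one handle, reused for every request */
    curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);

    for (int i = 0; i < 2; i++) {
        curl_easy_setopt(curl, CURLOPT_URL, urls[i]);
        /* The second request to the same host can reuse the TCP
         * connection and cached DNS entry from the first. */
        if (curl_easy_perform(curl) == CURLE_OK) {
            status = 0;
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status);
            printf("%ld %s\n", status, urls[i]);
        }
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}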