[SOLVED] Caching web proxy, if possible simpler than Squid?
Linux - Networking: This forum is for any issue related to networks or networking. Routing, network cards, OSI, etc. Anything is fair game.
Short description of my problem: I have several machines in my home, connected to the net via a rather low-bandwidth connection (6 Mbit/s) that I have to share with other people. So I want to reduce the bandwidth I use for surfing as much as possible to speed things up. Since I have a local server (Debian 6) running 24/7 anyway, I thought about setting up a caching proxy on that machine, so that often-used websites load faster when bandwidth usage from other users in the network is high.
I don't need any filtering options or anything like that; all I want is a reasonably large cache on the server's disk (I thought about something like 10GB) to speed things up. If possible I would like to cache a) the most used websites, and b) downloads above a certain limit (for now I set the limit arbitrarily at 200MB).
I had a look at Squid, but that thing is simply overwhelming (the configuration file has several thousand lines) and I think simply overkill, since I don't need that whole bunch of filter functions and possibilities. I tried it nonetheless, following several rather simple how-tos on the web, and the performance gain was about 0%, so either I have done something wrong or Squid is simply too big for this simple purpose.
Does anyone know a simple, easy-to-configure program that achieves what I want?
TobiSGD, it's rather surprising to hear such a question from a user with 2400+ reputation :)
In fact, a basic Squid configuration does not need all those hundreds of lines. I think it's possible to get by with about 30 lines, and half of those will look very similar.
You need only tell Squid the local IP-address/port to listen on, the IP range of your clients, the list of allowed ports (like 80,443, 5190) and some other options. And it will be enough. http://beginlinux.com/server_trainin...d-proxy-server
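For illustration, a minimal squid.conf along those lines might look something like this (a sketch, assuming a 192.168.0.0/24 LAN and the 10GB cache mentioned above; the addresses, ports and cache path are examples, not taken from the thread):

```
# Minimal caching-only squid.conf (illustrative values)
http_port 3128                                # address/port to listen on
cache_dir ufs /var/spool/squid 10000 16 256   # ~10GB on-disk cache

acl localnet src 192.168.0.0/24               # IP range of the clients
acl Safe_ports port 80 443 5190               # allowed destination ports

http_access deny !Safe_ports
http_access allow localnet
http_access deny all
```

After editing, `squid -z` initialises the cache directories and the service can be restarted.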
I've heard about tinyproxy, but it does not cache.
There is also 3proxy, but I don't know if it caches the objects.
Why don't you want to set up a shaper? Do you really need a caching proxy?
PS: as I've noticed, the performance gain is better seen in the log files, or after parsing them with a program like Sarg, which creates beautiful tables and counts how much traffic has been served from cache and how much has been downloaded.
Quote:
TobiSGD, it's rather surprising to hear such a question from a user with 2400+ reputation :)
Having a high reputation (sadly) does not mean that I know anything.
Quote:
You need only tell Squid the local IP-address/port to listen on, the IP range of your clients, the list of allowed ports (like 80,443, 5190) and some other options. And it will be enough. http://beginlinux.com/server_trainin...d-proxy-server
I already followed a similar tutorial, but it didn't speed things up; that is why I am asking.
Quote:
PS: as I noticed, the performance is better seen in log files, or after parsing them with some program like Sarg, which creates beautiful tables and counts how much traffic has been cached and how much has been downloaded.
My performance tests are rather simple:
1. Go to LQ's forums, repeatedly press the reload button in a browser with its cache disabled (caching in dwb does not really work anyway), and see if the site loads faster.
2. Go to a page with a massive amount of pictures, repeatedly press the reload button with the cache disabled, and see if it loads faster.
With Squid in that basic configuration, pages did not load faster.
Quote:
Why don't you want to set up a shaper? Do you really need a caching proxy?
It would be very convenient. Ever shared a 6 Mbit/s DSL line with 10 other people? Things can get really slow, so a local caching server can be really useful. And since I use several machines, caching browsers are not a solution for me.
Quote:
I've heard about tinyproxy, but it does not cache.
There is also 3proxy, but I don't know if it caches the objects.
Thanks, I will have a look at 3proxy. Sadly, the documentation does not say anything about caching.
Quote:
I tried it nonetheless, following several rather simple how-tos on the web and the performance gain was about 0%, so either I have done something wrong or Squid is simply too big for this simple purpose.
Ignoring, for the moment, the issue of whether Squid's config is overwhelming (while the file does have a fair number of lines, not much of it is essential, and the essential part isn't overwhelming, imho; but that isn't the first issue to settle): the first question is whether caching can do anything about your problem at all.
In general, caching can only really help if several users (where a 'user' can be something other than a pink blob sitting behind a computer screen creating problems; a different browser counts as a different user, for example) try to get at the same data items at different times, there is a bottleneck somewhere upstream of the cache, and (if you are concerned about speed) the overhead of indexing, storing and retrieving the files is less than the potential time savings.
OK:
- you've got the bottleneck, apparently, unless the bandwidth from your cache box to your user box(en) is idiotically limited (doubt that it is that limited, but indifferent wireless or power line networking could change that)
- you have different users, but do the different users have much overlap in the websites that they commonly access?
- many websites these days (I blame the CMSs, etc) are not really serving up the same data to all users - that is, if you are effectively 'logged in' (with whatever security, access privileges, etc, that implies) you get data that is 'customised' for you, so the data may not be cachable, and even if it is technically cachable, you may not get far because the data to different users is not recognised as being the same. If this is the case, not much progress will be possible.
- your cache box is best equipped with a file system/hardware able to cope with retrieval of lots of (relatively) small files, and a substantial lump of RAM available. In the past, I have used ReiserFS for this, but whether that is a sensible choice, these days, is another matter. And there is the temptation to use RAID or SSDs, but I suspect that doesn't really attack the main problem.
OTOH, something like a static page (or quasi-static page) that is just a list of .jpegs ought to be a lot more cacheable. (There is a good argument for increasing the 'keep' time for things like .jpegs, but you'll have to prove that caching can work for you before getting to that detail, eg, http://archive09.linux.com/feature/153221.)
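If caching does prove to work, the 'keep' time for images can be raised with a refresh_pattern rule in squid.conf. A sketch (times are in minutes; the values here are arbitrary, not tuned):

```
# Cache images more aggressively: minimum 1 day, up to 30 days
# (-i makes the regex case-insensitive; values are illustrative)
refresh_pattern -i \.(jpe?g|gif|png)$ 1440 90% 43200
```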
And beware of all the ad-serving crud. A lot of the ad-crud does a lot of transactions, to and fro. OK, you want that page...but we want to serve you some ads....where have you been recently...ah, now we have to go off to Google to find out which of those are relevant to the people who will pay us for serving an ad....here is a connection to a completely different, overloaded, server that has a custom selected ad for you. There is a lot of crud there, and much of it entails switching from one site to another, and a cache can't do much about it ('cos what you are getting isn't determined until quite late in the process).
Another point is that Squid, in particular, needs a good (quick) source of DNS lookups, so you'll be slow if your DNS lookups are slow.
Have you tried to measure hit rates? My guess is that your hit rate is rather low. You could also measure the times taken to get the data, but I'd guess hit rate would be a good place to start.
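The hit rate can be pulled straight out of Squid's access.log: field 4 of the default log format carries the result code (TCP_HIT, TCP_MEM_HIT, TCP_MISS, ...). A quick sketch using fabricated sample lines; a real run would point awk at /var/log/squid/access.log instead:

```shell
# Sample access.log lines in Squid's default format (fabricated for the demo)
cat > /tmp/sample_access.log <<'EOF'
1320000000.123     45 192.168.0.2 TCP_MISS/200 14320 GET http://www.linuxquestions.org/ - DIRECT/1.2.3.4 text/html
1320000001.456      3 192.168.0.2 TCP_HIT/200 14320 GET http://www.linuxquestions.org/ - NONE/- text/html
1320000002.789      2 192.168.0.3 TCP_MEM_HIT/200 5120 GET http://www.slackware.com/ - NONE/- text/html
1320000003.012     60 192.168.0.2 TCP_MISS/200 20480 GET http://distrowatch.com/ - DIRECT/5.6.7.8 text/html
EOF

# Field 4 holds the result code; count cache hits (any *_HIT) vs. total
awk '{ total++; if ($4 ~ /HIT/) hits++ }
     END { printf "hit rate: %d/%d (%.0f%%)\n", hits, total, 100*hits/total }' \
    /tmp/sample_access.log
# -> hit rate: 2/4 (50%)
```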
(And, BTW, I have assumed that speed is your main concern. It could be that data consumption is your main concern instead, which would change the emphasis - it is probably slightly easier to argue for caching to decrease data consumption (i.e., if you are on a plan with your ISP that attracts cost penalties above a certain data consumption) than to increase speed.)
Quote:
you've got the bottleneck, apparently, unless the bandwidth from your cache box to your user box(en) is idiotically limited (doubt that it is that limited, but indifferent wireless or power line networking could change that)
The server is connected to the other machines via 100 Mbit Ethernet, so that link will not be limited in a way that makes the approach unusable. The bottleneck is easily determined: I share a 6 Mbit connection with a bunch of other people, and as soon as a few people start downloading something, watching a stream, etc., load times for webpages rise significantly. Examples follow.
Quote:
you have different users, but do the different users have much overlap in the websites that they commonly access?
At this time the main reason for me is to accelerate my web access without disturbing the others. So for now I am the only user of that proxy, from different machines.
Quote:
many websites these days (I blame the CMSs, etc) are not really serving up the same data to all users - that is, if you are effectively 'logged in' (with whatever security, access privileges, etc, that implies) you get data that is 'customised' for you, so the data may not be cachable, and even if it is technically cachable, you may not get far because the data to different users is not recognised as being the same. If this is the case, not much progress will be possible.
If it accelerates at least a reasonable percentage of the sites I use, that will be good enough for me.
Quote:
your cache box is better equipped with a file system/hardware able to cope with retrieval of lots of (relatively) small files and a substantial lump of ram available. In the past, I have used reiserFS for this, but whether that is a sensible choice, these days, is another matter. And there is the temptation to use RAID or SSDs, but I suspect that doesn't really attack the main problem.
The server in question is a machine with an Atom 330 CPU (2x1.6GHz, Hyper-Threading), 1GB RAM and a 2TB hard disk, connected to the net via 100 Mbit Ethernet, acting mainly as a file server and personal IMAP server. The average load on that machine is minimal.
Quote:
Originally Posted by unSpawn
Polipo?
Quote:
Originally Posted by salasi
Like this. (In reply/amplification of unSpawn's post)
I'm sure that page is ancient (those Apache versions!), but it doesn't have a date on it.
I had a look at that and found it to be exactly the tool I was looking for. It was easy to configure (I changed 3 settings in the config file) and it gave a significant boost to load times.
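For reference, a minimal Polipo setup of that shape might look like the following. These are real Polipo option names, but the thread doesn't say which three settings were actually changed, so treat this as a guess:

```
# /etc/polipo/config - minimal caching setup (illustrative)
proxyAddress = "0.0.0.0"                     # listen on all interfaces
allowedClients = 127.0.0.1, 192.168.0.0/24   # only the local LAN may connect
diskCacheRoot = "/var/cache/polipo/"         # on-disk cache location
```

Polipo listens on port 8123 by default, so browsers would be pointed at http://server:8123/.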
I hammered together a simple script that measures download times (using wget) 3 times each for a few arbitrarily chosen sites, once without the proxy and once with it. I repeated that test while generating high bandwidth usage (downloading an ISO with wget) from a different machine.
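The script itself wasn't posted; a rough sketch of that kind of measurement (the run count is arbitrary, URLs are passed as arguments, and the PROXY variable name is my own invention):

```shell
#!/bin/bash
# Time several fetches of each URL with wget (sketch, not the poster's
# actual script). Set PROXY, e.g. http://server:8123/, to measure the
# proxied case.
RUNS=3

measure() {
    url=$1
    echo "$url"
    for _ in $(seq "$RUNS"); do
        # Discard the page itself; keep only the elapsed ("real") time
        t=$( { time -p wget -q -O /dev/null "$url"; } 2>&1 | awk '/^real/ {print $2}' )
        echo "  ${t}s"
    done
}

# Route wget through the proxy if PROXY is set
[ -n "$PROXY" ] && export http_proxy="$PROXY" https_proxy="$PROXY"

for url in "$@"; do
    measure "$url"
done
```

Run once plainly and once with the proxy, e.g. `PROXY=http://server:8123/ ./webtime.sh http://www.linuxquestions.org http://www.slackware.com http://distrowatch.com`.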
I am pleased with the results; this is definitely good enough for me (these numbers may be influenced by other machines on the same network accessing the Internet, but you can see the trend):
Code:
No proxy
--------
http://www.linuxquestions.org
0m1.287s
0m0.534s
0m0.524s
http://www.slackware.com
0m0.657s
0m0.690s
sys 0m0.005s
http://distrowatch.com
0m1.539s
0m1.522s
0m2.095s
Proxy
-----
http://www.linuxquestions.org
0m0.530s
0m0.519s
0m0.554s
http://www.slackware.com
0m0.753s
0m0.231s
0m0.231s
http://distrowatch.com
0m1.505s
0m0.786s
0m0.234s
With high bandwidth usage from a different machine
**************************************************
No proxy
--------
http://www.linuxquestions.org
0m13.805s
0m7.094s
0m6.481s
http://www.slackware.com
0m5.107s
0m4.264s
0m4.417s
http://distrowatch.com
0m13.720s
0m13.391s
0m12.683s
Proxy
--------
http://www.linuxquestions.org
0m10.567s
0m6.011s
0m6.341s
http://www.slackware.com
0m5.332s
0m1.381s
0m1.481s
http://distrowatch.com
0m17.911s
0m15.241s
0m6.200s
Thanks all for your effort, I will mark this as solved.
Hi,
Squid is the ideal proxy, and you can use delay pools.
Here is an example that limits traffic to 128 kbit/s.
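The example itself never made it into the post; for completeness, an aggregate (class 1) delay pool capping traffic at roughly 128 kbit/s (16000 bytes/s) would look like this in squid.conf - a sketch, not the poster's actual config:

```
# squid.conf: one class-1 (aggregate) delay pool at ~128 kbit/s
delay_pools 1
delay_class 1 1
delay_parameters 1 16000/16000   # fill rate (bytes/s) / bucket size (bytes)
delay_access 1 allow all
```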