LinuxQuestions.org
Linux - Networking
Old 11-27-2012, 12:11 AM   #1
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Hanover, Germany
Distribution: Main: Gentoo Others: What fits the task
Posts: 15,592
Blog Entries: 2

Rep: Reputation: 4045
Caching web proxy, if possible simpler than Squid?


Short description of my problem: I have several machines at home, connected to the net over a rather low-bandwidth connection (6 Mbit/s) that I have to share with other people. So I want to reduce the bandwidth I use for surfing as much as possible to speed things up. Since I have a local server (Debian 6) running 24/7 anyway, I thought about setting up a caching proxy on that machine, so that often-used websites load faster when bandwidth usage from other users on the network is high.

I don't need any filtering options or anything like that; all I want is a reasonably large cache on the server's disk (I was thinking of something like 10 GB) to speed things up. If possible I would like to cache a) the most-used websites, and b) downloads above a certain limit (which I have set, arbitrarily for now, at 200 MB).

I had a look at Squid, but that thing is simply overwhelming (the configuration file has several thousand lines) and, I think, simply overkill, since I don't need that whole bunch of filter functions and possibilities. I tried it nonetheless, following several rather simple how-tos on the web, and the performance gain was about 0%, so either I have done something wrong or Squid is simply too big for this simple purpose.

Does anyone know a simple, easy-to-configure program that achieves what I want?

Last edited by TobiSGD; 11-27-2012 at 12:12 AM.
 
Old 11-27-2012, 03:09 AM   #2
Lexus45
Member
 
Registered: Jan 2010
Location: Kurgan, Russia
Distribution: Slackware, Ubuntu
Posts: 339
Blog Entries: 3

Rep: Reputation: 47
TobiSGD, it's rather surprising to hear such a question from a user with 2400+ reputation :)

In fact, a basic Squid configuration does not need all those hundreds of lines. I think about 30 lines is possible, and half of those will be very similar.

You only need to tell Squid the local IP address/port to listen on, the IP range of your clients, the list of allowed ports (like 80, 443, 5190) and some other options. That will be enough.
http://beginlinux.com/server_trainin...d-proxy-server
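For illustration, a minimal caching-only squid.conf along those lines might look like this (the subnet, port list and cache size are example values, not settings from the thread):

```
# Minimal caching-only squid.conf (illustrative values -- adjust to your LAN)
http_port 3128
cache_dir ufs /var/spool/squid 10240 16 256   # 10 GB on-disk cache
acl localnet src 192.168.1.0/24               # your clients' IP range
acl Safe_ports port 80 443 5190               # allowed destination ports
http_access deny !Safe_ports
http_access allow localnet
http_access deny all
```

Point the browsers at the server's IP on port 3128 and the rest of the defaults can stay untouched.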

I've heard about tinyproxy, but it does not cache.
There is also 3proxy, but I don't know if it caches the objects.

Why don't you want to set up a shaper? Do you really need a caching proxy?

PS: as I have noticed, the effect is easier to see in the log files, or after parsing them with a program like Sarg, which creates nice tables and counts how much traffic was served from the cache and how much was downloaded.

Last edited by Lexus45; 11-27-2012 at 03:19 AM.
 
1 member found this post helpful.
Old 11-27-2012, 07:12 AM   #3
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Hanover, Germany
Distribution: Main: Gentoo Others: What fits the task
Posts: 15,592
Blog Entries: 2

Original Poster
Rep: Reputation: 4045
Quote:
Originally Posted by Lexus45 View Post
TobiSGD, it's rather surprising to hear such a question from a user with 2400+ reputation :)
Having a high reputation (sadly) does not mean that I know anything.

Quote:
You only need to tell Squid the local IP address/port to listen on, the IP range of your clients, the list of allowed ports (like 80, 443, 5190) and some other options. That will be enough.
http://beginlinux.com/server_trainin...d-proxy-server
I already followed a similar tutorial, but it didn't speed things up; that is why I am asking.

Quote:
PS: as I noticed, the performance is better seen in log files, or after parsing them with some program like Sarg, which creates beautiful tables and counts how much traffic has been cached and how much has been downloaded.
My performance tests are rather simple:
1. Go to LQ's forums, press the reload button repeatedly in a browser with its cache disabled (caching in dwb does not really work anyway), and see if the site loads faster.
2. Go to a page with a massive number of pictures, press the reload button repeatedly with the cache disabled, and see if it loads faster.
With Squid in that basic configuration, pages do not load faster.

Quote:
Why don't you want to set up a shaper ? Do you really need a caching proxy?
It would be very convenient. Ever shared a 6 Mbit/s DSL line with 10 other people? Things can get really slow, so a local caching server can be really useful. And since I use several machines, caching browsers are not a solution for me.

Quote:
I've heard about tinyproxy, but it does not cache.
There is also 3proxy, but I don't know if it caches the objects.
Thanks, I will have a look at 3proxy. Sadly, the documentation does not say anything about caching.
 
Old 11-27-2012, 10:11 AM   #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,254
Blog Entries: 54

Rep: Reputation: 2834
Polipo?
 
1 member found this post helpful.
Old 11-27-2012, 10:47 AM   #5
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 3,899

Rep: Reputation: 774
Quote:
Originally Posted by TobiSGD View Post
I tried it nonetheless, following several rather simple how-tos on the web, and the performance gain was about 0%, so either I have done something wrong or Squid is simply too big for this simple purpose.
Leaving aside the issue of whether squid's config is overwhelming (while it does have a fair number of lines, not much of it is essential, and the essential part isn't overwhelming, imho; but this isn't the first issue to settle), the first question is whether caching can do anything about your problem at all.

In general, caching can only really help if several users (where a 'user' is something other than a pink blob sitting behind a computer screen and creating problems; a different user can be a different browser, for example) try to get at the same data items at different times, there is a bottleneck somewhere upstream of the cache, and (if you are concerned about speed) the overhead of all the indexing, storage and retrieval is less than the potential time savings.

OK
  • you've got the bottleneck, apparently, unless the bandwidth from your cache box to your user box(en) is idiotically limited (doubt that it is that limited, but indifferent wireless or power line networking could change that)
  • you have different users, but do the different users have much overlap in the websites that they commonly access?
  • many websites these days (I blame the CMSs, etc) are not really serving up the same data to all users - that is, if you are effectively 'logged in' (with whatever security, access privileges, etc, that implies) you get data that is 'customised' for you, so the data may not be cachable, and even if it is technically cachable, you may not get far because the data to different users is not recognised as being the same. If this is the case, not much progress will be possible.
  • it's better if your cache box is equipped with a file system/hardware able to cope with retrieving lots of (relatively) small files, and a substantial lump of RAM available. In the past, I have used reiserFS for this, but whether that is a sensible choice these days is another matter. And there is the temptation to use RAID or SSDs, but I suspect that doesn't really attack the main problem.

OTOH, something like a static page (or quasi-static page) that is just a list of .jpegs ought to be a lot more cacheable. (There is a good argument for increasing the 'keep' time for things like .jpegs, but you'll have to prove that caching can work for you before getting to that detail, eg, http://archive09.linux.com/feature/153221.)

And beware of all the ad-serving crud. A lot of the ad-crud does a lot of transactions, to and fro. OK, you want that page...but we want to serve you some ads....where have you been recently...ah, now we have to go off to Google to find out which of those are relevant to the people who will pay us for serving an ad....here is a connection to a completely different, overloaded, server that has a custom selected ad for you. There is a lot of crud there, and much of it entails switching from one site to another, and a cache can't do much about it ('cos what you are getting isn't determined until quite late in the process).

Another point is that squid, in particular, needs a good (quick) source of DNS lookups, so you'll be slow if your DNS lookups are slow.

Have you tried to measure hit rates? My guess is that the hit rate is rather low. You could also measure the times to get the data, but I'd guess hit rate would be a good place to start.
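A rough hit-rate check can be read straight out of squid's access log; a sketch (the log path is the Debian default, and the assumption that the result code sits in field 4 matches squid's native log format):

```shell
#!/bin/bash
# Rough cache hit rate from a squid access.log. In squid's native format,
# field 4 holds the result code (TCP_HIT/200, TCP_MISS/200, ...).
hitrate() {
    awk '{ n++; if ($4 ~ /HIT/) hits++ }
         END { if (n) printf "%d/%d requests were cache hits (%.0f%%)\n",
                      hits, n, 100 * hits / n }' "$1"
}

LOG=/var/log/squid/access.log   # Debian default location
[ -r "$LOG" ] && hitrate "$LOG"
```

Anything much below a few tens of percent suggests the cache isn't earning its keep.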

(And, BTW, I have assumed that speed is your main concern - it could be that data consumption is your main concern, and that would change the emphasis - it is probably slightly easier to argue for caching to decrease data consumption (ie, if you are on a plan with your ISP that attracts cost penalties above a certain data consumption) than for speed.)
 
1 member found this post helpful.
Old 11-27-2012, 10:54 AM   #6
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 3,899

Rep: Reputation: 774
Like this. (In reply/amplification of unSpawn's post)

I'm sure that page is ancient (those Apache versions!), but it doesn't have a date on it.
 
1 member found this post helpful.
Old 11-27-2012, 07:12 PM   #7
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Hanover, Germany
Distribution: Main: Gentoo Others: What fits the task
Posts: 15,592
Blog Entries: 2

Original Poster
Rep: Reputation: 4045
Quote:
Originally Posted by salasi View Post
you've got the bottleneck, apparently, unless the bandwidth from your cache box to your user box(en) is idiotically limited (doubt that it is that limited, but indifferent wireless or power line networking could change that)
The server is connected to the other machines via 100 Mbit Ethernet, so that link will not be limited in a way that makes the approach unusable. The bottleneck is easily determined: I share a 6 Mbit connection with a bunch of other people, and as soon as a few of them begin to download something, watch a stream, etc., load times for webpages rise significantly. Examples follow.

Quote:
you have different users, but do the different users have much overlap in the websites that they commonly access?
At this time the main reason for me is to accelerate my web access without disturbing the others. So for now I am the only user of that proxy, from different machines.

Quote:
many websites these days (I blame the CMSs, etc) are not really serving up the same data to all users - that is, if you are effectively 'logged in' (with whatever security, access privileges, etc, that implies) you get data that is 'customised' for you, so the data may not be cachable, and even if it is technically cachable, you may not get far because the data to different users is not recognised as being the same. If this is the case, not much progress will be possible.
If it accelerates at least a reasonable percentage of the sites I use, that will be good enough for me.

Quote:
your cache box is better equipped with a file system/hardware able to cope with retrieval of lots of (relatively) small files and a substantial lump of ram available. In the past, I have used reiserFS for this, but whether that is a sensible choice, these days, is another matter. And there is the temptation to use RAID or SSDs, but I suspect that doesn't really attack the main problem.
The server in question is a machine with an Atom 330 CPU (2 × 1.6 GHz, Hyper-Threading), 1 GB RAM and a 2 TB hard disk, connected to the net via 100 Mbit Ethernet, acting mainly as a file server and personal IMAP server. The average load on that machine is minimal.

Quote:
Originally Posted by unSpawn
Polipo?
Quote:
Originally Posted by salasi
Like this. (In reply/amplification of unSpawn's post)

I'm sure that page is ancient (those Apache versions!), but doesn't have a date on it.
I had a look at that and found it to be exactly the tool I was looking for. It was easy to configure (I changed 3 settings in the config file) and gave a significant boost in load times.
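For reference, a polipo setup really can be that small; a sketch of /etc/polipo/config (the listen address, client range and cache path here are assumptions for illustration, not the poster's actual three settings):

```
# Minimal /etc/polipo/config (illustrative values)
proxyAddress = "0.0.0.0"              # listen on all interfaces (port 8123)
allowedClients = 192.168.1.0/24       # LAN clients allowed to use the proxy
diskCacheRoot = "/var/cache/polipo/"  # enable the on-disk cache
```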
I hammered together a simple script that measures the download times (using wget) 3 times each for a few arbitrarily chosen sites, once without the proxy and once with it. I then repeated the test while generating high bandwidth usage (downloading an ISO with wget) from a different machine.
I am pleased with the results; this is definitely good enough for me (the numbers may be influenced by other machines on the network accessing the Internet, but you can see the trend):
Code:
No proxy
--------
http://www.linuxquestions.org

0m1.287s
0m0.534s
0m0.524s


http://www.slackware.com

0m0.657s
0m0.690s
sys 0m0.005s


http://distrowatch.com

0m1.539s
0m1.522s
0m2.095s



Proxy
-----
http://www.linuxquestions.org

0m0.530s
0m0.519s
0m0.554s

http://www.slackware.com

0m0.753s
0m0.231s
0m0.231s

http://distrowatch.com

0m1.505s
0m0.786s
0m0.234s


With high bandwidth usage from a different machine
**************************************************

No proxy
--------
http://www.linuxquestions.org

0m13.805s
0m7.094s
0m6.481s

http://www.slackware.com

0m5.107s
0m4.264s
0m4.417s

http://distrowatch.com

0m13.720s
0m13.391s
0m12.683s



Proxy
--------
http://www.linuxquestions.org

0m10.567s
0m6.011s
0m6.341s

http://www.slackware.com

0m5.332s
0m1.381s
0m1.481s

http://distrowatch.com

0m17.911s
0m15.241s
0m6.200s
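The measurement loop described above can be sketched roughly like this (the proxy host/port is an assumption; adjust it to wherever polipo actually listens):

```shell
#!/bin/bash
# Sketch of the timing test: fetch each URL three times, directly and
# through the proxy, discarding the body. PROXY is an assumed address.
PROXY="${PROXY:-http://192.168.1.10:8123}"

# Fetch a URL three times; the `time` builtin prints the durations.
bench() {
    local url=$1 proxy=$2
    for i in 1 2 3; do
        if [ -n "$proxy" ]; then
            time http_proxy="$proxy" wget -q -O /dev/null "$url"
        else
            time wget -q -O /dev/null "$url"
        fi
    done
}

# Usage: ./bench.sh http://www.linuxquestions.org http://distrowatch.com
for url in "$@"; do
    echo "== $url, no proxy =="
    bench "$url"
    echo "== $url, via proxy =="
    bench "$url" "$PROXY"
done
```

Run it once on a quiet network and once while another machine saturates the line to reproduce the two tables above.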
Thanks, everyone, for your effort; I will mark this as solved.
 
Old 12-01-2012, 11:03 PM   #8
overdoseattitude
LQ Newbie
 
Registered: Oct 2012
Location: Bangalore
Distribution: CentOS, Ubuntu, OSX LION
Posts: 8

Rep: Reputation: Disabled
Quote:
Originally Posted by TobiSGD View Post
Short description of my problem: I have several machines at home, connected to the net over a rather low-bandwidth connection (6 Mbit/s) that I have to share with other people. So I want to reduce the bandwidth I use for surfing as much as possible to speed things up. Since I have a local server (Debian 6) running 24/7 anyway, I thought about setting up a caching proxy on that machine, so that often-used websites load faster when bandwidth usage from other users on the network is high.

I don't need any filtering options or anything like that; all I want is a reasonably large cache on the server's disk (I was thinking of something like 10 GB) to speed things up. If possible I would like to cache a) the most-used websites, and b) downloads above a certain limit (which I have set, arbitrarily for now, at 200 MB).

I had a look at Squid, but that thing is simply overwhelming (the configuration file has several thousand lines) and, I think, simply overkill, since I don't need that whole bunch of filter functions and possibilities. I tried it nonetheless, following several rather simple how-tos on the web, and the performance gain was about 0%, so either I have done something wrong or Squid is simply too big for this simple purpose.

Does anyone know a simple, easy-to-configure program that achieves what I want?
Hi,
Squid is the ideal proxy, and you can use delay pools.
Here is an example that limits each client host in 192.168.1.0/24 to roughly 128 kbit/s (16000 bytes/s):

Code:
acl only128kusers src 192.168.1.0/24
delay_pools 1
delay_class 1 3
delay_access 1 allow only128kusers
delay_access 1 deny all
delay_parameters 1 64000/64000 -1/-1 16000/64000
I believe Squid is the best choice for a proxy.
 
  

