LinuxQuestions.org (/questions/)
-   Linux - Server (https://www.linuxquestions.org/questions/linux-server-73/)
-   -   Sizing a Linux web server (https://www.linuxquestions.org/questions/linux-server-73/sizing-a-linux-web-server-660912/)

BigFunkyChief 08-06-2008 11:46 AM

Sizing a Linux web server
 
My company is building a website that they hope will get 1,000,000 hits a day (I think their estimation is high, but whatever).

I have been tasked to build the webserver and size it appropriately.

I'm projecting between 10,000 and 100,000 hits a day to start off (the first 3 months). I figured I could run just one webserver to begin with, monitor it, and add a clustered farm of servers if the load gets too high.

So, I'm trying to figure out how much one single webserver can handle. The box is a Pentium 4 with 1.5GB RAM and a 250GB hard drive. The website content is static HTML (no MySQL connections) and fairly low-bandwidth, though this could change.

I've done some load testing on a similar box using the 'ab' tool. Benchmarking 100,000 requests, 50 concurrent at a time, gives me decent results (ab -n 100000 -c 50 http://localhost/), with 99% of the requests being served in less than 63ms, which is quite acceptable.

Being new to webserver benchmarking, are the results below telling me that one webserver can handle 50 concurrent connections and serve 100,000 hits in about 100 seconds? And for a basic, low-bandwidth website, does anyone have a feel for how much a single webserver can handle?

I should mention the site will be hosted at a data center, with around 20 Mbit/s of bandwidth available. Maybe that's more likely to be my bottleneck?

Hope my question is clear... I'm trying to figure out how many webservers I need to start with, and how I'll know when to add another (besides just waiting and finding out when the load is too heavy!). Thanks in advance.


AB Results:

ab -n 100000 -c 50 http://localhost/
Requests per second: 1003.06 [#/sec] (mean)
Time per request: 49.847 [ms] (mean)
Time per request: 0.997 [ms] (mean, across all concurrent requests)
Transfer rate: 4031.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   4.5      0      52
Processing:    10   47   6.5     49     110
Waiting:        9   45   7.5     48     106
Total:         16   48   4.2     49     124

Percentage of the requests served within a certain time (ms)
  50%     49
  66%     50
  75%     50
  80%     50
  90%     51
  95%     53
  98%     58
  99%     63
 100%    124 (longest request)
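
Doing some rough arithmetic on the targets (my own back-of-the-envelope numbers, so take them loosely), even the 1,000,000-hits-a-day goal averages out to under 12 requests per second, far below the ~1000 requests/second the benchmark reports:

Code:

# Back-of-the-envelope arithmetic; the 10x peak factor is just a guess
echo "scale=1; 1000000 / 86400" | bc        # -> 11.5 avg req/s at 1M hits/day
echo "scale=1; 10 * 1000000 / 86400" | bc   # -> 115.7 req/s at an assumed 10x peak
echo "scale=1; 100000 / 86400" | bc         # -> 1.1 avg req/s at 100k hits/day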

trickykid 08-06-2008 11:51 AM

Quote:

Originally Posted by BigFunkyChief (Post 3238872)
My company is building a website that they hope will get 1,000,000 hits a day (I think their estimation is high, but whatever).

Are they going to compete with Google? You might want to reset their expectations, but if they insist, I'd point out that with that many hits per day you need not just one server but a cluster of servers with some load balancing in there.

BigFunkyChief 08-06-2008 11:56 AM

Hard to say exactly how many hits they'll get. Obviously they're shooting for 1,000,000 a day, but that seems awfully high.

In any case, they've asked me to spec the minimum hardware necessary to make this thing work in the short term, and to scale up as I see fit.

So I guess the better question is: should one webserver be able to handle 10,000 hits a day (assuming low-bandwidth content and that the hits don't all arrive at once)?

trickykid 08-06-2008 01:36 PM

Quote:

Originally Posted by BigFunkyChief (Post 3238885)
Hard to say exactly how many hits they'll get. Obviously they're shooting for 1,000,000 a day, but that seems awfully high.

In any case, they've asked me to spec the minimum hardware necessary to make this thing work in the short term, and to scale up as I see fit.

So I guess the better question is: should one webserver be able to handle 10,000 hits a day (assuming low-bandwidth content and that the hits don't all arrive at once)?

Easily. I have two dual PIII 1.13 GHz servers with only 2GB of memory that probably see about 10,000 hits a day across all the sites I serve, with very little impact or resource usage, and that includes a database and email running on them. 10k spread out shouldn't be a problem. Now if my server got slashdotted and 10k tried to hit within an hour's time, that might be a different story. ;)
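
To put rough numbers on that (assuming a perfectly even spread, which real traffic never is):

Code:

# 10,000 hits spread over a day vs. squeezed into a single hour
echo "scale=2; 10000 / 86400" | bc   # -> .11 req/s averaged over 24 hours
echo "scale=2; 10000 / 3600" | bc    # -> 2.77 req/s if it all lands in one hour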

chort 08-06-2008 01:51 PM

The key here is that it's static content. That's really, really easy to handle, and your current box should have no problem with it. The trick is, if someone gets the bright idea to "revamp" the site with lots of flashy dynamic content, thinking it's "no big deal, cuz the webserver is fine with our traffic", then you're in for some problems...

Also, 20Mb/s is a huge amount of bandwidth. Is that your burstable max, or the sustained? Usually bandwidth has a soft cap that you're only allowed to exceed for a certain percentage of the time. If you're constantly using your burst limit, they will charge you $$$ for it.

If you go to a more dynamic website, you're going to need a lot more RAM (4GB at least), and as many CPU cores as you can throw at it. You also have to start worrying about the database design at that point, and about how you spread the database files across your storage spindles to keep I/O from becoming a bottleneck.
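
For a rough feel of what the pipe itself supports (the 40KB-per-page figure below is just an assumption; plug in your real page weight):

Code:

# Pages/second a 20 Mbit/s pipe can sustain at ~40 KB per page view
# 20 Mbit/s = 2.5 MB/s = 2560 KB/s; 2560 / 40 = 64 pages/s
echo "scale=1; (20 / 8) * 1024 / 40" | bc   # -> 64.0

At 64 pages a second you could in theory push over 5 million light static pages a day, so the server and the billing model are more likely limits than the raw bandwidth.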

salasi 08-07-2008 02:47 AM

Quote:

Originally Posted by chort (Post 3238989)
The key here is that it's static content...

Yes, but it does make the risks higher if someone makes "minor" alterations...

Depending on what you are doing, squid in http accelerator mode to cache outgoing content might make sense too; see the sketch at the end of this post.

It sounds as though what you need isn't so much an estimate of "how many hits = how big a server" as an expansion plan that takes you through several stages of "when this proves inadequate, we pay x and go up to the next stage", whatever number of hits each stage copes with.
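
As a minimal sketch of the accelerator idea (squid 2.6-style syntax; the hostname and ports are placeholders, and Apache is assumed to have been moved to a back-end port):

Code:

# squid.conf fragment: squid listens on 80 and fronts Apache on 127.0.0.1:8080
http_port 80 accel defaultsite=www.example.com
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=origin
acl our_site dstdomain www.example.com
http_access allow our_site
cache_peer_access origin allow our_site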

BigFunkyChief 08-07-2008 05:23 PM

Thanks for all the replies. This has given me some good information I can take forward, and as suggested, I should come up with a plan to expand if my single server can't handle the load.

On the bandwidth... I'm in a lucky situation where a friend of mine runs an ISP and she's letting me drop a server right behind her outgoing pipe, so that's where the 20 Mbit/s comes from. I believe it's burstable up to 45 Mbit/s. Yes, a sweet deal!

BigFunkyChief 08-07-2008 05:27 PM

One other quick question: what's the best way to really test the capacity of my server?

It seems like sending requests from localhost with ab really only tests the connection handling; it doesn't test the content being served over a real network. Once I have some test content, is there something else I can use remotely to test the webserver and see how much content it can serve before it gets too slow?
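
(I assume I could just run ab from a second machine against the real content, something like the line below, but I don't know if that's the best tool for finding the breaking point.)

Code:

# Run from a separate test box, not the server; hostname and page are placeholders
ab -n 10000 -c 50 http://testserver.example.com/index.html
# Watch "Requests per second" and the 95%/99% times while raising -c;
# when latency climbs sharply, you've found the knee of the curve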

chrism01 08-07-2008 08:47 PM

Well, if you know any Perl, there are modules that can automate talking to websites. I tend to use WWW::Mechanize: http://search.cpan.org/~petdance/WWW...W/Mechanize.pm.
Just knock up a Perl prog to do a request, then wrap it with e.g. a shell loop to run as many copies, as fast, as you want; something like the sketch below. It's easy.
:)
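
Rough sketch (the URL and counts are placeholders; tune them to taste):

Code:

#!/bin/sh
# 20 parallel workers, each fetching the page 100 times via WWW::Mechanize
URL="http://testserver.example.com/"
for w in $(seq 1 20); do
    (
        for r in $(seq 1 100); do
            perl -MWWW::Mechanize -e 'WWW::Mechanize->new->get(shift)' "$URL"
        done
    ) &
done
wait    # block until every worker has finished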

kenoshi 08-08-2008 01:23 PM

http://www.opensourcetesting.org/performance.php

I use OpenSTA and sometimes httperf for a quick test.
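
A typical httperf run looks something like this (hostname, URI and rates are placeholders; tune them to your content):

Code:

# 1000 connections total, opened at 100 per second, one GET each;
# watch the reply-time stats and the error counts in the output
httperf --server testserver.example.com --port 80 --uri /index.html \
        --num-conns 1000 --rate 100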

rbrodine 08-12-2008 04:48 PM

Here's a different way to solve the server-sizing problem.

You might be interested in the "Mathematical Server Sizing" software, freely available on SourceForge. Here's a link:

http://sourceforge.net/search/?type_...+server+sizing

Be sure to check out the documentation that comes with the project; it describes some sample sizings. For a formal description of the math involved, see the July 2006 issue of IEEE Computer.

Good luck!

