I have a web server that mounts /home from an Iomega NAS (IDE based) server via NFS. I made a mistake with rm -rf (I know, I know) and wiped out the data. I have a second server that I had built that is connected to a disk array. This box is loaded, is all SCSI based, and kicks butt; it is running software RAID, by the way. I had been waiting for a chance to move the data to it anyway, so I copied my backups over to this server, which I will call NAS#2, and then mounted /home from NAS#2.

No problems yet - until I started the web sites. When I started them from the web server, the CPU load went through the roof - on the web server itself, not NAS#2. I brought all the sites back down and the CPU load dropped back down. So I brought them up one at a time and saw the CPU load jump 3-6 points per site. I hoped this would be a temporary thing, so I let it run. It has now been 12 hours and the load is in the 30s; it used to run in the 0.5-2 range.
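For reference, the numbers I'm watching come from the usual tools - roughly:

  uptime     (load averages)
  vmstat 5   (CPU, memory, swap, block I/O and interrupt stats, one new line every 5 seconds)

(the 5-second interval is arbitrary).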
So we can rule out things like processor problems, local disk issues, and anything else tied to the local machine itself.
In effect, the only thing that has changed is the network link - the webserver is now talking to NAS#2 instead of the old Iomega box.
Now, looking at your stats, I see a heapload of blocked processes (the 'b' in the second column - which, if I remember right, is processes in uninterruptible sleep). Basically, these processes are all waiting for something to get back to them before continuing on their merry way.
Your memory / swap usage isn't changing at all, so we're not running into paging problems.
Your CPU idle is hanging around 97% (barring the one freak 70%), so we're not CPU bound either.
Your block I/O looks okay too - just a bit of a spike, which I expect represents some disk writing after your period of freak activity.
Which leaves only your interrupts...
Lo and behold, you've got an average of around 200 or so, and then a spike of 3500 (which is why your CPU gets busy at the same time). That would represent a heapload of data hitting your machine at once, and relatively little afterwards.
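(For anyone reading along who doesn't stare at vmstat output all day: in something like 'vmstat 5', the 'b' column is processes in uninterruptible sleep, 'swpd'/'free'/'si'/'so' cover memory and swap, 'bi'/'bo' are blocks read and written, 'in' is interrupts per second, and 'id' is the CPU idle percentage. Those are the columns I'm reading off above.)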
I'd like to watch it a bit more, see if it's a regular thing. But I'd bet my last donut that this is what's causing your blocked processes.
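One way to see which device those interrupts are coming from is to watch the per-device counters:

  watch -n1 cat /proc/interrupts

(or just run cat /proc/interrupts a few times and compare). The counter next to your NIC - eth0 or whatever it's called on your box - should be the one climbing when the load goes mad.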
It could be a number of things. If you're *certain* nothing has changed on the webserver machine, I would bet that NAS#2 is misconfigured.
Specifically, I'd bet on something like NAS#2's network card being forced into full duplex when your switch is only doing half duplex - or the other way around.
If I were you, I'd load up a traffic monitor (look at iptraf - it's a wonderful tool) and watch the interaction between your machines.
Also check ethtool / mii-tool for the network card configuration / auto-negotiation on the cards. I doubt it's a software issue, if only because your vmstat seems to indicate it's neither CPU, memory, nor page-file related...
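Concretely, something along these lines (eth0 is just an example interface name):

  ethtool eth0       - shows negotiated speed, duplex, and whether auto-negotiation is on
  mii-tool -v eth0   - same idea, works through the older MII interface
  iptraf             - interactive traffic monitor; its IP traffic view will show the NFS chatter

Run the ethtool / mii-tool check on both the webserver and NAS#2, and compare what each side thinks it negotiated against the switch port settings.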
Wanted to let you know I finally resolved this issue. Each web server has its own Apache instance. The boxes are clones in partition layout and OS version, but I cannot say for certain that the same patches were applied to both. I recompiled Apache for each site and I am now running at:
load average: 0.05, 0.06, 0.07
I'm thinking it was a shared library issue or something along those lines.
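If anyone runs into the same thing, a quick way to compare two supposedly identical boxes is to look at what the Apache binary actually links against (the path below is just an example - use wherever your httpd lives):

  ldd /usr/local/apache/bin/httpd

Run it on both boxes and compare; any library that resolves to a different version on one of them is a likely suspect.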