LinuxQuestions.org

deathsfriend99 02-09-2011 01:59 PM

Slow system performance
 
This question really encompasses a lot of different topics, so I thought I'd throw it in here and hope something sticks.

I have 60+ client desktops all running CentOS 5.3 or higher. They are all pretty powerful machines (core2quads and corei7's) with between 4 and 8GB of RAM. The problem is they all run really slow. Frequent system stalls (2-3 secs of unresponsiveness) while running mundane things like emacs and firefox are very common. First I suspected video issues. I have tried both ATI and NVIDIA as well as their respective generic and proprietary drivers with no change.

I am serving all user profiles from an NFS/NIS server (quad core 8GB RAM). I am beginning to wonder if there is a network bottleneck, or even how to begin troubleshooting that.

I'm really at a loss to figure out what the issue is, but it's becoming so bad, clients have been avoiding their desktops and have switched to using their personal (GASP) windows laptops due to the annoyance.

Any ideas where I could start to find out what is bogging down my systems? Either internally or network?

lugoteehalt 02-09-2011 07:50 PM

Dunno but there are quite a few other threads about Centos being slow on this site, e.g. http://www.linuxquestions.org/questi...ed-why-634921/ . Hope some help.:)

syg00 02-09-2011 10:43 PM

Hmmm - Centos 5.3 might be too old, but maybe have a look at latencytop.

salasi 02-10-2011 09:29 AM

Quote:

Originally Posted by deathsfriend99 (Post 4253377)
I have 60+ client desktops all running CentOS 5.3 or higher. They are all pretty powerful machines (core2quads and corei7's) with between 4 and 8GB of RAM.

Those ought to be respectably quick machines; so there is something wrong, but what?

Quote:

Originally Posted by deathsfriend99 (Post 4253377)
Frequent system stalls (2-3 secs of unresponsiveness) while running mundane things like emacs and firefox are very common. First I suspected video issues. I have tried both ATI and NVIDIA as well as their respective generic and proprietary drivers with no change.

Good that you have checked out proprietary video, because that was a possibility.

Quote:

Originally Posted by deathsfriend99 (Post 4253377)
I am serving all user profiles from an NFS/NIS server (quad core 8GB RAM). I am beginning to wonder if there is a network bottleneck, or even how to begin troubleshooting that.

I would guess this is the most likely; perhaps with, eg, wireshark you could look at packets going from a client machine to the NFS/NIS server, and see what the time delay is.

The other question is whether the 2-3 seconds of unresponsiveness is the only kind of slowness that you have? In other words, if you do intense things that are purely local, do those seem fast?

What about grabbing files from NFS? Is that OK?

Please ensure that you have IPv6 turned off. Also, if the slowness is purely internet-related, you could also check that DNS name lookups are reasonably swift (and not, eg, trying a non-existent nameserver first, before going over to the one that actually does give an answer).

If I were to guess, I would guess that, for one reason or another, the systems are doing a lot of waiting (and that might be network or disk); maybe top and friends might show something interesting.

deathsfriend99 02-15-2011 02:10 PM

Thanks for the reply. I do believe it is network or NFS related. A system not mounting NFS has no performance issues. Once connected to our NIS/NFS, performance slows. All home directories mount on the NFS server. It does show considerable disk and network activity, but I would expect the throughput of a system like that to be able to handle it. It has plenty of RAM, 100Mbs Fullduplex, and SATA HD's.
Wireshark didn't show anything glaring, although I was amazed at the number of requests to and from the NFS machine. Not sure if that is normal.
gkrellm shows:
CPU average 10%
Disk average 5.5M
Eth0 average 2M

fbsduser 02-15-2011 05:08 PM

Is your server's SATA controller set as "IDE emulation" (or something like that, you check that in the BIOS setup)? If it is set like that you'll get a very low throughput (since it's essentially emulating an IDE port). To fix it you need to set it to "SATA" or "AHCI" which is the native mode and will yield the full throughput of your SATA controller/disks.

DJ Shaji 02-16-2011 08:00 PM

I'm so not qualified to comment here, but have you tried

* using a custom kernel?
* using a different desktop environment?
* turning off cron jobs?
* Updating frequently used packages to their latest versions?

Are you sure the system stalls are random? I mean, generally the kernel can be caught up in disk io for a few seconds and the system may get stuck. You could try the "noasync" flag for mounting the root or other local filesystems.

deathsfriend99 02-17-2011 11:57 AM

Quote:

Originally Posted by fbsduser (Post 4259604)
Is your server's SATA controller set as "IDE emulation" (or something like that, you check that in the BIOS setup)? If it is set like that you'll get a very low throughput (since it's essentially emulating an IDE port). To fix it you need to set it to "SATA" or "AHCI" which is the native mode and will yield the full throughput of your SATA controller/disks.

I don't recall. I haven't rebooted this machine in 6+months as it is the heart of the department. Is there a way to check that without going into the bios? All the drives are listed in /dev/sd# so I figured they were in sata mode.


Quote:

Originally Posted by DJ Shaji (Post 4260940)
I'm so not qualified to comment here, but have you tried

* using a custom kernel?
* using a different desktop environment?
* turning off cron jobs?
* Updating frequently used packages to their latest versions?

Are you sure the system stalls are random? I mean, generally the kernel can be caught up in disk io for a few seconds and the system may get stuck. You could try the "noasync" flag for mounting the root or other local filesystems.

These are options, but CentOS is a very stable and widely used distro. Its NFS capabilities are standard "out of the box" and shouldn't need customization. I may look into trying different mounting options though. That is a great idea.

Guttorm 02-17-2011 12:31 PM

Hi


100mbit isn't very fast, it's only a fraction of the speed of a modern SATA disk. Even with one user, it's going to be the bottleneck.

fbsduser 02-18-2011 04:53 PM

Quote:

Originally Posted by deathsfriend99
I don't recall. I haven't rebooted this machine in 6+months as it is the heart of the department. Is there a way to check that without going into the bios? All the drives are listed in /dev/sd# so I figured they were in sata mode.

All drives will be listed as /dev/sd# regardless of what they're connected to (IDE, SCSI, SATA), because of the way the kernel (more precisely the libata library) handles disk connections. Essentially there's AFAIK no way, other than going to the BIOS, of checking whether the controller is in native or legacy mode.

arizonagroovejet 02-25-2011 01:34 PM

Quote:

Originally Posted by deathsfriend99 (Post 4259437)
100Mbs Fullduplex,

Really? Gigabit ethernet has been around for many years. Why do you only have 100 Mbit/s? That sounds like it's going to be a problem because you're going to get maybe 10MB/s read/write speeds if you're lucky and that will be shared between all your 60 machines.
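
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the server's single 100 Mbit/s link is the limit, ignores protocol overhead, and takes the worst case where all 60 clients are busy at once; the variable names are only for illustration.

```shell
# Rough ceiling for a shared 100 Mbit/s server link (protocol overhead ignored).
link_mbit=100   # server NIC speed mentioned in the thread
clients=60      # number of desktops mentioned in the thread

# 8 bits per byte, so 100 Mbit/s is at most 12500 kB/s (12.5 MB/s).
total_kb_per_s=$(( link_mbit * 1000 / 8 ))

# Even split if every client were active at once (a worst-case assumption).
per_client_kb_per_s=$(( total_kb_per_s / clients ))

echo "link ceiling: ${total_kb_per_s} kB/s"
echo "per client if all ${clients} are active: ${per_client_kb_per_s} kB/s"
```

So the theoretical best is about 12.5 MB/s for everybody together, and only a couple of hundred kB/s each if everyone hits the server at the same moment.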

I administer a bunch of machines that use home directories mounted via NFS. I don't know the exact specs of the server since someone else looks after it, but I do know that it has networking that's a LOT faster than 100Mbit/s (I'd guess it's 10GbE) and the home directories are on an enterprise-grade hardware RAID array stuffed full of drives that are almost certainly spinning a lot faster than yours are, that's connected to the server via fibre optic. The desktop machines have 1000Mbit/s connections back to the nearest switch. If I run this with the working directory set to my NFS mounted home directory
Code:

$ dd if=/dev/zero of=foo bs=1024 count=1048576
I get a write speed of around 20MB/s.

Now look at the set up I've just described. Now look at the server specs you say you have. Now look at mine again and consider how much better it is and that I'm getting 20MB/s which, let's face it, is slow when you compare to a local disk.

Try that command for yourself with the working directory set to an NFS share and then again with the working directory set to somewhere on the local disk. You may find the results interesting.
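
If you want to repeat that comparison without retyping it, something like this rough sketch would do. The function name write_test and the scratch file name are made up for illustration, and conv=fsync assumes GNU dd; it is there so the page cache does not flatter the result.

```shell
# write_test DIR [SIZE_MB]: write SIZE_MB of zeros into DIR and report how long it took.
write_test() {
    local dir=$1 size_mb=${2:-1024}
    local file="$dir/ddtest.$$"   # hypothetical scratch file, removed afterwards
    local start end
    start=$(date +%s)
    # conv=fsync makes dd flush the data to the target before it exits.
    dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" conv=fsync 2>/dev/null
    end=$(date +%s)
    rm -f "$file"
    echo "$dir: ${size_mb} MB in $(( end - start )) s"
}
```

Assuming your home directory is on NFS and /var/tmp is on the local disk, you would run write_test "$HOME" and then write_test /var/tmp and compare the two times.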



When you say
Quote:

I am serving all user profiles from an NFS/NIS server
what do you mean by 'user profiles'? 'user profiles' is a concept I always associate with Windows, where the concept of a 'home directory' doesn't really seem to exist. Are you mounting users' home directories from the NFS server, or something else?

bluebox 02-25-2011 09:28 PM

What about the simple things ...

What does ifconfig say? Suspicious errors or dropped packets?

Anything about "eth0 link down" in dmesg? Anything suspicious there? Esp on the server?

Does the NIC share its IRQ with the graphics?

60 clients are connected to ... what? Is this "what" simply running hot, maybe?

Set up a simple ftp server on the server and do some basic throughput/reliability tests to the clients.

Do all clients hang at once?

Is there some kind of traffic control? Else, this guy with the 4 GB BMP desktop picture of his spouse will eat up other people's bandwidth. Have a look at "ntop" or similar.

Firefox hanging does not say much ... but emacs? What is emacs trying to do when it hangs? Saving something? Is it the X11-emacs or console emacs?

Review the stuff stored on the server. There's not much sense in storing Firefox cache, desktop themes and similar things remotely.

Does the server use swap?

You're not doing wireless, are you?

deathsfriend99 02-25-2011 11:31 PM

Quote:

Originally Posted by bluebox (Post 4271549)
What does ifconfig say? Suspicious errors or dropped packets?

Nothing suspicious. No errors, no collisions, no dropped packets. nfsstat shows 3 retransmissions out of over 70,000,000 calls, so nothing odd there.
Quote:

Originally Posted by bluebox (Post 4271549)
Anything about "eth0 link down" in dmesg? Anything suspicious there? Esp on the server?

No eth0 link down in dmesg or anything out of the ordinary. Just iptables messages. Not sure what Esp is.
Quote:

Originally Posted by bluebox (Post 4271549)
Does the NIC share its IRQ with the graphics?

No.
Quote:

Originally Posted by bluebox (Post 4271549)
60 clients are connected to ... what? Is this "what" simply running hot, maybe?

This is a Core2Quad machine. lm_sensors gives 37C for the cores. Not overly hot for this CPU.

Quote:

Originally Posted by bluebox (Post 4271549)
Set up a simple ftp server on the server and do some basic throughput/reliability tests to the clients.

Haven't tried FTP but izone showed decent throughput.
Quote:

Originally Posted by bluebox (Post 4271549)
Do all clients hang at once?

Not sure. It's only for a second or 2, and it's transient and random so it's not easy to diagnose.
Quote:

Originally Posted by bluebox (Post 4271549)
Is there some kind of traffic control? Else, this guy with the 4 GB BMP desktop picture of his spouse will eat up other people's bandwidth. Have a look at "ntop" or similar.

There is no traffic control. I have been unable to get ntop installed due to some dependency issues in CentOS. I'll keep looking.
Quote:

Originally Posted by bluebox (Post 4271549)
Firefox hanging does not say much ... but emacs? What is emacs trying to do when it hangs? Saving something? Is it the X11-emacs or console emacs?

Just typing in emacs, it can hang for a few seconds then catch up and all your text appears. It is both X-11 and terminal emacs.
Quote:

Originally Posted by bluebox (Post 4271549)
Review the stuff stored on the server. There's not much sense in storing Firefox cache, desktop themes and similar things remotely.

Haven't looked into this.
Quote:

Originally Posted by bluebox (Post 4271549)
Does the server use swap?

Yes, 2X RAM.
Quote:

Originally Posted by bluebox (Post 4271549)
You're not doing wireless, are you?

No.


One other thing, we are running a software RAID5, and sitting in the same room as the server, the drives are constantly running (ie: churning away). Maybe we should look into a hardware RAID? Is this more efficient?

bluebox 02-26-2011 02:01 PM

Quote:

Originally Posted by deathsfriend99 (Post 4271602)
Not sure what Esp is.

Sorry, "esp" is a lazy abbreviation for "especially".


Quote:

Originally Posted by deathsfriend99 (Post 4271602)
This is a Core2Quad machine. lm_sensors gives 37C for the cores. Not overly hot for this CPU.

So, your server is equipped with 60 NICs, directly serving 60 clients? No switches in between?


Quote:

Originally Posted by deathsfriend99 (Post 4271602)
Haven't tried FTP but izone showed decent throughput.

Izone? You mean iozone? This would be a filesystem benchmark, helpful only when run on the clients to benchmark NFS performance. Throughput is not your problem, but freezes. Does iozone show freezes?

Quote:

Originally Posted by deathsfriend99 (Post 4271602)
It's only for a second or 2, and it's transient and random so it's not easy to diagnose.

Right. Linux usually throws some timeouts when filesystems or networks are hanging. But 2 seconds usually are not enough for a timeout. So, one way to diagnose your problem would be to put more stress on your network, with the intention to make things worse and finally get some explicit error messages.

Quote:

Originally Posted by deathsfriend99 (Post 4271602)
Just typing in emacs, it can hang for a few seconds then catch up and all your text appears. It is both X-11 and terminal emacs.

This is strange. When "just typing", there should be no network activity that could make emacs hang due to hanging NFS. Even though your server could very well be the culprit, it's still possible that there is some hardware problem on the clients, maybe always present, but more noticeable when running on NFS with increased network traffic. Especially when all clients are built from identical hardware.

Quote:

Originally Posted by deathsfriend99 (Post 4271602)
Yes, 2X RAM.

I asked whether the server _uses_ swap, not whether there is swap. Swapping out harddisk content to harddisk again is a good prerequisite to slow down fileserving.

Quote:

Originally Posted by deathsfriend99 (Post 4271602)
One other thing, we are running a software RAID5, and sitting in the same room as the server, the drives are constantly running (ie: churning away). Maybe we should look into a hardware RAID? Is this more efficient?

Your bottleneck most likely is the network, unless you really have 60 100 Mbit NICs inside your server. Hardware raid will lower CPU usage and transfer stress on the chipset, but I wouldn't expect this to solve your freeze-problem.

Hardware raid will not stop your harddrives from churning.

arizonagroovejet 02-26-2011 02:06 PM

Quote:

Originally Posted by bluebox (Post 4272146)
So, your server is equipped with 60 NICs,

Where do you get that the server has 60 NICS in it? That sounds like a rather implausible number of NICS to have in a server.

deathsfriend99 02-28-2011 08:32 AM

Quote:

Originally Posted by arizonagroovejet (Post 4272149)
Where do you get that the server has 60 NICS in it? That sounds like a rather implausible number of NICS to have in a server.

60 NICs? No, I have 60+ clients connected to the server. The server has 1 NIC. Yes, there are a couple switches that cover this office, but I don't have access to them.

deathsfriend99 02-28-2011 08:48 AM

Quote:

Originally Posted by bluebox (Post 4272146)
Izone? You mean iozone? This would be a filesystem benchmark, helpful only when run on the clients to benchmark NFS performance. Throughput is not your problem, but freezes. Does iozone show freezes?

Iozone runs for client-side NFS only show latency plateaus when the file size exceeds the buffer cache, but that is normal from what I read.

Quote:

Originally Posted by bluebox (Post 4272146)
Right. Linux usually throws some timeouts when filesystems or networks are hanging. But 2 seconds usually are not enough for a timeout. So, one way to diagnose your problem would be to put more stress on your network, with the intention to make things worse and finally get some explicit error messages.

I haven't tried this other than the large computational tasks running on various machines. Those tasks do slow down the network in general. I have not tried pure network stress. That is something I'll look into.

Quote:

Originally Posted by bluebox (Post 4272146)
This is strange. When "just typing", there should be no network activity that could make emacs hang due to hanging NFS. Even though your server could very well be the culprit, it's still possible that there is some hardware problem on the clients, maybe always present, but more noticeable when running on NFS with increased network traffic. Especially when all clients are built from identical hardware.

This is why I initially suspected the video drivers, but swapping out drivers or changing the cards from Nvidia to ATI didn't solve the problem. I agree that it could be the fact that POS Dells are our only purchasing option.

Quote:

Originally Posted by bluebox (Post 4272146)
I asked whether the server _uses_ swap, not whether there is swap. Swapping out harddisk content to harddisk again is a good prerequisite to slow down fileserving.

I claim complete ignorance here.

Thanks for all the suggestions btw.

bluebox 02-28-2011 08:53 AM

Quote:

Originally Posted by deathsfriend99 (Post 4273954)
The server has 1 NIC. Yes, there are a couple switches that cover this office, but I don't have access to them.

Long time ago, I had extensive trouble with network traffic failing for some seconds, then recovering. After intensive troubleshooting I found out that all machines affected were connected to one special switch (actually it was a HUB at this time). The HUB was pretty hot, I replaced it, no more problems thereafter ...

You should gather more information about your freeze-problem.

Edit:
Quote:

I claim complete ignorance here.
When main memory gets short, its content is swapped out to hard disk, with a big penalty in speed. This is good for big computational tasks with high memory usage - e.g. simulations. Better to get a result slowly than to break computations that ran for days.

When I see this list:
Quote:

1 NIS & NFS server
2 LDAP server
2 Samba servers
2 Mail servers
4 Web/ftp/ssh servers
2 PBX servers
there is nothing included that swap could be any good for.

So do a "swapon -s" and check whether the server is using a noticeable amount of swap. Some kb are usual.
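
If you would rather have one number to watch, here is a small sketch that adds up the columns for you. It reads /proc/swaps, which carries the same columns as "swapon -s"; the function name swap_used is just something I made up.

```shell
# swap_used [FILE]: sum the Size and Used columns (both in kB) of /proc/swaps.
swap_used() {
    # Line 1 is the header "Filename Type Size Used Priority", so skip it.
    awk 'NR > 1 { size += $3; used += $4 }
         END { printf "swap used: %d kB of %d kB\n", used, size }' "${1:-/proc/swaps}"
}
```

Run swap_used on the server while it is under load, and again on a client that is suffering from the freezes.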

deathsfriend99 02-28-2011 09:29 AM

Quote:

Originally Posted by bluebox (Post 4273979)
So do a "swapon -s" and check whether the server is using a noticeable amount of swap. Some kb are usual.

Filename Type Size Used Priority
/dev/hda3 partition 16386292 148 -1


I also ran iperf. Not sure if that is a great tool for network performance, but here are the results from the server side with client connects:
Server and client IPs have been changed to protect the innocent.
server=111.11.111.11
client= 222.22.222.22

------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 111.11.111.11 port 5001 connected with 222.22.222.22 port 53814
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.1 sec 111 MBytes 91.8 Mbits/sec
[ 5] local 111.11.111.11 port 5001 connected with 222.22.222.22 port 53815
[ 5] 0.0-10.1 sec 110 MBytes 90.9 Mbits/sec
------------------------------------------------------------
Client connecting to 222.22.222.22, TCP port 5001
TCP window size: 165 KByte (default)
------------------------------------------------------------
[ 5] local 111.11.111.11 port 34319 connected with 222.22.222.22 port 5001
[ 5] 0.0-10.0 sec 108 MBytes 90.4 Mbits/sec
[ 4] local 111.11.111.11 port 5001 connected with 222.22.222.22 port 53816
------------------------------------------------------------
Client connecting to 222.22.222.22, TCP port 5001
TCP window size: 40.3 KByte (default)
------------------------------------------------------------
[ 6] local 111.11.111.11 port 34320 connected with 222.22.222.22 port 5001
[ 6] 0.0-10.0 sec 96.6 MBytes 80.9 Mbits/sec
[ 4] 0.0-10.0 sec 92.6 MBytes 77.4 Mbits/sec

bluebox 02-28-2011 11:04 AM

Quote:

Originally Posted by deathsfriend99 (Post 4274011)
/dev/hda3 partition 16386292 148 -1

Perfectly okay, provided you did this on the server under load :)

Quote:

Originally Posted by deathsfriend99 (Post 4274011)
I also ran iperf.

First of all, your primary problem still is not network traffic, but freezes, right?

To begin with, write a simple script that writes the current time to a _local_ file every 500 milliseconds, and execute it on a client affected by the freezes. Afterwards, check whether there really is a timestamp in the file every 500 ms. This will tell you whether the whole client freezes or just the GUI.
To be completely sure, write the file to ramdisk and run the script dispatched.
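
A minimal sketch of such a script, in case it helps. The function names heartbeat and gaps are made up, and it assumes GNU date (for %N) and a sleep that accepts fractions of a second.

```shell
# heartbeat FILE [COUNT] [INTERVAL]: append a timestamp to FILE every INTERVAL seconds.
# The defaults (7200 stamps, 0.5 s apart) cover roughly one hour.
heartbeat() {
    local file=$1 count=${2:-7200} interval=${3:-0.5}
    local i=0
    while [ "$i" -lt "$count" ]; do
        date '+%s.%N' >> "$file"   # seconds since the epoch, with nanoseconds
        sleep "$interval"
        i=$(( i + 1 ))
    done
}

# gaps FILE [LIMIT]: report consecutive timestamps that are more than LIMIT seconds apart.
gaps() {
    awk -v limit="${2:-1.0}" '
        NR > 1 && $1 - prev > limit { printf "gap of %.2f s before %s\n", $1 - prev, $1 }
        { prev = $1 }' "$1"
}
```

For example, start heartbeat /dev/shm/heartbeat.log in the background (assuming /dev/shm is a RAM-backed tmpfs on your clients), work until you notice a freeze, then run gaps /dev/shm/heartbeat.log. If it prints nothing, the system as a whole kept running during the freeze.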

Second, iperf produces artificial network traffic and measures the result. It's good to see healthy results, but this does not help us find what is wrong either.

You need something to record and evaluate your real world traffic, not artificial traffic.

Wireshark is not made to do this ... it is made to analyze specific connections and packet contents.

ntop is the tool of choice, but it doesn't install for you.

"darkstat" is said to be a good replacement for ntop.

Some console tools you could try are: "EthStatus", "IPTraf", "nload", "iftop", "MRTG"

Watch for either peaks in traffic, or "black holes" in traffic, on either the whole network or a single affected client, at a resolution of seconds or finer, not minutes or hours.
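
If none of those tools installs cleanly, the kernel's own byte counters in /proc/net/dev can be sampled by hand. This is only a sketch: the function names are made up and the 0.25 s default interval is an arbitrary choice.

```shell
# if_bytes IFACE [FILE]: print "rx_bytes tx_bytes" for IFACE from /proc/net/dev.
if_bytes() {
    # Some kernels print "eth0:123" with no space after the colon, so split it first.
    awk -v ifc="$1" '{ sub(/:/, " ") } $1 == ifc { print $2, $10 }' "${2:-/proc/net/dev}"
}

# watch_if IFACE [INTERVAL]: print the bytes received and sent in each INTERVAL.
watch_if() {
    local ifc=$1 interval=${2:-0.25}
    local rx tx prev_rx prev_tx
    set -- $(if_bytes "$ifc")
    prev_rx=$1 prev_tx=$2
    while sleep "$interval"; do
        set -- $(if_bytes "$ifc")
        rx=$1 tx=$2
        echo "$(date '+%T.%N') rx_bytes=$(( rx - prev_rx )) tx_bytes=$(( tx - prev_tx ))"
        prev_rx=$rx prev_tx=$tx
    done
}
```

Run watch_if eth0 in a terminal on an affected client and stop it with Ctrl-C. A run of zero or near-zero lines at the moment of a freeze would be one of those "black holes".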

btw, what about swap usage on the clients? 4 to 8 GB should be more than enough, unless there is some memory hog running. Once had all memory and then swap running full due to a buggy KdeIOSlave ... caused lags, freezes and random OOM-kills. And just had KDE Dolphin fill up my tmp directory, causing "disk full" errors ... results on NFS-homes could be interesting ...

deathsfriend99 03-04-2011 09:59 AM

Quote:

Originally Posted by bluebox (Post 4274110)
To begin with, write a simple script that writes the current time to a _local_ file every 500 milliseconds, and execute it on a client affected by the freezes. Afterwards, check whether there really is a timestamp in the file every 500 ms. This will tell you whether the whole client freezes or just the GUI.
To be completely sure, write the file to ramdisk and run the script dispatched.

This shows timestamps close to every 500ms. It drifts a bit, but no major freezing.

I did get ntop installed, but I'm not sure what I should be looking for. So much info, but nothing smacking me in the face as odd.

Something I did notice that is odd. If I start top and just drag a terminal box around in the GUI, my CPU usage for both Xorg and firefox jumps to 20% each. Seems excessive for just dragging a stupid box?

bluebox 03-04-2011 04:08 PM

Quote:

Originally Posted by deathsfriend99 (Post 4278948)
This shows timestamps close to every 500ms. It drifts a bit, but no major freezing.

As long as you did this test on a client that suffered from freezes during the test, it shows that there are no system freezes for seconds. So the most obvious part suffering from the freezes is the gui, so far.

Another thing you could check is the hard disk subsystem. Short freezes on the disk controller may lead to gui freezes. But, again, emacs shouldn't suffer from this (unless there is swap usage). Look for a tool to measure disk I/O throughput in realtime.

And, again, have a look at the swap usage on the clients.

Quote:

Originally Posted by deathsfriend99 (Post 4278948)
I did get ntop installed, but I'm not sure what I should be looking for. So much info, but nothing smacking me in the face as odd.

uhm ... I'm not an ntop-wizard, either. And a detailed tutorial possibly is too much for this forum, besides the fact that there surely already are some tutorials out there.

When running ntop on the server, there are plenty of connections you could monitor. Maybe it is easier to install ntop on a client and let it monitor the traffic ... make sure it probes the traffic on a resolution beyond one second and when there are freezes, look if you see something suspicious in the ntop data.

Quote:

Originally Posted by deathsfriend99 (Post 4278948)
Something I did notice that is odd. If I start top and just drag a terminal box around in the GUI, my CPU usage for both Xorg and firefox jumps to 20% each. Seems excessive for just dragging a stupid box?

Uhm ... depends ... redrawing of the screen should be the job of X, and X is not known for its speed and elegance ...

Firefox is a resource hog. This is especially true when not using the Adobe Flash plugin but the open source one (gnash I think) and not using Sun Java but the openjdk Java.

Looks like you should check whether these freezes are really random, or directly connected to firefox usage or slow X performance.

btw, keep in mind the thing about storing a firefox cache on NFS.


All times are GMT -5.