FC5 slows down after period of time
We had a disk crash last week and re-installed with Fedora Core 5. Since then, the system seems to work fine while it is busy, but when we come in in the morning, it is very, very slow.
I had to turn the power off yesterday morning. After the reboot, the system seemed to be fine, and I monitored it all day. When we came in this morning, it had the same problem: it takes minutes to execute commands. A friend suggested it might be a syslog problem. I restarted syslog, but that didn't help. It took over 5 minutes for "service syslog restart" to complete.

This box is only used for a Samba share. There are no direct users logged in on the system. When I run top, it takes a long time to redisplay. It alternates between showing 0's in every CPU field and 100% for wait. The top app in the list of apps is init. After I reboot, the system routinely stays at 95% idle and the top app is Xorg.

The system is an IBM ThinkCentre Model 8194-A4U. It has a 120GB drive. It is a Pentium 4, 2.4GHz with 768MB RAM. Amazingly, a different ThinkCentre that I set up last night is already doing the same thing. It is a Model 8198-A2U with a 160GB drive. I believe it has a Pentium 4, 3.0GHz with 256MB RAM. I put FC6 on it and then did a yum update. The update was still running when I left. This morning yum had prompted with a Y/N question, and after I answered it, the update proceeded, but it is very, very slow. I can't even get top to load.

What I'm asking is whether anyone has seen this in the past, and what I can check to see what the problem is. I am guessing it might have to do with something in the powersave area, where maybe the CPU or the disk is shut down due to inactivity and now isn't re-awakening.

I've just noticed something else: on the primary system, before I rebooted, it only had about 9MB of memory left. After the reboot, it has about 331MB free. I'm not sure what would be taking up the memory. Thanks |
Do you see any errors in /var/log/messages and/or dmesg? The high I/O wait makes me suspect one or more of your drives is experiencing a problem.
|
Quote:
Jun 12 12:51:49 acsfs smbd[2264]: [2007/06/12 12:51:49, 0] smbd/service.c:make_connection_snum(592)
Jun 12 12:51:49 acsfs smbd[2264]: Can't become connected user!
Jun 12 12:54:18 acsfs smbd[2269]: [2007/06/12 12:54:18, 0] smbd/service.c:make_connection_snum(592)

I don't see anything that jumps out in dmesg.

I have been watching top. Shortly after I booted, I had approx 331MB of memory free. I did a ps aux to a file to record the processes. Free memory quickly went down to around 188MB as the first 4 shares were loaded by users. It has slowly gone down to 45MB of free memory as reported by top. When I do a ps aux to a separate file and compare the two, they are almost identical, except the latest one has 7 shares and two ssh connections and the first one only had 4 shares and 1 ssh connection. The sizes for the shares are slightly larger. For example, the largest of the 4 smbd's from the first ps aux was 12220, and now the largest of the smbd's from the second ps aux is 13672. These are the VSZ column numbers, not the RSS size. Top shows uptime at 5:11 and 4 users.

This seems like a huge memory leak. I guess I don't understand why ps aux doesn't show some program's size increasing dramatically. Is there a way to better track the memory size of apps to see where it is all going? At this pace, I'm not sure I'm going to make it to 5:00 this afternoon.

I also looked through /var/log/cron and I can see that cron.hourly ran without any problems through 4:00 am. But when cron.daily started a minute later, it doesn't seem to have finished. One of the things I noticed this morning was that the time in top showed around 4:00 am. I thought it was just a problem with the actual date of the system. But I wonder if there is something in cron.daily that is killing the system. I had left top up and running from the night before. It is refreshing every 3 seconds. When I checked it this morning, the time shown advanced by only 3 or 4 seconds per refresh, but each refresh was taking more than 30 seconds to appear. It was like they were queued up. Thanks |
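On the question of tracking where process memory goes, one rough approach is to total the RSS column of ps aux per command name and sort the result. The following is only a sketch: the function name rss_by_command is made up, it assumes the usual ps aux column order (RSS in column 6, command in column 11), and the RSS figures in the sample are invented for illustration (only the two smbd VSZ values come from the post above).

```shell
#!/bin/sh
# Sketch: total resident memory (RSS, in KB) per command from "ps aux" output.
# Reads ps output on stdin and skips the header line.
# rss_by_command is a hypothetical helper name, not a standard tool.
rss_by_command() {
    awk 'NR > 1 { rss[$11] += $6 }
         END { for (c in rss) print rss[c], c }' | sort -rn
}

# Illustrative snapshot. The RSS values here are made up, not measured.
sample='USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      2264  0.0  0.5  12220  4100 ?        S    08:01   0:00 smbd
root      2269  0.0  0.6  13672  4600 ?        S    08:05   0:00 smbd
root      1901  1.2  3.1  45000 24000 tty7     Ss+  07:58   2:10 Xorg'

printf '%s\n' "$sample" | rss_by_command
```

On a live system the same function could be fed with `ps aux | rss_by_command`, and two runs taken a few hours apart could be compared. Note that this only counts memory held by processes; memory the kernel uses for buffers and file cache does not show up in ps at all.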
There's no indication of a memory leak. Instead of using top, use free:
Code:
# free |
Quote:
total used free shared buffers cached
Mem: 767208 758140 9068 0 73820 480320
-/+ buffers/cache: 204000 563208
Swap: 1540088 0 1540088

I had just recently logged out of the console, logged back in, started a terminal session and started top, and free memory fell from around 38MB to about 9MB. So I'm seeing about the same thing that free is reporting. Is there a way to flush the cache? Thanks |
Sorry, didn't format the way it should have
Code:
total used free shared buffers cached |
Two thirds of your RAM is free, and you are not using any swap. There is no memory bottleneck on your system. If you were to "flush the cache", your system would grind to a halt (you think it's bad now), as every file I/O would require a real disk I/O.
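To make the arithmetic behind that explicit: with this older free layout, the memory applications can actually get is roughly free + buffers + cached, which is the second number on the "-/+ buffers/cache" row. A minimal sketch, assuming the column order in the output posted above (the function name effective_free_kb is made up for illustration):

```shell
#!/bin/sh
# Sketch: memory effectively available to applications, in KB,
# computed as free + buffers + cached from the "Mem:" row.
# Assumes the older "free" layout: total used free shared buffers cached.
# effective_free_kb is a hypothetical helper name.
effective_free_kb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

# Figures taken from the free output posted in this thread.
sample='             total       used       free     shared    buffers     cached
Mem:        767208     758140       9068          0      73820     480320
-/+ buffers/cache:     204000     563208
Swap:      1540088          0    1540088'

printf '%s\n' "$sample" | effective_free_kb
```

For the posted numbers this gives 563208 KB out of 767208 KB total, or about 73%. On a live system with the same layout, `free | effective_free_kb` would do the same job. Newer versions of free print different columns, so the field numbers would need adjusting there.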
|
I suggest you run "smartctl -A" on each of your drives. Even if they are not reporting outright errors, they may be showing high recoverable error counts.
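If the attribute table is long, a small filter can pull out the rows worth a second look. This is a sketch under stated assumptions: it assumes the usual "smartctl -A" table with the attribute name in the second column and the raw value in the last, it only checks three attributes that are commonly watched (which ones matter can vary by drive vendor), and the function name flag_smart_attrs and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Sketch: print selected SMART attributes whose raw value (last column)
# is greater than zero. Reads "smartctl -A" output on stdin.
# flag_smart_attrs is a hypothetical helper name.
flag_smart_attrs() {
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "Offline_Uncorrectable" {
             if ($NF + 0 > 0) print $2 ": raw value " $NF
         }'
}

# Made-up sample rows for illustration only; not from the poster's drive.
sample='  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0'

printf '%s\n' "$sample" | flag_smart_attrs
```

Against a real drive this would be run as `smartctl -A /dev/hda | flag_smart_attrs`, with no output meaning none of the three counters is above zero.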
|
Code:
[root@acsfs ~]# smartctl -A /dev/hda |
That looks good. How about:
cat /proc/interrupts |
Code:
[root@acsfs ~]# cat /proc/interrupts
The apps staying at the very top are Xorg, which now has a CPU time of over 135:00:00, and floaters, which I think is a screen saver. The system looks busier now than it has most of the day. We didn't have a console on the system until yesterday, but it may be turned off. Any chance that could be a problem? Thanks |
Generally servers don't run X, and they certainly don't run screensavers - they just burn CPU for no good reason. However, while that may provide crappy response to your users, it won't cause your problem.
I'm not seeing any reason for the bad response time. You wouldn't happen to have an email server running on this system with an open relay? |
Not unless it comes that way from the install. I basically installed FC5, copied over my smb.conf file, turned on Samba and let it rip. It is a very vanilla install.
Thanks, |
Assuming you have all the maintenance applied, I see no reason for the performance problem you're having. My last suggestion would be to check for a compromised machine (very unlikely):
yum -y install chkrootkit
Then run:
chkrootkit -q -n |
Code:
[root@acsfs ~]# chkrootkit -q -n
Thanks, |