LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 06-12-2007, 08:13 AM   #1
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Rep: Reputation: 0
FC5 slows down after period of time


We had a disk crash last week and since we re-installed with Fedora Core 5, the system seems to work fine while it is busy, but when we come in, in the morning, it is very, very slow.

I had to turn the power off yesterday morning. After the system rebooted, the system seemed to be fine. I monitored it all day. When we came in this morning, same problem. It takes minutes to execute commands.

A friend suggested it might be a syslog problem. I restarted syslog, that didn't help. It took over 5 minutes for "service syslog restart" to complete. This box is only used for a Samba share. There are no direct users logged in on the system.

When I run TOP, it takes a long time to redisplay. It alternates between showing 0's in each of the CPU fields and 100% for wait. The top app in the list is init. After I reboot, the system seems to routinely stay at 95% idle and the top app is Xorg.

The system is an IBM ThinkCentre Model 8194-A4U. It has a 120GB drive. It is a Pentium 4, 2.4GHz with 768MB RAM.

Amazingly, I set up a different ThinkCentre last night that is already doing the same thing. It is a Model 8198-A2U with a 160GB drive. I believe it has a Pentium 4, 3.0GHz with 256MB RAM. I put FC6 on it and then did a yum update. The update was still running when I left. This morning yum had prompted for a Y/N answer, and after I answered it, it is proceeding with the update, but it is very, very slow. I can't even get TOP to load.

What I'm asking is whether anyone has seen this in the past and what I can check to find the problem. I am guessing it might have to do with something in the powersave area, where maybe the CPU or the disk is shut down due to inactivity but now isn't re-awakening.

I've just noticed something else: on the primary system, before I rebooted, it had only about 9MB of memory left. After the reboot, it has about 331MB free. I'm not sure what would be taking up the memory.

Thanks
 
Old 06-12-2007, 12:04 PM   #2
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 335
Do you see any errors in /var/log/messages and/or dmesg? The high I/O wait makes me suspect one or more of your drives is experiencing a problem.
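
A minimal sketch of that check, for anyone following along. The helper name `scan_disk_errors` is hypothetical, and the pattern list is an assumption about how the kernel commonly words IDE/ATA trouble, not an exhaustive catalogue:

```shell
# Hypothetical helper: filter log text on stdin for lines that commonly
# accompany a failing IDE/ATA disk. The pattern list is an assumption.
scan_disk_errors() {
    grep -iE 'i/o error|dma_intr|drive not ready|medium error|hd[a-z]: .*(error|timeout|lost interrupt)'
}

# Typical use on the server itself (run as root):
#   dmesg | scan_disk_errors
#   scan_disk_errors < /var/log/messages
```

No output only means none of these patterns matched; it does not prove the drive is healthy.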
 
Old 06-12-2007, 01:11 PM   #3
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by macemoneta
Do you see any errors in /var/log/messages and/or dmesg? The high I/O wait makes me suspect one or more of your drives is experiencing a problem.
I get these in /var/log/messages frequently:

Jun 12 12:51:49 acsfs smbd[2264]: [2007/06/12 12:51:49, 0] smbd/service.c:make_connection_snum(592)
Jun 12 12:51:49 acsfs smbd[2264]: Can't become connected user!
Jun 12 12:54:18 acsfs smbd[2269]: [2007/06/12 12:54:18, 0] smbd/service.c:make_connection_snum(592)

I don't see anything that jumps out on dmesg.

I have been watching TOP. Shortly after I booted, I had approx 331MB of memory free. I did a ps aux to a file to record the processes. Free memory quickly went down to around 188MB as the first 4 shares were loaded by users. It has slowly gone down to 45MB of free memory as reported by TOP. When I do a ps aux to a separate file and compare the two, they are almost identical, except the latest one has 7 shares and 2 ssh connections and the first one had only 4 shares and 1 ssh connection. The sizes for the shares are slightly larger. For example, the largest of the 4 smbd's from the first ps aux was 12220, and now the largest of the smbd's from the second ps aux is 13672. These are the VSZ column numbers, not the RSS size.

Top shows uptime at 5:11 and 4 users.

This seems like it is a huge memory leak. I guess I don't understand why ps aux doesn't show some program's size increasing dramatically. Is there a way to better track memory size of apps to see where it is all going? At this pace, I'm not sure I'm going to make it to 5:00 this afternoon.
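
One way to compare snapshots by resident size rather than VSZ is to rank the `ps aux` output by its RSS column. This is only a sketch: `top_rss` is a hypothetical helper name, and it assumes the standard `ps aux` layout where RSS is column 6 and the command starts in column 11.

```shell
# Hypothetical helper: read "ps aux" output on stdin and print the N largest
# processes by resident set size (RSS, column 6), largest first.
# Output columns: RSS(kB) PID COMMAND
top_rss() {
    awk 'NR > 1 { print $6, $2, $11 }' | sort -rn | head -n "${1:-10}"
}

# Typical use, repeated a few hours apart so the lists can be compared:
#   ps aux | top_rss 5
```

If no process grows between snapshots, the shrinking "free" number is more likely cache than a leak.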

I also looked through /var/log/cron and I can see where cron.hourly ran without any problems through 4:00 am. But when cron.daily started a minute later, it doesn't seem to have finished. That was one of the things I noticed this morning was that the time in TOP showed around 4:00 am. I thought it was just a problem with the actual date of the system. But I wonder if there is something in the cron.daily that is killing the system.
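
A rough way to see whether a cron.daily job is what is stuck: processes blocked on disk usually sit in state D (uninterruptible sleep), so listing those while the box is slow can point at the culprit. This is a sketch under that assumption; `blocked_on_io` is a hypothetical helper name.

```shell
# Hypothetical helper: read "ps -eo state,pid,comm" output on stdin and keep
# only processes in uninterruptible sleep (state D), which are usually
# blocked waiting on I/O. Output columns: PID COMMAND
blocked_on_io() {
    awk 'NR > 1 && $1 ~ /^D/ { print $2, $3 }'
}

# Typical use while the system is slow:
#   ls -l /etc/cron.daily                    # what the daily run would start
#   ps -eo state,pid,comm | blocked_on_io    # what is blocked right now
```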

I had left TOP up and running from the night before. It is refreshing every 3 seconds. When I checked it this morning, it would show an updated time of every 3 or 4 seconds, but the refreshes were taking more than 30 seconds. It was like they were queued up.

Thanks
 
Old 06-12-2007, 01:50 PM   #4
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 335
There's no indication of a memory leak. Instead of using top, use free:

Code:
# free
             total       used       free     shared    buffers     cached
Mem:       1032788     966756      66032          0      37544     611692
-/+ buffers/cache:     317520     715268
Swap:       265064        176     264888
The value you're interested in is the "free" figure on the "-/+ buffers/cache" line, 715268 in this example. While top would show "66032" free in this example, it doesn't take into consideration the memory being used for cache and buffers, which prevents disk I/O and can be reclaimed if memory is actually needed.
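
The arithmetic behind the "-/+ buffers/cache" row in the output above is free + buffers + cached: 66032 + 37544 + 611692 = 715268. A small sketch that does the same sum straight from a meminfo-format file; `mem_available_kb` is a hypothetical helper name, and it assumes the usual `MemFree:`, `Buffers:` and `Cached:` field names.

```shell
# Hypothetical helper: add MemFree, Buffers and Cached (all in kB) from a
# meminfo-format file, mirroring the "free" column of the
# "-/+ buffers/cache" row printed by free.
mem_available_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}

# Typical use:
#   mem_available_kb
```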
 
Old 06-12-2007, 02:50 PM   #5
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Original Poster
Rep: Reputation: 0
[root@acsfs ~]# free
total used free shared buffers cached
Mem: 767208 758140 9068 0 73820 480320
-/+ buffers/cache: 204000 563208
Swap: 1540088 0 1540088

I had just recently logged out of the console, then relogged in and then started a terminal session and started TOP and it fell from around 38MB free to about 9MB free. So, I'm seeing about the same thing as free is reporting.

Is there a way to flush the cache?

Thanks
 
Old 06-12-2007, 02:59 PM   #6
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Original Poster
Rep: Reputation: 0
Sorry, didn't format the way it should have

Code:
                total        used        free   shared   buffers  cached
Mem:            767208       758140      9068   0        73820    480320
-/+ buffers/cache:           204000      563208
Swap:           1540088      0           1540088
 
Old 06-12-2007, 03:00 PM   #7
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 335
Two thirds of your RAM is free, and you are not using any swap. There is no memory bottleneck on your system. If you were to "flush the cache", your system would grind to a halt (you think it's bad now), as every file I/O would require a real disk I/O.
 
Old 06-12-2007, 03:05 PM   #8
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 335
I suggest you run "smartctl -A" on each of your drives. Even if they are not reporting errors, they may be experiencing high recoverable error counts.
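
A sketch for skimming that output quickly. `smart_flags` is a hypothetical helper name, and the choice of attributes (IDs 5, 197 and 198, the reallocated, pending and offline-uncorrectable sector counts) is an assumption about what matters most, not a vendor rule:

```shell
# Hypothetical helper: read "smartctl -A" output on stdin and report the
# sector-health attributes (IDs 5, 197, 198) whose raw value is non-zero.
# The attribute selection is an assumption, not a vendor rule.
smart_flags() {
    awk '$1 ~ /^(5|197|198)$/ && $NF + 0 > 0 { print $2 "=" $NF; bad = 1 }
         END { if (!bad) print "no reallocated, pending or uncorrectable sectors reported" }'
}

# Typical use (run as root, once per drive):
#   smartctl -A /dev/hda | smart_flags
```

Recoverable-error counters such as Hardware_ECC_Recovered are vendor specific and still need reading by eye.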
 
Old 06-12-2007, 03:48 PM   #9
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Original Poster
Rep: Reputation: 0
Code:
[root@acsfs ~]# smartctl -A /dev/hda
smartctl version 5.33 [i386-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   201   201   063    Pre-fail  Always       -       14851
  4 Start_Stop_Count        0x0032   242   242   000    Old_age   Always       -       23369
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   252   244   187    Pre-fail  Always       -       48484
  9 Power_On_Minutes        0x0032   212   212   000    Old_age   Always       -       26h+50m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       267
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   253   253   000    Old_age   Always       -       47
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       2502
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   242   000    Old_age   Always       -       320
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       3
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   191   189   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

[root@acsfs ~]#
 
Old 06-12-2007, 07:39 PM   #10
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 335
That looks good. How about:

cat /proc/interrupts
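
To make that output easier to scan, the sources can be ranked by interrupt count. A minimal sketch; `irq_top` is a hypothetical helper name, and it assumes the usual layout of a CPU header line followed by one row per source with one count column per CPU.

```shell
# Hypothetical helper: read /proc/interrupts-format text on stdin, sum the
# per-CPU counts on each row, and print the N busiest sources.
# Output columns: TOTAL IRQ LAST-WORD-OF-DESCRIPTION
irq_top() {
    awk 'NR > 1 && $1 ~ /:$/ {
        total = 0
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        print total, $1, $NF
    }' | sort -rn | head -n "${1:-5}"
}

# Typical use:
#   irq_top 5 < /proc/interrupts
```

Running it twice a minute apart and comparing shows which counters are actually climbing.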
 
Old 06-12-2007, 08:19 PM   #11
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Original Poster
Rep: Reputation: 0
Code:
[root@acsfs ~]# cat /proc/interrupts
           CPU0
  0:   11076110    IO-APIC-edge  timer
  1:        244    IO-APIC-edge  i8042
  7:    2119934    IO-APIC-edge  parport0
  8:          1    IO-APIC-edge  rtc
  9:          1   IO-APIC-level  acpi
 12:       9609    IO-APIC-edge  i8042
 14:     153636    IO-APIC-edge  ide0
 15:     382769    IO-APIC-edge  ide1
 16:          0   IO-APIC-level  uhci_hcd:usb3
 17:    3748946   IO-APIC-level  uhci_hcd:usb1, uhci_hcd:usb4, i915@pci:0000:00:02.0
 18:          0   IO-APIC-level  uhci_hcd:usb2
 19:          0   IO-APIC-level  ehci_hcd:usb5
 20:         23   IO-APIC-level  Intel ICH5
 21:   10548523   IO-APIC-level  eth0
NMI:          0
LOC:   11076366
ERR:          0
MIS:          0
Also, well after working hours, I see TOP is active as it has been all day. The CPU us field is bouncing around 25-40%, and the idle of course rarely stays up around 90%+ like it had been all day. At peak, there were 9 smbd processes; now there are two. Most users turn their PCs off at night. Again, there are no direct logins.

The apps staying at the very top are Xorg, which now has CPU time of over 135:00:00, and floaters, which I think is a screen saver.

It looks busier now than it has most of the day.

We didn't have a console on the system until yesterday but it may be turned off. Any chance that could be a problem?

Thanks
 
Old 06-12-2007, 08:47 PM   #12
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 335
Generally servers don't run X, and they certainly don't run screensavers - they just burn CPU for no good reason. However, while that may provide crappy response to your users, it won't cause your problem.

I'm not seeing any reason for the bad response time. You wouldn't happen to have an email server running on this system with an open relay?
 
Old 06-12-2007, 09:22 PM   #13
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Original Poster
Rep: Reputation: 0
Not unless it comes that way from the install. I basically installed FC5, copied over my smb.conf file, turned on Samba and let it rip. It is a very vanilla install.

Thanks,
 
Old 06-12-2007, 09:48 PM   #14
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 335
Assuming you have all the maintenance applied, I see no reason for the performance problem you're having. My last suggestion would be to check for a compromised machine (very unlikely):

yum -y install chkrootkit

Then run:

chkrootkit -q -n
 
Old 06-12-2007, 09:56 PM   #15
asidarin
LQ Newbie
 
Registered: Jun 2007
Posts: 11

Original Poster
Rep: Reputation: 0
Code:
[root@acsfs ~]# chkrootkit -q -n

/usr/lib/perl5/5.8.8/i386-linux-thread-multi/.packlist

 The tty of the following user process(es) were not found
 in /var/run/utmp !
! RUID          PID TTY    CMD
! root         1977 tty1   /sbin/mingetty tty1
! root         1980 tty2   /sbin/mingetty tty2
! root         1983 tty3   /sbin/mingetty tty3
! root         1986 tty4   /sbin/mingetty tty4
! root         1990 tty6   /sbin/mingetty tty6
! root         5289 tty7   /usr/bin/Xorg :0 -audit 0 -auth /var/gdm/:0.Xauth -nolisten tcp vt7
I don't have any ttys installed on this system other than the regular serial port(s) on the back of the PC.

Thanks,
 
  


All times are GMT -5.
