LinuxQuestions.org
Old 11-08-2011, 03:24 PM   #1
DaRkBoDoM
Member
 
Registered: Oct 2007
Location: Italy
Distribution: Ubuntu
Posts: 84

Rep: Reputation: 17
Troubleshooting server slowdown


Hi there!

Since some days, my Ubuntu Linux Home Server is experiencing extreme slowdown.

Here is what I know:
- Reboot doesn't help
- No unusual dmesg / log output
- Very high (8-22) load average
- High (75%+) CPU "wa" usage
- Very high response time and bad "feel"
- I haven't changed anything that could have slowed it down

I suspect some kind of hardware semi-failure. Can you help me troubleshoot it?

Last edited by DaRkBoDoM; 11-08-2011 at 03:35 PM.
 
Old 11-08-2011, 04:30 PM   #2
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,790

Rep: Reputation: 653
High wait (wa) is I/O related, commonly disk but not necessarily. Try starting there: are you seeing high disk utilisation?

With the load peaking at 22 I'm guessing you probably have a lot of processes in the run queue. What sort of services is this server providing?
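
To put a number on the wait figure without relying on top, one rough approach is to sample the aggregate "cpu" line of /proc/stat twice and see what share of the elapsed CPU time was iowait. This is only a sketch: the function name iowait_pct is made up for illustration, and it assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
# iowait_pct: hypothetical helper, not a standard tool.
# Takes two "cpu ..." lines from /proc/stat and prints the iowait share (%)
# of all CPU time that elapsed between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2..9: user nice system idle iowait irq softirq steal
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6          # field 6 is iowait
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage on a live system (5 second sample):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```

A value that stays high while the disks report little throughput would suggest requests are being served slowly rather than that there are too many of them.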

Last edited by kbp; 11-08-2011 at 04:31 PM.
 
Old 11-08-2011, 04:59 PM   #3
DaRkBoDoM
Member
 
Registered: Oct 2007
Location: Italy
Distribution: Ubuntu
Posts: 84

Original Poster
Rep: Reputation: 17
I've checked disk I/O and noticed nothing above normal.
Anyway, disk i/o is tremendously slow and simply touching a file may take a huge amount of time.

The disks are SATA in RAID1. There are no seek-error messages in dmesg and no SMART failures.

I've seen a lot of broken disks, but these ones are strangely "silent".
 
Old 11-08-2011, 05:05 PM   #4
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,790

Rep: Reputation: 653
Quote:
Originally Posted by DaRkBoDoM
Anyway, disk i/o is tremendously slow and simply touching a file may take a huge amount of time.
This isn't normal behaviour ...
 
Old 11-08-2011, 05:33 PM   #5
DaRkBoDoM
Member
 
Registered: Oct 2007
Location: Italy
Distribution: Ubuntu
Posts: 84

Original Poster
Rep: Reputation: 17
Yep, I know... but what can I do?
How can I detect what's wrong?

I have no error messages, and unplugging random hard drives to "see what happens" doesn't sound like a reasonable idea.
 
Old 11-08-2011, 06:31 PM   #6
kbp
Senior Member
 
Registered: Aug 2009
Posts: 3,790

Rep: Reputation: 653
There isn't necessarily anything wrong with the drive. You may just have too many processes waiting on disk access and performing a lot of writes, or you could have several processes all waiting on the same file. Try using ps and lsof to see which files the waiting processes are trying to access.
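
As a minimal sketch of the ps/lsof approach: processes blocked on I/O normally sit in uninterruptible sleep, shown as state D. The helper name blocked_procs below is hypothetical; it only filters ps output that is piped into it.

```shell
# blocked_procs: hypothetical helper. Reads "STATE PID COMMAND" lines on stdin
# and prints the PID and command name of every task in uninterruptible sleep
# (state D), which is where processes waiting on disk usually are.
blocked_procs() {
    awk '$1 == "D" { print $2, $3 }'
}

# Usage on a live system (the empty "=" headers suppress the header line):
#   ps -e -o state= -o pid= -o comm= | blocked_procs
# For a PID that keeps showing up, list the files it has open:
#   lsof -p <PID>
```

Running the ps pipeline a few times in a row helps separate processes that are stuck from ones that only pass through state D briefly.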
 
Old 11-09-2011, 12:42 AM   #7
d3vrandom
Member
 
Registered: Jun 2006
Location: Karachi, Pakistan
Distribution: OpenSUSE, CentOS, Debian
Posts: 59

Rep: Reputation: 9
Actually your problem is entirely related to disk I/O. Linux counts processes waiting on disk access in its load average figures. So when you have high I/O wait you will also have high load numbers even though your CPU might very well be idle! I suspect there is something wrong with your RAID array. Was there a disk failure, and is the array being rebuilt or something? If the answer is no, then you have to identify which process is causing the high I/O.

One way to check would be to run top and then sort by the time column in descending order. That should tell you which processes have accumulated the most CPU time.
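
The "is the array degraded or rebuilding" question can be answered from /proc/mdstat. The function below is a sketch with a made-up name (md_problems); it assumes the usual mdstat layout, where a failed member shows up as "_" in the [UU] status map and a running pass appears as a "recovery =", "resync =", "check =" or "reshape =" progress line.

```shell
# md_problems: hypothetical helper. Reads /proc/mdstat text on stdin and
# prints one line per finding: arrays with a missing member ("_" in the
# status map) and arrays with a resync/recovery/check/reshape line.
md_problems() {
    awk '
        /^md[0-9]+ :/ { name = $1; next }     # remember which array we are in
        name == ""    { next }                # skip lines before the first array
        /\[[U_]*_[U_]*\]/ { print name ": degraded" }
        match($0, /(resync|recovery|check|reshape) *=/) {
            op = substr($0, RSTART, RLENGTH)
            sub(/ *=$/, "", op)
            print name ": " op
        }'
}

# Usage on a live system:
#   md_problems < /proc/mdstat
```

No output means every array reports all members up and no background pass was listed at that moment.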
 
Old 11-09-2011, 03:25 AM   #8
DaRkBoDoM
Member
 
Registered: Oct 2007
Location: Italy
Distribution: Ubuntu
Posts: 84

Original Poster
Rep: Reputation: 17
The only heavy I/O process is qemu, but it has always been there and it isn't consuming much disk bandwidth.
Even terminating it doesn't change things much.

Disk I/O is also very slow after forcing a hard reboot: it took a long time just to replay the filesystem journal, when no processes were running at all.

Here is top output sorted by time. Note that at the time of this "top" the server was running really FAST -_-'
Code:
top - 10:22:13 up 12:44,  1 user,  load average: 4.02, 4.80, 5.26
Tasks: 190 total,   1 running, 189 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  2.3%sy, 17.9%ni, 23.9%id, 55.5%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3090260k total,  2880424k used,   209836k free,   782748k buffers
Swap:  1952700k total,     6184k used,  1946516k free,   728200k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+    TIME COMMAND
 3577 valhalla  39  19  355m 181m 1792 S 21.5  6.0 120:02.80 120:02 qemu
 3265 nagios    25   5 39240 5404 2008 S  0.0  0.2   1:00.74   1:00 nagios3
 3378 asterisk -11   0  656m  27m 9552 S  0.3  0.9   0:51.89   0:51 asterisk
 1503 mysql     20   0  177m  41m 3364 S  0.0  1.4   0:42.39   0:42 mysqld
  359 root      20   0     0    0    0 S  0.0  0.0   0:29.36   0:29 md8_raid5
 1493 bind      20   0  149m  46m 1716 S  0.0  1.5   0:20.82   0:20 named
 3296 proxy     20   0 86776  21m 2816 S  0.3  0.7   0:20.04   0:20 squid3
  979 syslog    20   0  129m 1720 1112 S  0.3  0.1   0:16.91   0:16 rsyslogd
 3084 snmp      20   0 47456 3660 1548 S  0.0  0.1   0:10.53   0:10 snmpd
   21 root      20   0     0    0    0 S  0.0  0.0   0:10.21   0:10 kswapd0
   10 root      20   0     0    0    0 S  0.0  0.0   0:09.64   0:09 sync_supers
 3423 fetchmai  20   0 43508 3356 2432 S  0.0  0.1   0:08.29   0:08 fetchmail
 6289 root      20   0     0    0    0 S  0.0  0.0   0:07.12   0:07 jbd2/dm-1-8
 2647 postgres  20   0  108m 1704  484 S  0.0  0.1   0:06.70   0:06 postgres
 2706 postgres  20   0  208m 2332 1248 S  0.0  0.1   0:06.39   0:06 postgres
  283 root      20   0     0    0    0 D  0.0  0.0   0:05.56   0:05 md3_raid1
    3 root      20   0     0    0    0 S  0.0  0.0   0:04.92   0:04 ksoftirqd/0
Code:
root@transylvania:~ 0 1001# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid10]
md8 : active raid5 sdc1[5] sde1[7] sdf1[4] sdd1[6]
      2927845632 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md0 : active raid1 sdb1[0] sda1[1]
      2928576 blocks [2/2] [UU]

md2 : active raid1 sdb5[0] sda5[1]
      4881344 blocks [2/2] [UU]

md4 : active raid1 sdb7[0] sda7[1]
      2928576 blocks [2/2] [UU]

md1 : active raid1 sdb2[1] sda2[0]
      1952704 blocks [2/2] [UU]

md3 : active raid1 sdb6[0] sda6[1]
      24413120 blocks [2/2] [UU]

md6 : active raid1 sdb8[0] sda8[1]
      451146624 blocks [2/2] [UU]

unused devices: <none>
My guess is that wa is high not because there are more I/O requests than normal, but because I/O requests are being served at a much slower rate.

The problem is: what could be causing that behaviour?
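
One way to test that guess is to measure how long each request takes, per device, from two snapshots of /proc/diskstats. This is a sketch only: avg_io_wait is a hypothetical name, and it assumes the standard column layout (3 = device, 4 = reads completed, 7 = ms spent reading, 8 = writes completed, 11 = ms spent writing).

```shell
# avg_io_wait: hypothetical helper. Takes two snapshot files copied from
# /proc/diskstats and prints, for each device that completed I/O in between,
# the average milliseconds per request (queue time plus service time).
avg_io_wait() {
    awk '
        # First file: remember the counters per device.
        NR == FNR { ios[$3] = $4 + $8; ms[$3] = $7 + $11; next }
        # Second file: print the average over the interval.
        {
            dio = ($4 + $8) - ios[$3]
            dms = ($7 + $11) - ms[$3]
            if (dio > 0) printf "%s %.1f\n", $3, dms / dio
        }' "$1" "$2"
}

# Usage on a live system:
#   cat /proc/diskstats > /tmp/ds1; sleep 10; cat /proc/diskstats > /tmp/ds2
#   avg_io_wait /tmp/ds1 /tmp/ds2
```

If one member of a mirror shows a far higher average than its partner under the same load, that disk (or its cable or controller port) is the first suspect.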

Last edited by DaRkBoDoM; 11-09-2011 at 03:33 AM.
 
Old 11-10-2011, 05:07 AM   #9
d3vrandom
Member
 
Registered: Jun 2006
Location: Karachi, Pakistan
Distribution: OpenSUSE, CentOS, Debian
Posts: 59

Rep: Reputation: 9
I have an idea. Boot the server off a Linux installation/live CD and run the badblocks program on each of the drives. On modern drives the read-only test takes about 1.5-2 hours. If the program runs really slowly, you know something is wrong with your drives or the disk controller. If it runs normally but reports bad blocks, your drives need to be replaced.

BTW you could run badblocks without rebooting your server, i.e. from within your currently installed OS. But I want you to use a CD in order to rule out the current filesystem as a factor in the slowdown.
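
A sketch of what the scan could look like. The helper badblocks_plan is hypothetical and only prints the commands so they can be reviewed before anything touches a disk; the device names and the log path under /tmp are assumptions to adjust.

```shell
# badblocks_plan: hypothetical helper. Prints (does not run) one read-only
# badblocks command per device name given on the command line.
badblocks_plan() {
    for dev in "$@"; do
        # -s shows progress, -v is verbose, -o writes found bad blocks to a file.
        # Without -n or -w, badblocks only reads, so existing data is untouched.
        printf 'badblocks -sv -o /tmp/badblocks-%s.log /dev/%s\n' "$dev" "$dev"
    done
}

# Usage: print the plan for the two mirror members, then run the lines as root.
#   badblocks_plan sda sdb
```

Timing each run (for example by prefixing the command with time) gives the "does it run slowly" answer alongside the bad block list.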

Last edited by d3vrandom; 11-10-2011 at 05:09 AM.
 
Old 11-10-2011, 05:36 AM   #10
deep27ak
Senior Member
 
Registered: Aug 2011
Location: Bangalore, India
Distribution: RHEL 7.x, SLES 11 SP2/3/4
Posts: 1,195
Blog Entries: 4

Rep: Reputation: 221
Quote:
Originally Posted by DaRkBoDoM
Hi there!

Since some days, my Ubuntu Linux Home Server is experiencing extreme slowdown.

Here is what I know:
- Reboot doesn't help
- No unusual dmesg / log output
- Very high (8-22) load average
- High (75%+) CPU "wa" usage
- Very high response time and bad "feel"
- I haven't changed anything that could have slowed it down

I suspect some kind of hardware semi-failure. Can you help me troubleshoot it?
Would you mind telling me the RAM and swap memory of your system?

Code:
# free -m
(post the output)
Code:
# df -h
(post the output)
 
Old 11-10-2011, 06:06 AM   #11
DaRkBoDoM
Member
 
Registered: Oct 2007
Location: Italy
Distribution: Ubuntu
Posts: 84

Original Poster
Rep: Reputation: 17
Quote:
Originally Posted by d3vrandom
I have an idea. Boot the server off a Linux installation/live CD and run the badblocks program on each of the drives. On modern drives the read-only test takes about 1.5-2 hours. If the program runs really slowly, you know something is wrong with your drives or the disk controller. If it runs normally but reports bad blocks, your drives need to be replaced.
Nice idea. I'll do it tonight. Ty

Quote:
Would you mind telling me the RAM and swap memory of your system
Code:
root@transylvania:~ 0 1001# free -m
             total       used       free     shared    buffers     cached
Mem:          3017       2748        269          0        696        784
-/+ buffers/cache:       1267       1750
Swap:         1906         12       1894
Code:
root@transylvania:~ 0 1002# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              2,8G  1,4G  1,3G  52% /
udev                  1,5G  8,0K  1,5G   1% /dev
tmpfs                 604M  1,2M  603M   1% /run
none                  5,0M  8,0K  5,0M   1% /run/lock
none                  1,5G     0  1,5G   0% /run/shm
/dev/md2              4,6G  1,3G  3,1G  30% /usr
/dev/md6              431G  171G  261G  40% /home
/dev/md3               24G   17G  7,0G  70% /var
/dev/md4              2,8G  833M  1,8G  32% /var/log
/dev/mapper/extras-extra2
                      2,8T  1,7T  1,1T  60% /mnt/extra
 
Old 11-10-2011, 06:21 AM   #12
deep27ak
Senior Member
 
Registered: Aug 2011
Location: Bangalore, India
Distribution: RHEL 7.x, SLES 11 SP2/3/4
Posts: 1,195
Blog Entries: 4

Rep: Reputation: 221
Most of your swap memory is utilized, and with such a large hard disk I would advise you to increase swap or RAM.
 
Old 11-10-2011, 06:35 AM   #13
DaRkBoDoM
Member
 
Registered: Oct 2007
Location: Italy
Distribution: Ubuntu
Posts: 84

Original Poster
Rep: Reputation: 17
Quote:
Originally Posted by deep27ak
Most of your swap memory is utilized, and with such a large hard disk I would advise you to increase swap or RAM.
It seems to me that the system is hardly swapping at all (about 0.6% of swap used).
Where am I wrong?
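
The arithmetic behind that figure can be checked against the free -m output posted earlier: 12 MB of 1906 MB swap is about 0.6%, and once buffers and cache are subtracted, applications hold roughly 1268 MB of the 3017 MB of RAM (free itself prints 1267 on its "-/+ buffers/cache" line; the difference is rounding). A small sketch, with the made-up name mem_summary, that assumes the older free layout with separate "buffers" and "cached" columns:

```shell
# mem_summary: hypothetical helper. Reads old-style `free -m` output on stdin
# (columns: total used free shared buffers cached) and prints the memory held
# by applications and the percentage of swap in use.
mem_summary() {
    awk '
        $1 == "Mem:"            { printf "apps_used_mb %d\n", $3 - $6 - $7 }
        $1 == "Swap:" && $2 > 0 { printf "swap_used_pct %.1f\n", 100 * $3 / $2 }
    '
}

# Usage on a system whose free prints the same columns:
#   free -m | mem_summary
```

Newer versions of free merge these columns into "buff/cache" and add "available", so the field numbers here would need adjusting there.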
 
Old 11-10-2011, 06:57 AM   #14
deep27ak
Senior Member
 
Registered: Aug 2011
Location: Bangalore, India
Distribution: RHEL 7.x, SLES 11 SP2/3/4
Posts: 1,195
Blog Entries: 4

Rep: Reputation: 221
Quote:
Originally Posted by DaRkBoDoM
It seems to me that the system is hardly swapping at all (about 0.6% of swap used).
Where am I wrong?
Well, 2.7 GB of your 3 GB of RAM is used, and you are running a system with more than 2.8 TB of storage?
 
Old 11-10-2011, 07:07 AM   #15
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Quote:
Originally Posted by DaRkBoDoM
Since some days, my Ubuntu Linux Home Server is experiencing extreme slowdown.
Maybe we could try looking at other stuff in parallel?
- I noticed you saying "since some days". So what happened since the machine last ran OK? Any system updates or reconfiguration? New users? Anything else we should know?
- Can you install Atop, reboot the machine to a sane state and have Atop store system- and process activity for at least 24 hours? (I like Atop because it's easy to replay the binary log given a reasonable interval is used.)
- You stated the logs don't show any anomalies, but you didn't say what you looked at them with. If it was a cursory visual inspection, I suggest using Logwatch instead. It's helpful for finding leads you might have overlooked in the log files it knows about.
 
  


All times are GMT -5. The time now is 11:46 PM.
