LinuxQuestions.org
Old 06-26-2014, 10:42 AM   #1
Toasterman
Member
 
Registered: Oct 2013
Posts: 77

Rep: Reputation: Disabled
Backup server extremely slow


I use BackupPC to create backups of all servers at work. A complete backup that once took an hour now takes twelve hours or more. I've already determined through tests that the memory and hard drives are OK. Even working at the command line is very sluggish.

At first I thought swap space was the issue, but creating a swap file didn't help much.

Output of free:
Code:
             total       used       free     shared    buffers     cached
Mem:       3943324    3375460     567864          0    1576088     918512
-/+ buffers/cache:     880860    3062464
Swap:      4194300          8    4194292
Here's my output of top:

Code:
top - 11:22:17 up 18:27,  1 user,  load average: 12.69, 14.53, 15.36
Tasks: 139 total,   1 running, 136 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 68.1%id, 31.9%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3943324k total,  3373816k used,   569508k free,  1575144k buffers
Swap:  4194300k total,        8k used,  4194292k free,   917732k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9781 root      20   0 19280 1332  964 R    0  0.0   0:00.37 top
    1 root      20   0 23640 1636 1060 S    0  0.0   0:00.94 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:00.52 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:01.19 migration/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    7 root      20   0     0    0    0 S    0  0.0   0:00.18 ksoftirqd/1
    8 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    9 root      RT   0     0    0    0 S    0  0.0   0:00.05 migration/2
   10 root      20   0     0    0    0 S    0  0.0   0:00.48 ksoftirqd/2
   11 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/2
   12 root      RT   0     0    0    0 S    0  0.0   0:00.13 migration/3
   13 root      20   0     0    0    0 S    0  0.0   0:01.89 ksoftirqd/3
   14 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/3
   15 root      20   0     0    0    0 S    0  0.0   0:00.69 events/0
   16 root      20   0     0    0    0 S    0  0.0   0:03.86 events/1
   17 root      20   0     0    0    0 S    0  0.0   0:00.95 events/2
   18 root      20   0     0    0    0 S    0  0.0   0:03.01 events/3
   19 root      20   0     0    0    0 S    0  0.0   0:00.00 cpuset
   20 root      20   0     0    0    0 S    0  0.0   0:00.00 khelper
   21 root      20   0     0    0    0 S    0  0.0   0:00.00 netns
   22 root      20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
   23 root      20   0     0    0    0 S    0  0.0   0:00.00 pm
   25 root      20   0     0    0    0 S    0  0.0   0:00.10 sync_supers
   26 root      20   0     0    0    0 S    0  0.0   0:00.13 bdi-default
   27 root      20   0     0    0    0 S    0  0.0   0:00.00 kintegrityd/0
   28 root      20   0     0    0    0 S    0  0.0   0:00.00 kintegrityd/1
Thanks in advance.
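
A note on reading the free output above: in this older layout, the "-/+ buffers/cache" row is the one that matters for memory pressure. Its "used" figure is the total used minus buffers and cached, both of which the kernel can reclaim on demand. A minimal sketch of that arithmetic follows; `buffers_cache_adjusted` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: memory used by applications, in KiB, given the
# "used", "buffers" and "cached" columns from the Mem: row of free.
buffers_cache_adjusted() {
    used=$1 buffers=$2 cached=$3
    echo $(( used - buffers - cached ))
}

# Values taken from the Mem: row posted above.
buffers_cache_adjusted 3375460 1576088 918512
```

The result matches the 880860 shown in the "-/+ buffers/cache" row, so well under 1 GB of the 4 GB is really in use and swap is effectively untouched (8 KB). Memory looks unlikely to be the bottleneck.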

Last edited by Toasterman; 06-26-2014 at 10:44 AM.
 
Old 06-26-2014, 11:04 AM   #2
notsure
Member
 
Registered: Jun 2012
Location: Detroit
Distribution: Arch x86_64
Posts: 112

Rep: Reputation: 10
Quote:
31.9%wa
Check your disk IO with iotop.
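
The `wa` field is the share of CPU time spent idle while waiting for I/O to complete. A small sketch for pulling it out of a top header line, so it can be logged over time, is shown here. `iowait_from_top` is a hypothetical helper name, and the pattern assumes the older `%wa` layout seen in this thread; newer top versions format the line differently.

```shell
# Hypothetical helper: read top header text on stdin and print the
# iowait percentage from a line formatted like "..., 31.9%wa, ...".
iowait_from_top() {
    sed -n 's/.*[ ,]\([0-9.]*\)%wa.*/\1/p'
}

# Sample line copied from the top output in the first post.
echo 'Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 68.1%id, 31.9%wa,  0.0%hi,  0.0%si,  0.0%st' | iowait_from_top
```

On a live system with the same top layout, the input could come from `top -b -n 1` instead of the sample line.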
 
Old 06-27-2014, 04:22 AM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
Not to mention
Quote:
load average: 12.69, 14.53, 15.36
With the above, sure looks like an underperforming disk system.
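
On Linux the load average counts tasks waiting on disk I/O (uninterruptible sleep) as well as runnable ones, which is how load can sit above 12 while the CPUs are mostly idle. A rough sketch that normalises a load figure by CPU count follows. `load_per_cpu` is a hypothetical helper, and the count of 4 is an assumption read off the migration/0 to migration/3 threads in the top listing.

```shell
# Hypothetical helper: divide a load average by the number of CPUs.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# 15-minute load average from the top output, assuming 4 CPUs.
load_per_cpu 15.36 4
```

On a live system the arguments could be taken from `/proc/loadavg` and `nproc`. A sustained value well above 1 alongside near-zero `us` and `sy` time points at I/O wait rather than CPU work.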
 
Old 06-27-2014, 08:56 AM   #4
Toasterman
Member
 
Registered: Oct 2013
Posts: 77

Original Poster
Rep: Reputation: Disabled
Screenshot is attached

Quote:
With the above, sure looks like an underperforming disk system.
The setup is software RAID1. Do you have any tips on improving it? Once, right after startup, it completed a backup of one server, but then it slowed down again.

Thanks in advance.
Our manager was laid off and I'm not a Linux expert by any means, but I can find my way around.
Attached thumbnail: iotop.png (32.4 KB)

Last edited by Toasterman; 06-27-2014 at 08:58 AM.
 
Old 06-27-2014, 09:18 AM   #5
notsure
Member
 
Registered: Jun 2012
Location: Detroit
Distribution: Arch x86_64
Posts: 112

Rep: Reputation: 10
It's hanging on a job. Which job? Log into the web interface, find out what job is still running and look at the error logs for that host.

--edit-- On second thought, since you can't see the full command, the only remedy I can think of is to log iotop to a file to see the full command. The command will show the host name, which you can look up in the web interface. I'm sure someone smarter than me has a better solution.
Code:
$ iotop -bo > /tmp/file

Last edited by notsure; 06-27-2014 at 09:28 AM.
 
Old 07-01-2014, 01:20 PM   #6
Toasterman
Member
 
Registered: Oct 2013
Posts: 77

Original Poster
Rep: Reputation: Disabled
The problem is that it's slow even when it's not running anything. Starting up the machine in and of itself takes 15 minutes or more.

EDIT:
The file is attached. I also upgraded the RAM to 8GB but it doesn't seem to make a difference.
Attached file: file.txt (50.3 KB)

Last edited by Toasterman; 07-01-2014 at 01:40 PM.
 
Old 07-01-2014, 03:07 PM   #7
yo8rxp
Member
 
Registered: Jul 2009
Location: Romania
Distribution: Ubuntu 10.04 Gnome 2
Posts: 102

Rep: Reputation: 31
Create a test folder, then run a few write tests against it:
Code:
dd if=/dev/zero of=path_to_that_folder/test.iso bs=1M count=100
dd if=/dev/zero of=path_to_that_folder/test.iso bs=1024 count=1000
dd if=/dev/zero of=path_to_that_folder/test.iso bs=2048 count=1000
dd if=/dev/zero of=path_to_that_folder/test.iso bs=4096 count=1000
Please post the results here.

Make sure you don't dd to your md0 or sdX devices, only to a file in that folder, so that you don't destroy the RAID data.

A 15-minute boot time is simply not acceptable! Mine is 6 seconds on an SSD and 12 seconds on SATA 2 (measured from GRUB).
With two 1 TB Caviar Blue drives in RAID 1, the first test should show about 270 MB/s.


Do you have RAID 10 (two RAID 1 arrays grouped in RAID 0)? Even so, a resync would take longer if the array is damaged, but it should not affect overall boot time.
Which hard drives do you use?
Please post the output of:
Code:
cat /proc/mdstat

As syg00 said, something is simply not right there.
Make sure the SATA cables are in good condition. I've seen nasty boot times and errors caused by nothing more than a faulty one-dollar cable. RAID 1 is exposed to this kind of fault because every write is mirrored to both drives, so one bad link slows the whole array.
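
To spell out what "that folder" means in the dd test: it is any ordinary directory on the mounted filesystem, and the output must be a regular file, never a device node such as /dev/md0 or /dev/sdX. A cautious sketch follows. The `write_test` name, the scratch directory and the small default size are assumptions for illustration, and `conv=fdatasync` (a GNU dd option) is added so the reported rate includes flushing to disk rather than just filling the page cache.

```shell
# Hypothetical helper: write COUNT MiB of zeros to a scratch file
# inside DIR and let dd report the throughput on stderr.
write_test() {
    dir=$1
    count=${2:-4}
    # Refuse anything that is not an existing directory, so a device
    # path passed by mistake is rejected before dd runs.
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    dd if=/dev/zero of="$dir/ddtest.img" bs=1M count="$count" conv=fdatasync
}

# Example run against a throwaway directory; the size is printed after.
scratch=$(mktemp -d)
write_test "$scratch" 4
wc -c < "$scratch/ddtest.img"
rm -rf "$scratch"
```

For a real measurement the directory would be one on the RAID filesystem and the count would be closer to the 100 MiB suggested above. The test file can be deleted afterwards.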

Last edited by yo8rxp; 07-01-2014 at 03:24 PM.
 
Old 07-01-2014, 05:56 PM   #8
notsure
Member
 
Registered: Jun 2012
Location: Detroit
Distribution: Arch x86_64
Posts: 112

Rep: Reputation: 10
Quote:
Originally Posted by Toasterman
The problem is that it's slow even when it's not running anything. Starting up the machine in and of itself takes 15 minutes or more.

EDIT:
The file is attached. I also upgraded the RAM to 8GB but it doesn't seem to make a difference.
The log you uploaded indicates an "underperforming disk system". Probably a failed drive.

Is this server using hardware or software RAID? What RAID level?

If it is software RAID:

Code:
cat /proc/mdstat
and run
Code:
smartctl -a /dev/sdx
on all your individual drives.

If it is hardware RAID, what RAID controller? Reboot and enter the BIOS of the RAID card to check on the array and the drives in it.
 
Old 07-02-2014, 08:09 AM   #9
Toasterman
Member
 
Registered: Oct 2013
Posts: 77

Original Poster
Rep: Reputation: Disabled
Quote:
Make sure you don't dd to your md0 or sdX devices, only to a file in that folder, so that you don't destroy the RAID data.
Sorry but what is that folder? Am I supposed to create one? Would I be safe if I copied and pasted exactly what you put here? The fact that you said I could destroy RAID data makes me cautious so I want to make sure.

Quote:
Is this server using hardware or software RAID? What RAID level?
Software RAID1

The results of cat /proc/mdstat are:
Quote:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda2[0] sdb2[1]
1952148416 blocks [2/2] [UU]
[===>.................] resync = 18.7% (366396096/1952148416) finish=2162699.1min speed=12K/sec

md1 : inactive sda3[0] sdb3[1]
2537344 blocks

unused devices: <none>
smartctl -a results:

Code:
root@bcbackup:~# smartctl -a /dev/sda
smartctl 5.40 2010-03-16 r3077 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2001FASS-00W2B0
Serial Number:    WD-WMAY00866060
Firmware Version: 05.01D05
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Jul  2 09:56:38 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (28560) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   194   194   051    Pre-fail  Always       -       310247
  3 Spin_Up_Time            0x0027   203   159   021    Pre-fail  Always       -       11833
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       93
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       23640
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       92
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   195   195   000    Old_age   Always       -       15041
194 Temperature_Celsius     0x0022   116   100   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       188
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   179   176   000    Old_age   Offline      -       4304

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%     23489         3826633052

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@bcbackup:~# smartctl -a /dev/sdb
smartctl 5.40 2010-03-16 r3077 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD2001FASS-00W2B0
Serial Number:    WD-WMAY00919867
Firmware Version: 05.01D05
User Capacity:    2,000,398,934,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Jul  2 10:03:35 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (29700) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3037) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   203   157   021    Pre-fail  Always       -       11850
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       93
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   068   068   000    Old_age   Always       -       23653
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       92
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   186   186   000    Old_age   Always       -       44975
194 Temperature_Celsius     0x0022   119   102   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Thanks all.
I'm thinking of getting two new hard drives and starting over. Would this be ideal?
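
The overall PASSED verdict generally only reflects whether an attribute has crossed its threshold, so it is worth comparing raw values between the two drives. In the reports above, /dev/sda shows 188 pending sectors, 1 offline-uncorrectable sector and an extended self-test that ended in a read failure, while /dev/sdb shows 0 for both counters. A minimal sketch for pulling one raw value out of the attribute table follows; `smart_raw` is a hypothetical helper name, and it assumes the ten-column layout shown in this thread.

```shell
# Hypothetical helper: read a SMART attribute table on stdin and print
# the RAW_VALUE (10th column) of the attribute named in $1.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Two rows copied from the /dev/sda report above.
printf '%s\n' \
    '197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       188' \
    '198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       1' |
    smart_raw Current_Pending_Sector
```

Against a real drive the input would come from `smartctl -A /dev/sda`, run once per drive so the numbers can be compared side by side.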

Last edited by Toasterman; 07-02-2014 at 10:18 AM.
 
Old 07-02-2014, 01:13 PM   #10
yo8rxp
Member
 
Registered: Jul 2009
Location: Romania
Distribution: Ubuntu 10.04 Gnome 2
Posts: 102

Rep: Reputation: 31
Hello sir!
Yes, mkdir a test folder as I specified.
As long as you run dd against a file and not a /dev/sdX device, it cannot destroy any data. But...

Well, do not run any of the tests specified above. It isn't worth it!
The RAID 1 sync says 12K/sec. That cannot be right! That resync speed would be unacceptable even on an old 286.

Simply replace the SATA 2 cables and run the numbers again. If the computer is dusty, disassemble all the parts inside and clean them properly.
I repeat myself: SMART says the drives are OK, but the sync speed is not tolerable.

1. SMART reports PASSED on both drives.
2. The sync speed is not OK by any means.
3. Boot time is 15 minutes or so.

Overall, the main (not the spare) drive is faulting somehow.
Seek time and read speed on the main drive are terrible, which is why the sync takes ages and so does booting.
Combined with the SMART PASSED results, the problem is either a faulty SATA cable or MAYBE the motherboard.

Sometimes, if the power draw is too high, the power supply can be the culprit. But on a server built on a desktop PC, apart from a powerful video card, everything else is supposed to draw only about 150 W to 170 W, not more (CPU about 100 to 125 W, HDDs just 8 W each, plus the motherboard and a few other devices).
Just for the sake of a benchmark, my RAID 1 (two 1 TB WD Caviar Blue drives) resyncs at 137 MB/s in just 2 hours, not 12 K/s like your mdstat report.

Replace all the SATA cables, and I hope to hear nothing but good news here!
Sincerely , gabriel@linux-romania.com
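
To put the posted resync line in perspective, the `finish=` field can be converted from minutes to days. The sketch below does that; `resync_eta_days` is a hypothetical helper name, and it assumes the `finish=...min` format seen in the mdstat output in this thread.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print
# the estimated time to finish a resync, rounded to whole days.
resync_eta_days() {
    sed -n 's/.*finish=\([0-9.]*\)min.*/\1/p' |
        awk '{ printf "%.0f\n", $1 / 1440 }'
}

# Resync line copied from the mdstat output posted above.
echo '[===>.................] resync = 18.7% (366396096/1952148416) finish=2162699.1min speed=12K/sec' | resync_eta_days
```

For the posted line the estimate comes to roughly 1500 days, about four years. On a live system the function can read `/proc/mdstat` directly, and the md throttle settings in `/proc/sys/dev/raid/speed_limit_min` and `/proc/sys/dev/raid/speed_limit_max` are worth a look to rule out an unusually low limit.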

Last edited by yo8rxp; 07-02-2014 at 01:16 PM.
 
Old 07-02-2014, 02:47 PM   #11
Toasterman
Member
 
Registered: Oct 2013
Posts: 77

Original Poster
Rep: Reputation: Disabled
The drives were in a front hot-swappable bay. I removed them from there and connected them directly to the motherboard. No improvement.
I then put the hard drives in another computer to see if the motherboard was the problem. No improvement.

So it has something to do with the drives after all.
 
  


All times are GMT -5.
