Hi Sir, thank you very much for your quick response. As you suggested, I ran "smartctl -l scterc" on the drives; please see the output below:
[root@testbhim CentOS]# /usr/sbin/smartctl -l scterc /dev/sdd
smartctl version 5.38 [i686-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=======> INVALID ARGUMENT TO -l: scterc
=======> VALID ARGUMENTS ARE: error, selftest, selective, directory, background, scttemp[sts|hist] <=======
Use smartctl -h to get a usage summary
The above output is the same for all four drives (sda, sdb, sdc, and sdd).
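I suspect my smartctl 5.38 is simply too old for this option. If I read the smartmontools changelog correctly, "-l scterc" was only added in a later release (around 5.41), so after upgrading I would try something like the following (just a sketch based on the man page, not yet tested on this box):
Code:
/usr/sbin/smartctl -V                        # confirm the installed smartmontools version
/usr/sbin/smartctl -l scterc /dev/sda        # read the current SCT ERC setting
/usr/sbin/smartctl -l scterc,70,70 /dev/sda  # set read/write ERC to 7 seconds (units of 0.1s)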
Is there any problem if I leave free space on sda and sdb? There is still around 130 GB of free space on each of those two disks, and that free space is not part of any RAID array.
Please see the fdisk output:
[root@testbhim CentOS]# /sbin/fdisk -l /dev/sdd
Disk /dev/sdd: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdd1 1 121601 976760001 fd Linux raid autodetect
[root@testbhim CentOS]# /sbin/fdisk -l /dev/sda
Disk /dev/sda: 1000.2 GB, 1000203804160 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 64 514048+ fd Linux raid autodetect
/dev/sda2 65 6438 51199155 fd Linux raid autodetect
/dev/sda3 6439 7458 8193150 fd Linux raid autodetect
/dev/sda4 7459 121601 916853647+ 5 Extended
/dev/sda5 7459 68247 488287611 fd Linux raid autodetect
/dev/sda6 68248 104721 292977373+ fd Linux raid autodetect
As you said, I ran "echo check > /sys/block/md5/md/sync_action" and it started to resync md5. If I schedule this to run once a week at night for all md devices, is there any problem while they are resyncing? I ask because the daily backup script also runs at night. Or, if there is no conflict, can I schedule it during the day instead?
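For the weekly schedule I was thinking of a small cron script roughly like this (my own draft; the file name raid-check is just my choice, and I assume a script dropped into /etc/cron.weekly is acceptable):
Code:
#!/bin/sh
# /etc/cron.weekly/raid-check -- start a consistency check on every md array
for md in md0 md1 md2 md3 md4 md5; do
    if [ -w /sys/block/$md/md/sync_action ]; then
        echo check > /sys/block/$md/md/sync_action
    fi
done
From the mdstat output further down it looks like the kernel delays checks on arrays that share the same physical disks ("resync=DELAYED"), so issuing all of them at once should be safe, if I understand it right.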
Also, can we copy large amounts of data while the RAID array is rebuilding or resyncing?
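If the resync makes the copy or the backup too slow, I read that the md rebuild speed can be throttled through these sysctls (values are in KB/sec; the kernel log below also mentions the 200000 KB/sec default cap, and the 50000 here is only an example number):
Code:
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max   # show the current limits
sysctl -w dev.raid.speed_limit_max=50000                   # cap resync at about 50 MB/s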
Please see the df output:
[root@testbhim CentOS]# df -hT
Filesystem Type Size Used Avail Use% Mounted on
/dev/md1 ext3 48G 7.1G 38G 16% /
/dev/md0 ext3 487M 24M 438M 5% /boot
tmpfs tmpfs 945M 0 945M 0% /dev/shm
/dev/md3 ext3 459G 291G 145G 67% /backup
/dev/md5 ext3 917G 23G 848G 3% /repo
md4 was mounted under md5 (at /repo/base), but it has been unmounted for now because I added a new sdb and rebuilding is still in progress. I replaced sdb because "smartctl -H /dev/sdb" reported FAILED, while sda, sdc, and sdd all reported PASSED.
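By the way, a loop like this (just a convenience wrapper around the same smartctl -H calls; the grep is only there to shorten the output) makes it quick to recheck all four disks:
Code:
for d in sda sdb sdc sdd; do
    echo "== /dev/$d =="
    /usr/sbin/smartctl -H /dev/$d | grep -i result
done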
Please have a look at the mdstat output:
[root@testbhim CentOS]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
513984 blocks [2/2] [UU]
md2 : active raid1 sdb3[2] sda3[0]
8193024 blocks [2/1] [U_]
resync=DELAYED
md3 : active raid1 sdb5[2] sda5[0]
488287488 blocks [2/1] [U_]
[===================>.] recovery = 99.9% (488132736/488287488) finish=0.0min speed=60982K/sec
md4 : active raid1 sdb6[2] sda6[0]
292977280 blocks [2/1] [U_]
resync=DELAYED
md5 : active raid1 sdd1[1] sdc1[0]
976759936 blocks [2/2] [UU]
[================>....] resync = 80.6% (787464128/976759936) finish=36.4min speed=86508K/sec
md1 : active raid1 sdb2[1] sda2[0]
51199040 blocks [2/2] [UU]
Please have a look at the fstab:
/dev/md1 / ext3 defaults 1 1
/dev/md0 /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/md2 swap swap defaults 0 0
/dev/md3 /backup ext3 defaults 0 0
#/dev/md4 /repo/base ext3 defaults 0 0
/dev/md5 /repo ext3 defaults 0 0
Please see the mdadm.conf:
ARRAY /dev/md1 level=raid1 num-devices=2 metadata=0.90 UUID=57786f03:0a32e8bb:b9fab770:aa72d2d0
ARRAY /dev/md5 level=raid1 num-devices=2 metadata=0.90 UUID=82268afc:1cb20e19:1afcb25e:2cdc61d8
ARRAY /dev/md4 level=raid1 num-devices=2 metadata=0.90 UUID=582b1f12:df2a3d2c:877e2383:d74df3bc
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 UUID=32a7b35b:3de5bc15:90e539c8:1d9a30ed
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=4ebc42df:3289c3a4:9a3bf106:d1005657
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=0.90 UUID=cbfaf74f:c313a55f:3bda6e4d:f8e91bad
After sdb finishes rebuilding against sda, should we overwrite or append the scan output to /etc/mdadm.conf by issuing the following commands, or is it okay even if I do not add anything to /etc/mdadm.conf?
Code:
mdadm --detail --scan > /etc/mdadm.conf
mdadm --examine --scan >> /etc/mdadm.conf
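If it does need regenerating, this is roughly what I would run, with a backup of the old file first (just my draft based on the mdadm man page):
Code:
cp /etc/mdadm.conf /etc/mdadm.conf.bak   # keep the current config just in case
mdadm --detail --scan                    # print ARRAY lines for the running arrays
# compare with the existing file, and only then overwrite:
# mdadm --detail --scan > /etc/mdadm.conf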
Please see the /var/log/messages output:
[root@testbhim CentOS]# tail -n 100 /var/log/messages
Jul 8 12:48:30 testbhim kernel: [<c0435f3b>] kthread+0xc0/0xed
Jul 8 12:48:30 testbhim kernel: [<c0435e7b>] kthread+0x0/0xed
Jul 8 12:48:30 testbhim kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
Jul 8 12:48:30 testbhim kernel: =======================
Jul 8 12:48:30 testbhim kernel: INFO: task md3_resync:4369 blocked for more than 120 seconds.
Jul 8 12:48:30 testbhim kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 12:48:30 testbhim kernel: md3_resync D 000000D0 3440 4369 19 4373 4362 (L-TLB)
Jul 8 12:48:30 testbhim kernel: f2d5bec0 00000046 2c44f342 000000d0 00000064 00000000 00000000 0000000a
Jul 8 12:48:30 testbhim kernel: f7041aa0 2c450af5 000000d0 000017b3 00000001 f7041bac c1f00944 f79bd900
Jul 8 12:48:30 testbhim kernel: 00000000 c1f012e4 f7b7668c f79f55c8 f2d5bf80 f7d20000 c0425e9b c0667726
Jul 8 12:48:30 testbhim kernel: Call Trace:
Jul 8 12:48:30 testbhim kernel: [<c0425e9b>] printk+0x18/0x8e
Jul 8 12:48:30 testbhim kernel: [<c05ab8cf>] md_do_sync+0x1fe/0x966
Jul 8 12:48:30 testbhim kernel: [<c041ee80>] enqueue_task+0x29/0x39
Jul 8 12:48:30 testbhim kernel: [<c041eeda>] __activate_task+0x4a/0x59
Jul 8 12:48:30 testbhim kernel: [<c041f79d>] try_to_wake_up+0x3e8/0x3f2
Jul 8 12:48:30 testbhim kernel: [<c061c770>] schedule+0x9cc/0xa55
Jul 8 12:48:30 testbhim kernel: [<c0435fff>] autoremove_wake_function+0x0/0x2d
Jul 8 12:48:30 testbhim kernel: [<c05ac321>] md_thread+0xdf/0xf5
Jul 8 12:48:30 testbhim kernel: [<c041eb45>] complete+0x2b/0x3d
Jul 8 12:48:30 testbhim kernel: [<c05ac242>] md_thread+0x0/0xf5
Jul 8 12:48:30 testbhim kernel: [<c0435f3b>] kthread+0xc0/0xed
Jul 8 12:48:30 testbhim kernel: [<c0435e7b>] kthread+0x0/0xed
Jul 8 12:48:30 testbhim kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
Jul 8 12:48:30 testbhim kernel: =======================
Jul 8 12:48:30 testbhim kernel: INFO: task md4_resync:4373 blocked for more than 120 seconds.
Jul 8 12:48:30 testbhim kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 12:48:30 testbhim kernel: md4_resync D 000000D0 3492 4373 19 4369 (L-TLB)
Jul 8 12:48:30 testbhim kernel: f2cc1ec0 00000046 2c44c9de 000000d0 00000064 00000096 f2f50f80 00000009
Jul 8 12:48:30 testbhim kernel: f70bdaa0 2c44f342 000000d0 00002964 00000001 f70bdbac c1f00944 f79bd900
Jul 8 12:48:30 testbhim kernel: 00000003 c06b5b98 f7f7d0cc f7d201c8 f2cc1f80 f79f5600 c0425e9b c0667726
Jul 8 12:48:30 testbhim kernel: Call Trace:
Jul 8 12:48:30 testbhim kernel: [<c0425e9b>] printk+0x18/0x8e
Jul 8 12:48:30 testbhim kernel: [<c05ab8cf>] md_do_sync+0x1fe/0x966
Jul 8 12:48:30 testbhim kernel: [<c041ee80>] enqueue_task+0x29/0x39
Jul 8 12:48:30 testbhim kernel: [<c041eeda>] __activate_task+0x4a/0x59
Jul 8 12:48:30 testbhim kernel: [<c041f79d>] try_to_wake_up+0x3e8/0x3f2
Jul 8 12:48:30 testbhim kernel: [<c061c770>] schedule+0x9cc/0xa55
Jul 8 12:48:30 testbhim kernel: [<c0435fff>] autoremove_wake_function+0x0/0x2d
Jul 8 12:48:30 testbhim kernel: [<c05ac321>] md_thread+0xdf/0xf5
Jul 8 12:48:30 testbhim kernel: [<c041eb45>] complete+0x2b/0x3d
Jul 8 12:48:30 testbhim kernel: [<c05ac242>] md_thread+0x0/0xf5
Jul 8 12:48:30 testbhim kernel: [<c0435f3b>] kthread+0xc0/0xed
Jul 8 12:48:30 testbhim kernel: [<c0435e7b>] kthread+0x0/0xed
Jul 8 12:48:30 testbhim kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
Jul 8 12:48:30 testbhim kernel: =======================
Jul 8 12:50:30 testbhim kernel: INFO: task md2_resync:4362 blocked for more than 120 seconds.
Jul 8 12:50:30 testbhim kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 12:50:30 testbhim kernel: md2_resync D 000000D0 3440 4362 19 4369 4359 (L-TLB)
Jul 8 12:50:30 testbhim kernel: f2f50ec0 00000046 2c450af5 000000d0 00000064 00000000 00000000 0000000a
Jul 8 12:50:30 testbhim kernel: f773a550 2c451947 000000d0 00000e52 00000001 f773a65c c1f00944 f79bd900
Jul 8 12:50:30 testbhim kernel: 00000003 c1f012e4 f7b7658c f7d201c8 f2f50f80 f78df200 c0425e9b ffffffff
Jul 8 12:50:30 testbhim kernel: Call Trace:
Jul 8 12:50:30 testbhim kernel: [<c0425e9b>] printk+0x18/0x8e
Jul 8 12:50:30 testbhim kernel: [<c05ab8cf>] md_do_sync+0x1fe/0x966
Jul 8 12:50:30 testbhim kernel: [<c041ee80>] enqueue_task+0x29/0x39
Jul 8 12:50:30 testbhim kernel: [<c041eeda>] __activate_task+0x4a/0x59
Jul 8 12:50:30 testbhim kernel: [<c041f79d>] try_to_wake_up+0x3e8/0x3f2
Jul 8 12:50:30 testbhim kernel: [<c061c770>] schedule+0x9cc/0xa55
Jul 8 12:50:30 testbhim kernel: [<c0435fff>] autoremove_wake_function+0x0/0x2d
Jul 8 12:50:30 testbhim kernel: [<c05ac321>] md_thread+0xdf/0xf5
Jul 8 12:50:30 testbhim kernel: [<c041eb45>] complete+0x2b/0x3d
Jul 8 12:50:30 testbhim kernel: [<c05ac242>] md_thread+0x0/0xf5
Jul 8 12:50:30 testbhim kernel: [<c0435f3b>] kthread+0xc0/0xed
Jul 8 12:50:30 testbhim kernel: [<c0435e7b>] kthread+0x0/0xed
Jul 8 12:50:30 testbhim kernel: [<c0405c53>] kernel_thread_helper+0x7/0x10
Jul 8 12:50:30 testbhim kernel: =======================
Jul 8 12:51:25 testbhim kernel: md: md1: sync done.
Jul 8 12:51:26 testbhim kernel: md: delaying resync of md2 until md3 has finished resync (they share one or more physical units)
Jul 8 12:51:26 testbhim kernel: md: syncing RAID array md3
Jul 8 12:51:26 testbhim kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul 8 12:51:26 testbhim kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jul 8 12:51:26 testbhim kernel: md: using 128k window, over a total of 488287488 blocks.
Jul 8 12:51:26 testbhim kernel: RAID1 conf printout:
Jul 8 12:51:26 testbhim kernel: --- wd:2 rd:2
Jul 8 12:51:26 testbhim kernel: disk 0, wo:0, o:1, dev:sda2
Jul 8 12:51:26 testbhim kernel: disk 1, wo:0, o:1, dev:sdb2
Jul 8 12:51:26 testbhim kernel: md: delaying resync of md4 until md3 has finished resync (they share one or more physical units)
Jul 8 12:52:51 testbhim kernel: md: syncing RAID array md5
Jul 8 12:52:51 testbhim kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul 8 12:52:51 testbhim kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jul 8 12:52:51 testbhim kernel: md: using 128k window, over a total of 976759936 blocks.
Jul 8 13:18:28 testbhim scim-bridge: The lockfile is destroied
Jul 8 13:18:28 testbhim scim-bridge: Cleanup, done. Exitting...
Jul 8 14:49:51 testbhim avahi-daemon[3272]: Invalid legacy unicast query packet.
Jul 8 14:49:51 testbhim avahi-daemon[3272]: Received response from host 192.168.0.95 with invalid source port 2673 on interface 'eth0.0'
Jul 8 14:49:51 testbhim avahi-daemon[3272]: Invalid legacy unicast query packet.
Jul 8 14:49:51 testbhim avahi-daemon[3272]: Invalid legacy unicast query packet.
Jul 8 14:49:51 testbhim avahi-daemon[3272]: Received response from host 192.168.0.95 with invalid source port 2673 on interface 'eth0.0'
Jul 8 14:49:55 testbhim last message repeated 4 times
Jul 8 14:58:55 testbhim kernel: md: md3: sync done.
Jul 8 14:58:55 testbhim kernel: md: syncing RAID array md4
Jul 8 14:58:55 testbhim kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul 8 14:58:55 testbhim kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jul 8 14:58:55 testbhim kernel: md: using 128k window, over a total of 292977280 blocks.
Jul 8 14:58:55 testbhim kernel: RAID1 conf printout:
Jul 8 14:58:55 testbhim kernel: --- wd:2 rd:2
Jul 8 14:58:55 testbhim kernel: disk 0, wo:0, o:1, dev:sda5
Jul 8 14:58:55 testbhim kernel: disk 1, wo:0, o:1, dev:sdb5
Jul 8 14:58:55 testbhim kernel: md: delaying resync of md2 until md4 has finished resync (they share one or more physical units)
[root@testbhim CentOS]#
For reference, please see your earlier reply quoted below:
Quote:
Originally Posted by macemoneta
Check /var/log/messages to see what the problem is. It sounds like you are having some write failures. Make sure you have TLER (time limited error recovery) enabled on the drives:
Code:
smartctl -l scterc /dev/sda
smartctl -l scterc /dev/sdb
If it's disabled, set it to 7 seconds:
Code:
smartctl -l scterc,70,70 /dev/sda
smartctl -l scterc,70,70 /dev/sdb
Some more information on TLER here.
You may also want to force a check on the arrays (and schedule it weekly), to ensure they are properly synced:
Code:
echo check > /sys/block/md0/md/sync_action
echo check > /sys/block/md1/md/sync_action
echo check > /sys/block/md2/md/sync_action
echo check > /sys/block/md3/md/sync_action
echo check > /sys/block/md4/md/sync_action
There's more information at the RAID Wiki.