LinuxQuestions.org > Forums > Linux Forums > Linux - Hardware
Old 12-05-2005, 02:06 AM   #1
drorex2 (LQ Newbie; Registered: Dec 2005; Posts: 14)

Recovering from possibly botched RAID setup

Ok, this is going to be a long one (lots of logs and stuff). Thanks in advance for reading it, and possibly helping me save my data.

I am installing new disks into my system, and wanted to set them up using RAID.

Here is my old config:

Quote:
/dev/hda: 200GB
/dev/hda1: 20GB (not used)
/dev/hda2: 180GB (mostly full - non essential data)

/dev/hdb: 30GB
/dev/hdb1: 20GB ( root filesystem, mostly full)
/dev/hdb2: 2GB swap
/dev/hdb3: used to be windows install, not used anymore
-------------
Then I installed 2 new 200GB SATA drives. My new config would be:

Quote:
/dev/hda: 200GB
/dev/hda1: 20GB
/dev/hda2: 180GB

/dev/sda: 200GB
/dev/sda1: 180GB
/dev/sda2: 20GB

/dev/sdb: 200GB
/dev/sdb1: 180GB
/dev/sdb2: 20GB

/dev/md0: RAID 1 20GB (extra safe...I can lose 2 of the 3 drives and still boot up ok)
/dev/sda2
/dev/sdb2
/dev/hda1

/dev/md1: RAID5 360GB (I can lose 1 of the 3 drives, and still have all my data, plus it's combined into a single large drive)
/dev/sda1
/dev/sdb1
/dev/hda2
-----------------
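The md0 three-way mirror in this layout would be created along these lines (a sketch only; it isn't actually created at this point in the thread, and the bootloader would still need to be installed on each member drive for the "lose 2 of 3 and still boot" property to hold):

```shell
# Three-way RAID1 mirror across one partition from each of the three drives
mdadm --create /dev/md0 --level=1 --raid-devices=3 \
      /dev/sda2 /dev/sdb2 /dev/hda1
```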

Now, in order to migrate my data, here was my plan:

1. install the two new drives
2. create a RAID5 array, md1 from sda1 & sdb1, and 'missing' as the third drive (so that it runs in degraded mode)
3. copy all my data from /dev/hda2 to /dev/md1
4. add /dev/hda2 to /dev/md1, and have it resync the parity
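The plan above can be sketched with mdadm as follows (device names are taken from the post; the filesystem type and the /data mount point are assumptions, and exact flags can differ between mdadm versions, so treat this as an outline rather than a definitive recipe):

```shell
# Step 2: create a degraded RAID5 with two real members and one 'missing' slot
mdadm --create /dev/md1 --level=5 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 missing

# Step 3: make a filesystem, mount it, and copy the data across
mkfs.ext3 /dev/md1                 # filesystem choice is an assumption
mount /dev/md1 /mnt/md1
cp -a /data/. /mnt/md1/            # /data stands in for the hda2 mount point

# Step 4: unmount the old partition and add it as the third member;
# the kernel then rebuilds parity onto it in the background
umount /dev/hda2
mdadm --add /dev/md1 /dev/hda2
cat /proc/mdstat                   # watch the resync progress
```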

Now this is where I got stuck (I haven't gotten to md0 yet). The data seemed to copy fine; I then unmounted hda2 and added it to the md1 array. It started resyncing and got to maybe 10% fine. But after a while something odd happened: if I cat /proc/mdstat, it cycled between states: one read showed "Resyncing...0%", the next said "Resync=DELAYED", and the next showed no resync at all.

And it started generating HUGE amounts of logs in /var/log/syslog and /var/log/messages, about 350 MB before I stopped sysklogd and klogd (because my disk is already almost full). Here is the end of the log output:
Code:
Dec  4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec  4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec  4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec  4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec  4 23:22:33 drorex kernel: md: md1: sync done.
Dec  4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec  4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec  4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec  4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec  4 23:22:33 drorex kernel: md: md1: sync done.
Dec  4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec  4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec  4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec  4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec  4 23:22:33 drorex kernel: md: md1: sync done.
Dec  4 23:22:33 drorex kernel: ................<6>md: syncing RAID array md1
Dec  4 23:22:33 drorex kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Dec  4 23:22:33 drorex kernel: md: using maximum available idle IO bandwith (but not more than 200000 KB/sec) for reconstruction.
Dec  4 23:22:33 drorex kernel: md: using 128k window, over a total of 175783104 blocks.
Dec  4 23:22:33 drorex kernel: md: md1: sync done.
Dec  4 23:22:33 drorex exiting on signal 15
It's basically the same thing over and over, repeating.

So then I stopped the array and tried restarting it, but now it reports that my disks have failed:

Code:
root@drorex:~# mdadm --assemble /dev/md1 /dev/sda1 /dev/sdb1 /dev/hda2
mdadm: /dev/md1 assembled from 1 drive and 1 spare - not enough to start the array.
root@drorex:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : inactive sda1[0] hda2[3] sdb1[1]
      527373440 blocks
unused devices: <none>
Here is the output of mdadm --examine for all of the partitions in md1:

Code:
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : aef026ea:a658dd3d:d83036ce:4ad342a2
  Creation Time : Sun Dec  4 18:27:00 2005
     Raid Level : raid5
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Dec  4 23:24:16 2005
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 1
       Checksum : b33aa022 - correct
         Events : 0.1049267

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       3        2        2      spare   /dev/hda2



/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : aef026ea:a658dd3d:d83036ce:4ad342a2
  Creation Time : Sun Dec  4 18:27:00 2005
     Raid Level : raid5
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Dec  4 22:57:33 2005
          State : clean
 Active Devices : 2
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 1
       Checksum : b31b8bf7 - correct
         Events : 0.31676

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       17        1      active sync   /dev/sdb1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       0        0        2      faulty removed
   3     3       3        2        2      spare   /dev/hda2



/dev/hda2:
          Magic : a92b4efc
        Version : 00.90.01
           UUID : aef026ea:a658dd3d:d83036ce:4ad342a2
  Creation Time : Sun Dec  4 18:27:00 2005
     Raid Level : raid5
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0

    Update Time : Sun Dec  4 23:24:16 2005
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 3
  Spare Devices : 1
       Checksum : b33aa01f - correct
         Events : 0.1049267

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       3        2        3      spare   /dev/hda2

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       3        2        3      spare   /dev/hda2
It looks like something weird is going on, with the different reports of 'faulty' and 'spare'.

Again, the raid array was initially created with /dev/sda1 & /dev/sdb1, then /dev/hda2 was added to it.
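A quick way to compare what each member's superblock believes (the Events counter and Update Time are the usual tell: a member whose count lags behind the others was dropped from the array at some point, as sdb1's 0.31676 vs. 0.1049267 suggests here):

```shell
# Pull just the event counts and update times out of each superblock
mdadm --examine /dev/sda1 /dev/sdb1 /dev/hda2 | \
    grep -E '^/dev/|Events|Update Time'
```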

Is there some way I can reset the flags so the disks aren't marked as 'failed'?
I'd really like a way to do this without losing my data; I know it's all in there somewhere.

Thanks in advance to anyone who can help. I have a bunch of stuff in there that I don't want to lose.
 
Old 12-05-2005, 12:05 PM   #2
drorex2 (LQ Newbie; Original Poster)

OK, I took a chance and ran mdadm --assemble with the --force flag... and it worked! At least it seems to work... hopefully my disks won't explode in a couple of hours :P
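For reference, the forced assembly described above would look roughly like this (--force tells mdadm to ignore the stale 'failed' markers in the superblocks and assemble anyway; it should be used cautiously, since it can mask a genuinely dying disk):

```shell
# Stop the inactive array first, then force-assemble from all three members
mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sda1 /dev/sdb1 /dev/hda2

# Confirm the array came up and check its state
cat /proc/mdstat
mdadm --detail /dev/md1
```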
 
Old 12-05-2005, 12:29 PM   #3
drorex2 (LQ Newbie; Original Poster)

I spoke too soon.

The exact same thing happened again: it resynced for a little while (15 or 20 minutes), then the logs started filling with huge volumes of the same messages as above, and the resync got stuck in that stopped/delayed/0% state again.

Do you think one of my drives is bad? They aren't new, but they've been sitting on the shelf for a while and have seen little or no use. Is there a way I can test each drive separately to see if one is bad?
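One way to check each drive individually, assuming the smartmontools package is installed (badblocks in its default read-only mode is non-destructive on a live disk, but the SMART extended self-test can take hours per drive):

```shell
# Query SMART health status and error logs for each drive
smartctl -a /dev/sda
smartctl -a /dev/sdb
smartctl -a /dev/hda

# Kick off a drive's built-in extended self-test (runs in the background;
# check results later with smartctl -a)
smartctl -t long /dev/sda

# Read-only surface scan of one member partition (default mode is safe)
badblocks -sv /dev/sda1
```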

thanks
 
Old 12-09-2005, 12:30 AM   #4
drorex2 (LQ Newbie; Original Poster)

Still need help, anybody?

I haven't found a solution yet. I'm open to any suggestions: can anyone think of anything I can do to fix the problem, or to figure out more about it? Thanks.