LinuxQuestions.org
11-26-2007, 05:31 PM   #1
jdavidow
Drive assignments change, RAID5 gets messed up


The problem is that when I reboot, my drive assignments change: what was /dev/sda becomes /dev/sdc, and so on. GRUB is able to boot, and mdadm is able to start the array... BUT since I have a spare, mdadm sometimes assembles the array with the spare in place of one of the member drives, treats the array as degraded, and immediately begins to rebuild onto the spare. In effect, the spare and a good member drive swap roles.

A better description of the scenario:
(I will use numbers to designate the physical drives and letters for the device names they are assigned at boot.)

The system worked perfectly with Ubuntu Dapper, but this problem has persisted through upgrades to Edgy, Feisty and now Gutsy.

When I set the machine up:
Code:
DRIVE1: sda, boot drive
DRIVE2-DRIVE6: sd[b-f] are the RAID5 drives (4+1 parity)
DRIVE7: sdg, RAID spare
Created the array with commands like:
Code:
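mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[bcdef]1
# no separate --assemble is needed here: --create already starts /dev/md0
mdadm --add /dev/md0 /dev/sdg1    # add the spare partition (listed as sdg1 below), not the whole disk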
This worked, and I mounted md0.
Code:
$ cat /etc/mdadm/mdadm.conf
DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=5 spares=1 UUID=e7356e2b:71e53a26:94b87bc7:e6a9e6b2

$ mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Apr  7 23:32:58 2007
     Raid Level : raid5
     Array Size : 48869120 (46.61 GiB 50.04 GB)
  Used Dev Size : 12217280 (11.65 GiB 12.51 GB)
   Raid Devices : 5
  Total Devices : 6
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Nov 26 15:05:07 2007
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : e7356e2b:71e53a26:94b87bc7:e6a9e6b2
         Events : 0.2601904

    Number   Major   Minor   RaidDevice State
       0       8       65        0      active sync   /dev/sdb1
       1       8       81        1      active sync   /dev/sdc1
       2       8       17        2      active sync   /dev/sdd1
       3       8        1        3      active sync   /dev/sde1
       4       8       33        4      active sync   /dev/sdf1

       5       8       97        -      spare   /dev/sdg1
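To compare slot assignments across reboots, the member table at the end of `mdadm --detail` can be reduced to one "slot device state" line per member. This is a minimal sketch; `md_roles` is a hypothetical helper name, and it assumes the column layout shown above (Number, Major, Minor, RaidDevice, State, device).

```shell
# md_roles: read `mdadm --detail` output on stdin and print one line per
# member: "<RaidDevice slot> <device> <first word of State>".
# Spares show "-" as their slot. Assumes the column layout shown above.
md_roles() {
    awk '$1 ~ /^[0-9]+$/ && $NF ~ /^\/dev\// { print $4, $NF, $5 }'
}

# Example usage (needs root and a running array):
#   mdadm --detail /dev/md0 | md_roles > /root/md0-roles.before
#   ...reboot...
#   mdadm --detail /dev/md0 | md_roles | diff /root/md0-roles.before -
```

A diff after a reboot then shows which device name sits in each slot and which one is listed as the spare.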
So all is well. Then I reboot, and GRUB (or whatever is in charge of that) reassigns my drives, seemingly at random, like this:
Code:
DRIVE1: sdd, boot drive
DRIVE2-DRIVE6: sd[abefg] are the RAID5 drives (4+1 parity)
DRIVE7: sdc, RAID spare
(It looks like the second drive controller was probed before the first one.)

The problem is that mdadm "sees" DRIVE7 (the spare) before some of the other drives in the array and includes it as an active member when it assembles, so DRIVE4 ends up as the spare. Because DRIVE7 was the spare, mdadm decides the array is degraded and immediately begins to rebuild onto DRIVE7.

Note: if I didn't have a spare drive, things would work fine, even though the drive assignments change.

I can see two cures for this:
1) Prevent the drives from being assigned differently each time. Is there any way to do this?
2) Fix mdadm to recognize UUIDs.

Any ideas?
 
11-26-2007, 05:42 PM   #2
farslayer
Looks like you will need to write some udev rules to force the drives to come up as the same device each time. If the drives are all identical, the rules will have to reference the drive serial numbers.

This may help...
http://www.linuxquestions.org/questi...oddity-563356/
http://www.linuxquestions.org/questi...drives-584595/
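Before writing serial-based rules, it helps to see which serial currently maps to which sdX name. udev's persistent-storage rules normally create symlinks under /dev/disk/by-id/ named after each drive's model and serial number. The sketch below uses a hypothetical helper, `list_disk_ids`, that prints each of those names next to the kernel device it points to right now.

```shell
# list_disk_ids [DIR]: print "<by-id name> -> <kernel name>" for every
# symlink in DIR (default: /dev/disk/by-id).
list_disk_ids() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        target=$(readlink -f "$link")       # resolve ../../sdX to /dev/sdX
        printf '%s -> %s\n' "${link##*/}" "${target##*/}"
    done
}

# Example, whole disks only:
#   list_disk_ids | grep -v -- '-part'
```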

Last edited by farslayer; 11-26-2007 at 05:43 PM.
 
11-26-2007, 06:21 PM   #3
jdavidow (Original Poster)
Thank you! I didn't realize that udev could do this. I read through a lot of the discussions and examples.

Can this force my drives to use specific "/dev/sdX" names, or will it only allow me to create unique names like /dev/DISK_01?

For example
Quote:
KERNEL=="sd*[!0-9]", ENV{ID_SERIAL}=="1ATA_WDC_WD740GD-00FLA0_WD-WMAKE1145287", NAME="sda"
KERNEL=="sd*[0-9]", ENV{ID_SERIAL}=="1ATA_WDC_WD740GD-00FLA0_WD-WMAKE1145287", NAME="sda%n"

KERNEL=="sd*[!0-9]", ENV{ID_SERIAL}=="1ATA_WDC_WD2500JD-75FYB0_WD-WMAEH1924391", NAME="sdb"
KERNEL=="sd*[0-9]", ENV{ID_SERIAL}=="1ATA_WDC_WD2500JD-75FYB0_WD-WMAEH1924391", NAME="sdb%n"
Does this force the first disk to sda and the second to sdb at boot?
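The rule pairs above follow a fixed pattern: one rule for the whole disk and one for its partitions, with `%n` appending the partition number. A small generator keeps them consistent across seven drives. This is a sketch with a hypothetical helper name, and it assumes `ID_SERIAL` is already set when the rule runs (i.e. the rules file sorts after the rules that populate it). The optional `symlink` mode emits `SYMLINK+=` instead of `NAME=`, which leaves the kernel's sdX names alone and adds stable aliases; later udev releases dropped support for renaming disk nodes with `NAME=`, so the symlink form is the more portable of the two.

```shell
# udev_pin_rules SERIAL NAME [name|symlink]
# Print a whole-disk rule and a partition rule (%n = partition number)
# for the disk whose ID_SERIAL equals SERIAL.
udev_pin_rules() {
    if [ "${3:-name}" = symlink ]; then
        key='SYMLINK+='    # keep sdX, add /dev/NAME and /dev/NAME<n> aliases
    else
        key='NAME='        # rename the node itself (older udev only)
    fi
    printf 'KERNEL=="sd*[!0-9]", ENV{ID_SERIAL}=="%s", %s"%s"\n' "$1" "$key" "$2"
    printf 'KERNEL=="sd*[0-9]", ENV{ID_SERIAL}=="%s", %s"%s%%n"\n' "$1" "$key" "$2"
}

# Example, reproducing the first pair above:
#   udev_pin_rules 1ATA_WDC_WD740GD-00FLA0_WD-WMAKE1145287 sda
```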
 
01-07-2008, 03:21 PM   #4
jdavidow (Original Poster)
Sorry this took so long to reply...

Doesn't mdadm look at the superblocks for this info? Isn't the problem here that mdadm encounters the spare drive and assumes the array is degraded BEFORE it encounters the actual member drive?

*OR* does mdadm scan through the devices in order /dev/sda, /dev/sdb... and so on?

My mdadm.conf file doesn't mention devices, just UUIDs.
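mdadm does read the superblocks. With the 0.90 metadata used here (Version 00.90.03 in the `--detail` output above), `mdadm --examine` on a member partition prints that partition's `Events` counter and a `this` line giving the slot its own superblock records. Collecting those for every partition, before and after a bad boot, would show whether the spare's superblock really claims an active slot. The `sb_summary` helper below is hypothetical and assumes the 0.90 `--examine` layout; other metadata versions print different fields.

```shell
# sb_summary: read the `mdadm --examine` output for ONE partition on stdin
# and print "<device> events=<Events> slot=<RaidDevice> state=<State>",
# with slot and state taken from the 0.90 "this" line.
sb_summary() {
    awk '
        /^\/dev\//     { dev = $1; sub(/:$/, "", dev) }   # "/dev/sdb1:" header line
        $1 == "Events" { events = $3 }                     # "Events : 0.2601904"
        $1 == "this"   { slot = $5; state = $6 }           # this Number Major Minor RaidDevice State
        END { print dev, "events=" events, "slot=" slot, "state=" state }
    '
}

# Example (as root):
#   for p in /dev/sd[a-g]1; do mdadm --examine "$p" | sb_summary; done
```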
 
01-07-2008, 05:53 PM   #5
jdavidow (Original Poster)
I am not going to try the udev rules. Looking at the logs, md_mod is a kernel module that is loaded before udev runs, and it appears that the arrays are assembled well before any udev rules are run. It seems to me that md kicks in as soon as the drives are assigned.

From the dmesg excerpt below (clutter removed and line breaks added), it looks to me like my system inspects my 'secondary' drive controller first (three drives), tries unsuccessfully to start the RAID5 on those drives, then inspects the main (motherboard) drive controller, and only then starts the arrays.

Here are the questions I have:

- Why, if GRUB sees my boot partition as (hd0,0), does the system inspect the OTHER controller first and end up assigning that hd0 drive to sdd?

- Why does md try to start the arrays right after inspecting the second controller, first binding sdc and then sd[abc]?

- When md binds again after it has inspected all 7 drives, it binds sd[fbacge]1 and assembles sd[ecabf]1 for my first partition (which matches the last-boot configuration), but for the second partition it binds sd[fbgcae]2 and assembles sd[ecgbf]2, which causes a rebuild of md1. Why the difference?

- Is there any way to run a udev rule as each drive is detected? That is, at the point in the log where it prints "[XX] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)"?
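On the timing question: in the excerpt below, md binds sdc1 and then sd[abc]1 at about 39.9 s, while sd[d-g] are only reported from 40.88 s onward, so the first assembly attempts run before those disks exist. One hedged idea is to hold off assembly until all seven disks are visible. The `wait_for_disks` function is a hypothetical sketch that polls /sys/block; where it could be hooked into Ubuntu's boot or initramfs scripts is not covered here, and it is not a confirmed fix.

```shell
# wait_for_disks EXPECTED [TIMEOUT_SECONDS] [SYSFS_BLOCK_DIR]
# Return 0 once at least EXPECTED sd* entries exist in SYSFS_BLOCK_DIR
# (default /sys/block), or 1 after TIMEOUT_SECONDS (default 30).
wait_for_disks() {
    expected=$1
    timeout=${2:-30}
    dir=${3:-/sys/block}
    waited=0
    while :; do
        count=0
        for d in "$dir"/sd*; do
            # an unmatched glob stays literal, so test that the entry exists
            if [ -e "$d" ]; then count=$((count + 1)); fi
        done
        if [ "$count" -ge "$expected" ]; then return 0; fi
        if [ "$waited" -ge "$timeout" ]; then return 1; fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Hypothetical use before assembling (7 disks in this machine):
#   wait_for_disks 7 30 && mdadm --assemble --scan
```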

partial dmesg:
Code:
[   39.634924] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[   39.634995] sd 0:0:0:0: [sda] Write Protect is off
[   39.635048] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   39.635076] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.635218] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[   39.635292] sd 0:0:0:0: [sda] Write Protect is off
[   39.635350] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   39.635380] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.635462]  sda: sda1 sda2
[   39.650092] sd 0:0:0:0: [sda] Attached SCSI disk

[   39.650226] sd 1:0:0:0: [sdb] 490234752 512-byte hardware sectors (251000 MB)
[   39.650296] sd 1:0:0:0: [sdb] Write Protect is off
[   39.650348] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   39.650379] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   39.650505] sd 1:0:0:0: [sdb] 490234752 512-byte hardware sectors (251000 MB)
[   39.650573] sd 1:0:0:0: [sdb] Write Protect is off
[   39.650625] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[   39.650657] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   39.650727]  sdb: sdb1 sdb2
[   39.667599] sd 1:0:0:0: [sdb] Attached SCSI disk

[   39.667719] sd 3:0:0:0: [sdc] 490234752 512-byte hardware sectors (251000 MB)
[   39.667788] sd 3:0:0:0: [sdc] Write Protect is off
[   39.667840] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   39.667871] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.667997] sd 3:0:0:0: [sdc] 490234752 512-byte hardware sectors (251000 MB)
[   39.668064] sd 3:0:0:0: [sdc] Write Protect is off
[   39.668116] sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[   39.668146] sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   39.668213]  sdc: sdc1 sdc2
[   39.692703] sd 3:0:0:0: [sdc] Attached SCSI disk

[   39.699348] sd 0:0:0:0: Attached scsi generic sg0 type 0
[   39.699570] sd 1:0:0:0: Attached scsi generic sg1 type 0
[   39.699786] sd 3:0:0:0: Attached scsi generic sg2 type 0

[   39.834560] md: md0 stopped.
[   39.870361] md: bind<sdc1>
[   39.870527] md: md1 stopped.
[   39.910999] md: md0 stopped.
[   39.911064] md: unbind<sdc1>
[   39.911120] md: export_rdev(sdc1)
[   39.929760] md: bind<sda1>
[   39.929953] md: bind<sdc1>
[   39.930139] md: bind<sdb1>
[   39.930231] md: md1 stopped.
[   39.932468] md: bind<sdc2>
[   39.932674] md: bind<sda2>
[   39.932860] md: bind<sdb2>

[   40.880217] sd 6:0:0:0: [sdd] 234441648 512-byte hardware sectors (120034 MB)
[   40.880288] sd 6:0:0:0: [sdd] Write Protect is off
[   40.880340] sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[   40.880367] sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.880504] sd 6:0:0:0: [sdd] 234441648 512-byte hardware sectors (120034 MB)
[   40.880572] sd 6:0:0:0: [sdd] Write Protect is off
[   40.880623] sd 6:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[   40.880651] sd 6:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.880718]  sdd: sdd1 sdd2 <<7>ieee1394: Host added: ID:BUS[0-00:1023]  GUID[001485000012704c]
[   40.908264]  sdd5 >
[   40.908479] sd 6:0:0:0: [sdd] Attached SCSI disk
[   40.908579] sd 6:0:0:0: Attached scsi generic sg4 type 0

[   40.908747] scsi 6:0:1:0: Direct-Access     ATA      Maxtor 7L250S0   BACE PQ: 0 ANSI: 5
[   40.908899] sd 6:0:1:0: [sde] 490234752 512-byte hardware sectors (251000 MB)
[   40.908968] sd 6:0:1:0: [sde] Write Protect is off
[   40.909020] sd 6:0:1:0: [sde] Mode Sense: 00 3a 00 00
[   40.909050] sd 6:0:1:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.909174] sd 6:0:1:0: [sde] 490234752 512-byte hardware sectors (251000 MB)
[   40.909243] sd 6:0:1:0: [sde] Write Protect is off
[   40.909294] sd 6:0:1:0: [sde] Mode Sense: 00 3a 00 00
[   40.909324] sd 6:0:1:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.909391]  sde: sde1 sde2
[   40.930218] sd 6:0:1:0: [sde] Attached SCSI disk
[   40.930318] sd 6:0:1:0: Attached scsi generic sg5 type 0

[   40.930480] scsi 7:0:0:0: Direct-Access     ATA      WDC WD2500JD-00H 08.0 PQ: 0 ANSI: 5
[   40.930621] sd 7:0:0:0: [sdf] 488397168 512-byte hardware sectors (250059 MB)
[   40.930689] sd 7:0:0:0: [sdf] Write Protect is off
[   40.930742] sd 7:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[   40.930769] sd 7:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.930894] sd 7:0:0:0: [sdf] 488397168 512-byte hardware sectors (250059 MB)
[   40.930963] sd 7:0:0:0: [sdf] Write Protect is off
[   40.931015] sd 7:0:0:0: [sdf] Mode Sense: 00 3a 00 00
[   40.931044] sd 7:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.931111]  sdf: sdf1 sdf2
[   40.948846] sd 7:0:0:0: [sdf] Attached SCSI disk
[   40.948946] sd 7:0:0:0: Attached scsi generic sg6 type 0

[   40.949106] scsi 7:0:1:0: Direct-Access     ATA      WDC WD2500JS-00M 02.0 PQ: 0 ANSI: 5
[   40.949248] sd 7:0:1:0: [sdg] 488397168 512-byte hardware sectors (250059 MB)
[   40.949317] sd 7:0:1:0: [sdg] Write Protect is off
[   40.949368] sd 7:0:1:0: [sdg] Mode Sense: 00 3a 00 00
[   40.949396] sd 7:0:1:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.949519] sd 7:0:1:0: [sdg] 488397168 512-byte hardware sectors (250059 MB)
[   40.949588] sd 7:0:1:0: [sdg] Write Protect is off
[   40.949640] sd 7:0:1:0: [sdg] Mode Sense: 00 3a 00 00
[   40.949668] sd 7:0:1:0: [sdg] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   40.949734]  sdg: sdg1 sdg2
[   40.969827] sd 7:0:1:0: [sdg] Attached SCSI disk
[   40.969926] sd 7:0:1:0: Attached scsi generic sg7 type 0

[   41.206078] md: md0 stopped.
[   41.206137] md: unbind<sdb1>
[   41.206187] md: export_rdev(sdb1)
[   41.206253] md: unbind<sdc1>
[   41.206302] md: export_rdev(sdc1)
[   41.206360] md: unbind<sda1>
[   41.206408] md: export_rdev(sda1)
[   41.247389] md: bind<sdf1>
[   41.247584] md: bind<sdb1>
[   41.247787] md: bind<sda1>
[   41.247971] md: bind<sdc1>
[   41.248151] md: bind<sdg1>
[   41.248325] md: bind<sde1>
[   41.256718] raid5: device sde1 operational as raid disk 0
[   41.256771] raid5: device sdc1 operational as raid disk 4
[   41.256821] raid5: device sda1 operational as raid disk 3
[   41.256870] raid5: device sdb1 operational as raid disk 2
[   41.256919] raid5: device sdf1 operational as raid disk 1
[   41.257426] raid5: allocated 5245kB for md0
[   41.257476] raid5: raid level 5 set md0 active with 5 out of 5 devices, algorithm 2
[   41.257538] RAID5 conf printout:
[   41.257584]  --- rd:5 wd:5
[   41.257631]  disk 0, o:1, dev:sde1
[   41.257677]  disk 1, o:1, dev:sdf1
[   41.257724]  disk 2, o:1, dev:sdb1
[   41.257771]  disk 3, o:1, dev:sda1
[   41.257817]  disk 4, o:1, dev:sdc1

[   41.257952] md: md1 stopped.
[   41.258009] md: unbind<sdb2>
[   41.258060] md: export_rdev(sdb2)
[   41.258128] md: unbind<sda2>
[   41.258179] md: export_rdev(sda2)
[   41.258248] md: unbind<sdc2>
[   41.258306] md: export_rdev(sdc2)
[   41.283067] md: bind<sdc2>
[   41.283297] md: bind<sda2>
[   41.285235] md: bind<sdb2>
[   41.306753] md: md1 stopped.
[   41.306818] md: unbind<sdb2>
[   41.306878] md: export_rdev(sdb2)
[   41.306956] md: unbind<sda2>
[   41.307007] md: export_rdev(sda2)
[   41.307075] md: unbind<sdc2>
[   41.307130] md: export_rdev(sdc2)
[   41.312250] md: bind<sdf2>
[   41.312476] md: bind<sdb2>
[   41.312711] md: bind<sdg2>
[   41.312922] md: bind<sdc2>
[   41.313138] md: bind<sda2>
[   41.313343] md: bind<sde2>
[   41.313452] md: md1: raid array is not clean -- starting background reconstruction
[   41.322189] raid5: device sde2 operational as raid disk 0
[   41.322243] raid5: device sdc2 operational as raid disk 4
[   41.322292] raid5: device sdg2 operational as raid disk 3
[   41.322342] raid5: device sdb2 operational as raid disk 2
[   41.322391] raid5: device sdf2 operational as raid disk 1
[   41.322823] raid5: allocated 5245kB for md1
[   41.322872] raid5: raid level 5 set md1 active with 5 out of 5 devices, algorithm 2
[   41.322934] RAID5 conf printout:
[   41.322980]  --- rd:5 wd:5
[   41.323026]  disk 0, o:1, dev:sde2
[   41.323073]  disk 1, o:1, dev:sdf2
[   41.323119]  disk 2, o:1, dev:sdb2
[   41.323165]  disk 3, o:1, dev:sdg2
[   41.323212]  disk 4, o:1, dev:sdc2

[   41.323316] md: resync of RAID array md1
[   41.323364] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[   41.323415] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[   41.323492] md: using 128k window, over a total of 231978496 blocks.
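The bind and assembly orders asked about above can be pulled straight out of the log. A sketch with two hypothetical helpers: `md_bind_order` lists devices in the order of the `md: bind<...>` lines, and `raid_slots` lists the `raid5: device X operational as raid disk N` lines sorted by slot. Feed each the stretch of dmesg for one array, since the full log mixes md0 and md1.

```shell
# md_bind_order: devices from "md: bind<X>" lines, in log order, space-separated.
# ("md: unbind<X>" lines are not matched.)
md_bind_order() {
    sed -n 's/.*md: bind<\([^>]*\)>.*/\1/p' | paste -s -d ' ' -
}

# raid_slots: "slot device" for each "raid5: device X operational as raid disk N"
# line, sorted by slot number.
raid_slots() {
    sed -n 's/.*raid5: device \([^ ]*\) operational as raid disk \([0-9][0-9]*\).*/\2 \1/p' | sort -n
}

# Example, with the md0 stretch of the log saved to a file:
#   md_bind_order < md0-excerpt.txt
#   raid_slots < md0-excerpt.txt
```

For the md0 lines above, `raid_slots` gives the same order as the "RAID5 conf printout" (disk 0 sde1 through disk 4 sdc1), which makes it easy to line up against the bind order.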
 
  


All times are GMT -5.