LinuxQuestions.org
Old 12-04-2009, 09:10 AM   #1
wilibird
LQ Newbie
 
Registered: Dec 2009
Location: France
Distribution: redhat6, mandrake7, suse5, bits of debian and lots of (x)ubuntu, mandriva2006+, centos5, sles9+
Posts: 9

Rep: Reputation: 0
linux sees hw raid disks in wrong order whilst grub sees correctly


Dear fellows,

A Linux box equipped with an LSI MegaRAID SAS1078 controller has 4 SAS disk slots.

Slots 1+2 hold disks that have been configured (using the LSI BIOS interface at boot time) as a RAID mirror array (hence exported by the controller as one single LUN).

Slots 3 & 4 each hold a disk that is not configured as RAID ("direct attached devices" or "raw devices").

The "scan order" numbers for each disk, as reported by the controller interface, are in the same order as the "slot numbers".

Grub (installed on MBR of slot 1+2 LUN) sees:
* slot 1+2 LUN as hd0
* slot 3 disk as hd1
* slot 4 disk as hd2

The Grub boot command says:
root (hd0,1)
kernel /vmlinuz-2.6.16 root=/dev/sda3
initrd /initrd-2.6.16.img

(hd0,0) is swap
(hd0,1) is /boot
(hd0,2) is /

The linux distro is SUSE (SLES10 SP2).

GRUB boots correctly. The disks in slots 3 & 4 are empty, without even a partition table. As expected, GRUB's tab completion lists the available partitions for hd0; it also offers hd1 and hd2, but with no partitions on them.

Now the problem:

The kernel loads, as does the initrd with its bunch of modules; then the hard drives are detected by the SCSI driver, and there I see:
...
SCSI device sda: ...
sda: unknown partition table
...
SCSI device sdb: ...
sdb: unknown partition table
...
SCSI device sdc: ...
sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 sdc7 sdc8 sdc9 >
...

As a result, shortly afterwards, sysinit fails to mount the filesystems (because fstab refers directly to /dev/sdaN rather than to UUIDs or udev persistent names).
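To see which fstab entries are exposed to this renaming, the file can be scanned for kernel-assigned names. The following is only a minimal sketch: the sample fstab is hypothetical (the ext3 filesystem type and mount options are assumptions; only the partition layout comes from the post), and the function name is made up for illustration.

```python
import re

# Matches kernel-assigned SCSI disk names such as /dev/sda3 or /dev/sdab1.
KERNEL_NAME = re.compile(r"^/dev/sd[a-z]+\d*$")


def kernel_named_mounts(fstab_text):
    """Return (device, mount_point) pairs whose device is a raw /dev/sdXN name."""
    hits = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2 and KERNEL_NAME.match(fields[0]):
            hits.append((fields[0], fields[1]))
    return hits


# Hypothetical fstab following the layout in the post:
# sda1 = swap, sda2 = /boot, sda3 = /  (filesystem types are assumed).
SAMPLE_FSTAB = """\
/dev/sda3  /      ext3  defaults  1 1
/dev/sda2  /boot  ext3  defaults  1 2
/dev/sda1  swap   swap  defaults  0 0
proc       /proc  proc  defaults  0 0
"""

if __name__ == "__main__":
    for device, mount_point in kernel_named_mounts(SAMPLE_FSTAB):
        print(device, "->", mount_point)
```

Every entry it reports will point at the wrong disk as soon as the kernel hands out sda to a different device.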

Please note that I DO NOT want to use UUIDs or LABELs (for personal reasons that are not worth detailing here).

I tried to change the disk order at the LSI raid bios level, but:
* this interface cannot change priority for LUNs, only for raw devices (and these raw device priorities are set according to slot numbering)
* Grub indeed sees disks correctly, so I think the problem happens only at the linux stage, not earlier.

I tried to look at the udev persistent rules (in /etc as well as in the etc directory of initrd.img after a cpio extract), but found nothing human-readable (nor any obvious aliases, like those that exist for eth cards, for instance).
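Besides reading the rules themselves, one can look at the symlinks udev actually created. This sketch assumes the distribution's udev populates a /dev/disk/by-path directory (this may not hold on every release); the function name and its parameter are hypothetical.

```python
import os


def persistent_names(by_path_dir="/dev/disk/by-path"):
    """Map each symlink in by_path_dir to the kernel name it points to."""
    mapping = {}
    # The directory is an assumption; return an empty map if udev did not create it.
    if not os.path.isdir(by_path_dir):
        return mapping
    for entry in sorted(os.listdir(by_path_dir)):
        link = os.path.join(by_path_dir, entry)
        if os.path.islink(link):
            # Resolve the link and keep only the final component, e.g. "sdc".
            mapping[entry] = os.path.basename(os.path.realpath(link))
    return mapping


if __name__ == "__main__":
    for name, kernel_name in persistent_names().items():
        print(name, "->", kernel_name)
```

These names are derived from the bus path rather than from a UUID or a label, so they show which physical path each sdX currently corresponds to.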

It seems that Linux sees hardware RAID arrays after raw disks, even though the raw disks are in higher-numbered physical slots (and have higher ordering numbers on the controller), and hence orders them incorrectly.

And that's not the behaviour I expect, since I thought the hardware RAID LUNs were seen exactly as raw devices at the linux level, so raid or not should not make any difference in the disks order.

All of this is strictly reproducible over several boot attempts (i.e. no random discovery here).

Is it related to existing udev rules (default rules from SLES10 SP2) I did not find?

Or to some missing udev rules (explicit aliases) I should have added?

Or to some other udev mechanism I should get rid of (or even get rid of udev completely!)?

Or to something else (not udev related... perhaps the fact I also have LVMs and software raid management...)?

Thanks in advance, any help appreciated

Best regards,

--
Phil

Last edited by wilibird; 12-10-2009 at 09:47 AM.
 
Old 12-05-2009, 02:32 AM   #2
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137Reputation: 137
Hi,

Why don't you use filesystem labels? That way your fstab contains filesystem labels rather than device names. Try
Code:
e2label
 
Old 12-07-2009, 02:19 AM   #3
wilibird
LQ Newbie
 
Registered: Dec 2009
Location: France
Distribution: redhat6, mandrake7, suse5, bits of debian and lots of (x)ubuntu, mandriva2006+, centos5, sles9+
Posts: 9

Original Poster
Rep: Reputation: 0
Hi,

Thanks for replying, but you didn't answer my question.

Please note that I DO NOT ask for blind workarounds, but for some explanations (or at least clues) of what happens here and how linux detects/orders raid disks.

Any idea?

Cheers,

--
Phil

Last edited by wilibird; 12-07-2009 at 02:48 AM.
 
Old 12-10-2009, 10:07 AM   #4
wilibird
LQ Newbie
 
Registered: Dec 2009
Location: France
Distribution: redhat6, mandrake7, suse5, bits of debian and lots of (x)ubuntu, mandriva2006+, centos5, sles9+
Posts: 9

Original Poster
Rep: Reputation: 0
Hi again,

After several tests, it really seems that Linux sees RAID arrays AFTER raw disks.

Does anybody know why?

And is this mechanism tunable in some way?

====================
details:

* with only raw disks, no raid:
==> grub sees disks according to the "disks scan order" (as set in the controller BIOS interface), whatever slot they are in ...
==> linux sees disks according to the "slots order", whatever the controller scan order !!!
==> example:
slot  = 0   | 1   | 2   | 3
scan  = 3   | 9   | 6   | 10
grub  = hd0 | hd2 | hd1 | hd3
linux = sda | sdb | sdc | sdd

* with only raid arrays, no raw:
==> I guess that the disks' scan order on the controller doesn't really matter here, and I couldn't find any equivalent number for the RAID LUNs, but I'm tempted to think there is a kind of hierarchy among the LUNs (in this case I created the LUN at 0+1 before the one at 2+3... next time, I will try to delete those arrays and recreate them in the other order to see whether this makes any difference)
==> example:
slot  = 0+1  | 2+3
scan  = 3, 9 | 6, 10
grub  = hd0  | hd1
linux = sda  | sdb

* with mixed configuration (see first post in this thread):
==> it looks like linux first takes raw disks in their scan order and then RAID arrays.
==> example:
slot  = 0+1  | 2   | 3
scan  = 3, 9 | 6   | 10
grub  = hd0  | hd1 | hd2
linux = sdc  | sda | sdb
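The three observations above can be condensed into a small model and checked against the tables. This is only a sketch of the observed behaviour, not of what the kernel or GRUB actually do internally. Two things are assumptions that merely happen to fit the three tables: keying a LUN on its lowest member scan number for GRUB, and ordering LUNs by their lowest slot for Linux. All names in the code are hypothetical.

```python
from string import ascii_lowercase


def _rank(units, key):
    """Return, for each unit, its position once the units are sorted by key."""
    order = sorted(range(len(units)), key=lambda i: key(units[i]))
    ranks = [0] * len(units)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def grub_names(units):
    """Observed GRUB order: by controller scan number (lowest member for a LUN)."""
    return ["hd%d" % r for r in _rank(units, lambda u: min(u["scan"]))]


def linux_names(units):
    """Observed Linux order: all raw disks first (by slot), then the LUNs."""
    key = lambda u: (u["kind"] == "lun", min(u["slots"]))
    return ["sd" + ascii_lowercase[r] for r in _rank(units, key)]


# The mixed configuration from the table above.
MIXED = [
    {"kind": "lun", "slots": (0, 1), "scan": (3, 9)},
    {"kind": "raw", "slots": (2,), "scan": (6,)},
    {"kind": "raw", "slots": (3,), "scan": (10,)},
]

if __name__ == "__main__":
    print(grub_names(MIXED))   # GRUB names, in slot order
    print(linux_names(MIXED))  # Linux names, in slot order
```

If a fourth configuration contradicts the model, the assumed sort keys are the first thing to revisit.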

any help appreciated,

cheers,

--
Phil
 
Old 12-15-2009, 01:33 AM   #5
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137Reputation: 137
Hi,

I know this behavior from other servers, so my advice to use filesystem labels is not a blind workaround; it was the only way for me to get it to work the way I like to have it.

Do a Google search; in the Dell support forums the same solution is offered to users with the same problem.
 
Old 12-15-2009, 02:47 AM   #6
zhjim
Senior Member
 
Registered: Oct 2004
Distribution: Debian Squeeze x86_64
Posts: 1,748
Blog Entries: 11

Rep: Reputation: 233Reputation: 233Reputation: 233
Check the file /boot/grub/device.map. It holds the translation between what GRUB sees and what the devices are called inside Linux.
Don't quote me on how it works. I'd say it is only used by GRUB and the kernel takes no account of it.
 
Old 12-16-2009, 07:51 AM   #7
wilibird
LQ Newbie
 
Registered: Dec 2009
Location: France
Distribution: redhat6, mandrake7, suse5, bits of debian and lots of (x)ubuntu, mandriva2006+, centos5, sles9+
Posts: 9

Original Poster
Rep: Reputation: 0
Hi all,

Thanks again for replying.

Sorry if my English is so poor that I can't formulate my question clearly.

I'm not asking about GRUB or labels (I CANNOT use labels, and you can see from my examples and descriptions that GRUB is not the cause).

Just to make it clear, here is my device.map file :
(hd0) /dev/sda
(hd1) /dev/sdb
(hd2) /dev/sdc
(hd3) /dev/sdd
(hd4) /dev/sde
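For reference, the way such a file maps a GRUB legacy partition spec to a Linux device name can be sketched as follows. GRUB legacy counts partitions from 0 and Linux from 1, so (hd0,1) corresponds to partition 2 of whatever device.map lists for hd0. The function names are hypothetical. The file only records what the GRUB tools assume and says nothing about the order in which the kernel will actually name the disks.

```python
import re


def parse_device_map(text):
    """Parse GRUB legacy device.map lines like '(hd0) /dev/sda' into a dict."""
    mapping = {}
    for line in text.splitlines():
        match = re.match(r"\s*\((\w+)\)\s+(\S+)", line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


def grub_to_linux(grub_spec, mapping):
    """Translate a spec such as '(hd0,1)' using the given device.map mapping.

    GRUB legacy numbers partitions from 0; Linux numbers them from 1.
    """
    match = re.fullmatch(r"\((\w+),(\d+)\)", grub_spec)
    if not match:
        raise ValueError("not a GRUB legacy partition spec: " + grub_spec)
    disk, partition = match.group(1), int(match.group(2))
    return mapping[disk] + str(partition + 1)


# The device.map quoted in the post.
DEVICE_MAP = """\
(hd0) /dev/sda
(hd1) /dev/sdb
(hd2) /dev/sdc
(hd3) /dev/sdd
(hd4) /dev/sde
"""

if __name__ == "__main__":
    table = parse_device_map(DEVICE_MAP)
    print(grub_to_linux("(hd0,1)", table))  # the /boot partition
    print(grub_to_linux("(hd0,2)", table))  # the / partition
```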

I'm just asking why those f...... hardware raid units are not seen by linux in the order everyone (I at least...) would expect.

I know UDEV and other kinds of UUIDS/LABELS mechanisms have been designed precisely because sometimes linux does not order devices in the same way from one boot to another, or because some administrators want to keep the linux order unchanged even though the physical layout may change (eth cards for instance).

But that's not the case here. It seems that Linux obeys some predictable laws (the order is not the one I expect, but at least it seems persistent from one boot to another). And on my side, as administrator, I won't change the physical layout. So everything is fine, except that I'm asking whether somebody knows about those predictable laws, because I would like to know them too.

If you had a car with square wheels and did not understand why it does not roll, you would not expect somebody to tell you that you could put some oil on the road to make it move forward anyway. You would probably like somebody to explain why only circles can roll. At least, I would (so that I could start thinking about real cars rather than sleighs...).

So here, my goal is not to make my car move forward at all costs; I would just like to know why Linux apparently does not detect and order RAID units in the same way as raw disks. At which stage does this difference appear? (And why does GRUB not make such a difference?)

Any idea?

Cheers,

--
Phil

Last edited by wilibird; 12-16-2009 at 09:06 AM.
 
Old 12-17-2009, 04:54 AM   #8
zhjim
Senior Member
 
Registered: Oct 2004
Distribution: Debian Squeeze x86_64
Posts: 1,748
Blog Entries: 11

Rep: Reputation: 233Reputation: 233Reputation: 233
I can't really answer your question, although I think I understand it quite well. And as a non-native speaker I must say your English is as good as it needs to be. I normally understand non-native speakers far more easily because they don't have that big a vocabulary.

Back to topic:
I guess the mix-up between GRUB, the BIOS and the kernel comes from where and how each of them handles hardware and how they define it.
The BIOS just sees what is there, nothing more and nothing less, and on a very basic level. "Standard drivers", if you allow the expression.
GRUB just goes by its mapping.
And the kernel just does its own thing. It does not care what the BIOS tells it but explores the world of hardware on its own. Yes, it's always being adventurous.
There might be a startup option to tell the kernel that it should not probe things on its own but take the BIOS settings for granted.

And to be honest, if something does not work the way one thinks it should but works on a predictable basis, so what?
 
Old 12-30-2009, 09:42 AM   #9
wilibird
LQ Newbie
 
Registered: Dec 2009
Location: France
Distribution: redhat6, mandrake7, suse5, bits of debian and lots of (x)ubuntu, mandriva2006+, centos5, sles9+
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by zhjim View Post
There might be a startup option to tell the kernel that it should not probe things on its own but take the BIOS settings for granted.
At last, a clue... Okay then, does anybody have an idea which kernel parameters might disable this rearranging of RAID arrays?

Quote:
Originally Posted by zhjim View Post
And to be honest, if something does not work the way one thinks it should but works on a predictable basis, so what?
The problem is that at this point, my box behavior is "reproducible" but not really "predictable". To be able to predict something, I must have an idea of the law that this thing follows.

If it does not work the way I think it should, then my idea of the law is wrong; here, I know I'm wrong and that's why I am asking whether somebody knows about this f...... law (at least someone MUST know, since it's not GOD that forged this law, nor even a µ$ programmer, but a linux programmer!), so that I am able to predict with more accuracy, and hence do my admin job correctly.

Thanks anyway.

Cheers,
 
Old 12-30-2009, 03:04 PM   #10
phil.d.g
Senior Member
 
Registered: Oct 2004
Posts: 1,272

Rep: Reputation: 154Reputation: 154
The sd labelling scheme doesn't relate to any sort of physical connection: the first device the kernel learns about gets sda regardless of where it is plugged in, the second gets sdb, and so on. I'm not familiar with your controller, but could it be that it is publishing raw devices before LUNs?
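One way to check that guess is to look at the SCSI address behind each sdX name. The sketch below assumes sysfs is mounted at /sys and that each /sys/block/sdX/device is a symlink whose last path component is the host:channel:id:lun address; the function name is hypothetical.

```python
import glob
import os


def scsi_addresses(sys_block="/sys/block"):
    """Return {kernel_name: 'host:channel:id:lun'} for every sd* disk found."""
    result = {}
    for disk in sorted(glob.glob(os.path.join(sys_block, "sd*"))):
        device_link = os.path.join(disk, "device")
        if os.path.islink(device_link):
            # The resolved link ends in the SCSI address, e.g. ".../0:0:2:0".
            address = os.path.basename(os.path.realpath(device_link))
            result[os.path.basename(disk)] = address
    return result


if __name__ == "__main__":
    for name, address in scsi_addresses().items():
        print(name, address)
```

If the raw disks and the LUN turn out to sit on different channel or target numbers, with the raw disks on the lower ones, that would fit the idea that they are simply published first.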

To be honest, what you describe seems quite logical to me. Imagine a very basic scenario:
The controller gives an ID to each of the raw devices; then, when you create a LUN, it gives the next ID to that LUN and marks the IDs of its physical members as unavailable, resulting in something like this:

ID 0:0 slot 1*
ID 0:1 slot 2*
ID 0:2 slot 3
ID 0:3 slot 4
ID 0:4 LUN 1

As part of the API of the controller it gives the kernel a list of known devices: ID 0:2, 0:3 and 0:4
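This simplified scenario can be written down as a few lines of code. It models only the hypothetical ID allocation described above, not the real firmware or driver behaviour, and all function names are made up for illustration.

```python
def exported_devices(slots, luns):
    """Model the simplified scenario: each slot gets an ID, each LUN gets the
    next free ID, and the member slots of a LUN are hidden from the export.

    slots: slot numbers in controller order.
    luns:  tuples of member slot numbers, in creation order.
    Returns (id, description) pairs in ascending ID order.
    """
    table = [(dev_id, "slot %d" % slot) for dev_id, slot in enumerate(slots)]
    slot_to_id = {slot: dev_id for dev_id, slot in enumerate(slots)}
    hidden = set()
    for number, members in enumerate(luns, start=1):
        hidden.update(slot_to_id[member] for member in members)
        # The new LUN takes the next unused ID.
        table.append((len(table), "LUN %d" % number))
    return [(dev_id, desc) for dev_id, desc in table if dev_id not in hidden]


def kernel_names(devices):
    """Hand out sda, sdb, ... in the order the devices are reported."""
    return {"sd" + chr(ord("a") + i): desc for i, (_, desc) in enumerate(devices)}


if __name__ == "__main__":
    devices = exported_devices([1, 2, 3, 4], [(1, 2)])
    print(devices)
    print(kernel_names(devices))
```

Under these assumptions the LUN ends up last, which matches the sdc assignment reported in the first post.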

As for grub, I'm not sure but I think it uses the boot order in BIOS to determine labels, or perhaps just what is hd0.

The kernel should be unaware of whether the underlying thing is a RAID volume or a raw device; that is left to the controller, so ordering that depends on the type of device won't apply here.

Hope this helps, though it'll probably raise more questions than it answers!!
 
Old 12-31-2009, 04:42 AM   #11
wilibird
LQ Newbie
 
Registered: Dec 2009
Location: France
Distribution: redhat6, mandrake7, suse5, bits of debian and lots of (x)ubuntu, mandriva2006+, centos5, sles9+
Posts: 9

Original Poster
Rep: Reputation: 0
OK, thank you Phil, your explanation seems logical to me too, except for GRUB: the (motherboard) BIOS doesn't even know about the raw disks behind the RAID controller (but maybe you were talking about the "scan order" in the controller BIOS?)

Somehow you confirm my feeling that I should have only raw devices or only LUNs, not because of the kernel (you are right, there is probably nothing to do at that stage) but because of the controller API.

I'm now considering replacing my mixed architecture (slot1+2=raid1, slot3=raw, slot4=raw) with a pure RAID one (slot1+2=raid1, slot3=raid0, slot4=raid0) so that in the end the controller exports only LUNs (hopefully in the right order!).

thank you and happy new year.
 
Old 12-31-2009, 05:46 AM   #12
phil.d.g
Senior Member
 
Registered: Oct 2004
Posts: 1,272

Rep: Reputation: 154Reputation: 154
My example was very simplified, but hopefully enough to illustrate the point; hardware isn't something I pretend to understand.

What I meant with respect to GRUB is that somewhere the device the system boots from is promoted to hd0 in GRUB. This will be why the kernel orders the devices slightly differently from GRUB. Whether it is GRUB that does that promotion, or something in the BIOS or the controller's BIOS, I don't know.

If you were to proceed with your proposed setup I would expect the 2 LUNs to be registered by the kernel in the order they were created.
 
Old 01-12-2010, 05:42 PM   #13
zhjim
Senior Member
 
Registered: Oct 2004
Distribution: Debian Squeeze x86_64
Posts: 1,748
Blog Entries: 11

Rep: Reputation: 233Reputation: 233Reputation: 233
These are only thoughts.
The kernel loads certain drivers before others. Maybe one could change the sequence in which modules get loaded (if not using a monolithic kernel). Maybe /etc/modprobe or /etc/modprobe.d might hold some clues.

I'm in no way a kernel expert, just guessing!

P.S.
I just want to keep this thread going because this is an interesting question about how the kernel handles hardware, and it might draw the attention of some kernel hackers.
Maybe post this to the kernel mailing list?
 
  



All times are GMT -5. The time now is 06:48 PM.
