Linux - Hardware
I have a new OpenSuSE 10.3 (2.6.22.17-0.1-default) server that I configured a RAID-1 root partition on.
The second hard drive of the pair turned out to be faulty (SMART errors), so I replaced it, and added the new drive to the array. Unfortunately, instead of telling mdadm to add sdb2, I accidentally gave it sdb.
So, I failed the drive, repartitioned it with fdisk, and added sdb2 into the array. The array rebuilt with no errors and I thought all was well... until the next reboot.
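The slip described above (giving mdadm the whole disk, sdb, instead of the partition, sdb2) is easy to make. A minimal sketch of a guard that could be run before `mdadm --add` follows. The function name `is_partition_node` is hypothetical, and it only inspects the device name, assuming the sdX-style naming used in this thread.

```shell
# Hypothetical helper: succeed only if the argument looks like a partition
# node (/dev/sdb2) rather than a whole disk (/dev/sdb).
# Assumption: sdX-style names; the check is purely on the name and does
# not touch the device.
is_partition_node() {
    case $1 in
        /dev/sd[a-z]*[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as `is_partition_node /dev/sdb2 && mdadm /dev/md0 --add /dev/sdb2`, so that a bare /dev/sdb is refused before mdadm ever sees it.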
The system booted successfully, but my RAID volume started in degraded mode because the second partition (sdb2) did not exist in /dev. I am able to get the partitions into /dev using partprobe, which is confusing. Why would the partitions not already be there?
Another person posted the exact same problem in the newbie forum, but the response didn't make much sense to me, and the OP did not indicate if/how his problem was fixed. I would post the link to that thread, but the forum will not let me until I've posted at least once. I'll provide it in a reply.
Could you please help me? Lots of details are below. If you need any others, please ask.
The kernel is seeing the partitions. Excerpt from dmesg:
Code:
sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
fdisk also sees the partitions:
Code:
# fdisk -l /dev/sdb
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0005e7eb
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          26      208813+  83  Linux
/dev/sdb2              27       60801   488175187+  fd  Linux raid autodetect
But for some reason, the device nodes are not being created:
Code:
# ls -l /dev/sdb*
brw-r----- 1 root disk 8, 16 May 31 16:28 /dev/sdb
# grep sdb /proc/partitions
8 16 488386584 sdb
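The mismatch shown above is between the on-disk partition table (what fdisk reads) and the kernel's current view (what /proc/partitions lists). As a rough sketch, the kernel's view for one disk can be pulled out like this. The function name `kernel_partitions` is made up for illustration, and it assumes sdX-style names where a partition is the disk name followed by a number.

```shell
# Hypothetical helper: list the partitions of one disk that the kernel
# currently knows about.
#   $1: disk name without /dev/ (e.g. sdb)
#   $2: file in /proc/partitions format (defaults to /proc/partitions)
kernel_partitions() {
    disk=$1
    table=${2:-/proc/partitions}
    # Field 4 is the device name. Keep names made of the disk name plus a
    # partition number (sdb1, sdb2, ...), not the whole disk itself.
    awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { print $4 }' "$table"
}
```

On a system in the state shown above, `kernel_partitions sdb` would print nothing, even though fdisk still reports sdb1 and sdb2.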
And mdadm can't start the array properly (from boot.msg):
Code:
Creating device nodes with udev
mdadm: failed to add /dev/sdb2 to /dev/md0: No such device or address
mdadm: /dev/md0 has been started with 1 drive (out of 2).
If I use partprobe to refresh the partition table on the live system, I can then add sdb2 to my RAID array and it will resync. This is good until the next reboot, at which point the partition is missing again, and I have to resync all over again.
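For reference, the workaround just described might look roughly like the sketch below. The function name `readd_member` and the `RUN` variable are hypothetical; the device names are the ones from this post. It does not fix the underlying problem, it only repeats the manual steps.

```shell
# Sketch of the workaround: make the kernel re-read the partition table,
# then put the partition back into the array.
# RUN is a hypothetical knob: RUN=echo prints the commands instead of
# running them; leave it unset (and run as root) to execute for real.
readd_member() {
    disk=$1    # whole disk, e.g. /dev/sdb
    part=$2    # RAID member partition, e.g. /dev/sdb2
    array=$3   # md device, e.g. /dev/md0
    ${RUN:-} partprobe "$disk" || return 1
    ${RUN:-} mdadm "$array" --add "$part"
}
```

Calling `readd_member /dev/sdb /dev/sdb2 /dev/md0` with `RUN=echo` set first shows the two commands without touching the array.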
Okay, I'm in much the same boat as you.
The difference is I'm doing raid 6 on 8 drives.
I built the system with 6, everything seemed good, I added two more and I expanded on to those. All seemed good. I don't *think* I stuffed up the last two and used the disks rather than the partitions.
To recap: two of the partitions fail to show up in /dev.
Running fdisk on the drives shows that the partitions are there, all Linux-RAID-like and looking just like the other six visible ones.
/proc/partitions does not list the two partitions.
Here's some interesting new information.
I was able to work around the failure by messing around with the mdadm.conf.
If I listed the devices as just being the partitions (call this configuration 1), the above failure occurred, i.e. two partitions go AWOL and mdadm can't start.
If I listed the devices explicitly as /dev/sda1, /dev/sdb1, etc. (configuration 2), then mdadm would take an awfully long time to start, as would Ubuntu (7.10). When I logged in, mdadm had not started the array and was steadily consuming memory. Left unattended, it would shortly bring the system to its knees by consuming everything. However, I did have all 8 partitions listed (not including the 9th drive with the system partition/swap/whatever).
So if I killed the process and then changed the config back to configuration 1, mdadm could happily start. Of course, if I rebooted, I was back to the starting scenario where it would fail to start because the two partitions were AWOL. So my workaround was to boot in configuration 2, kill mdadm, change the config to configuration 1, restart mdadm, then change the config back to configuration 2 for the next reboot.
So, that was my workaround, but I wasn't overly happy about it.
So recently (about now), after several months of this, I decided to get more hardcore about finding out what was going on. Looking in the syslog, I had a lot of messages about array md1 already having disks. A hell of a lot. Eventually it would stop and it would start doing some sort of mdadm unbind.
I'm a little unsure as to whether these last two partitions are somehow tagged a little differently than the others, or whether mdadm was getting enthused about starting the array a little too early.
Currently I think I've done *something* (sigh) because it's not quite acting the same and I'm actually having a little trouble starting the array at all.
Just a moment ago I had it starting up degraded (minus the two disks - it is raid 6) but now I might have wandered a little, for it's not even doing that, but complaining about a missing md1 superblock. Oh, and all my devices seem to have shuffled along a bit (mdadm -E /dev/sda1 tells me that it's sdb1, with /dev/sdf1 being blank).
I never thought about partprobing. Thanks.
So by partprobing and getting sdg1 and sdh1 back into /dev and /proc/partitions, I am able to start mdadm fine with 8 disks, no apparent issues...
But of course the issue is still not resolved; I've just got another workaround for missing partitions on boot!
Long story short, with the sort of spontaneous 'inspiration' that I've had, combined with lack of knowledge, interesting snippets on random forums, and bits of "it's telling me I'm wrong but I'm pretty sure I'm right", I'm actually a little surprised I haven't lost the entire array yet!
It's probably time to stop tempting fate and get a little more methodical.
Incidentally, one time I logged in fast and dropped to a terminal and saw my two partitions there when I wasn't expecting them there ... and shortly after they were removed. Hmmmmm.
I have the same issue: /dev/sdb1 and /dev/sdb2 disappear, in combination with a RAID-1 md0. It started when I accidentally added /dev/sdb to /dev/md0 instead of /dev/sdb1 (mdadm --manage /dev/md0 --add /dev/sdb instead of --add /dev/sdb1).
In dmesg (kern log):
md: md0 stopped.
md: bind<sdb>
md: could not open unknown-block(8,17).
md: md_import_device returned -6
md: bind<sda1>
md: kicking non-fresh sdb from array!
md: unbind<sdb>
md: export_rdev(sdb)
The only thing that helped was erasing the RAID superblock on /dev/sdb (mdadm --misc --zero-superblock /dev/sdb), rebooting, recreating the partitions, and adding /dev/sdb1 to the RAID again.
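Spelled out as commands, the sequence described above would look something like the sketch below. It only prints the steps for review and runs nothing. The function name `superblock_cleanup_plan` is hypothetical, and the device names in the usage example are the ones from this post. Keep in mind that `--zero-superblock` erases md metadata on whatever device it is given, so the device name deserves a second look.

```shell
# Print (do not run) the cleanup sequence for a disk that was added to the
# array whole by mistake. Review the output before running anything as root.
superblock_cleanup_plan() {
    disk=$1    # whole disk carrying the stray superblock, e.g. /dev/sdb
    part=$2    # partition that should be the real member, e.g. /dev/sdb1
    array=$3   # md device, e.g. /dev/md0
    printf 'mdadm --misc --zero-superblock %s\n' "$disk"
    printf '# reboot, then recreate the partitions on %s (e.g. with fdisk)\n' "$disk"
    printf 'mdadm --manage %s --add %s\n' "$array" "$part"
}
```

For example, `superblock_cleanup_plan /dev/sdb /dev/sdb1 /dev/md0` prints the three steps with the names filled in.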
Do you mean that this fixed the problem for you? I "solved" my problem by swapping my sdb with a spare drive.
I had the same issue.
I also solved mine the same way - I zeroed the superblock from the device (well, two devices, since I did it to two on a raid 6 array).