LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

06-02-2008, 09:08 PM   #1
KingPong
LQ Newbie
 
Registered: Jun 2008
Location: Atlanta, GA
Posts: 3

Rep: Reputation: 0
Partitions missing on startup after mdadm snafu


Hello,

I have a new OpenSuSE 10.3 (2.6.22.17-0.1-default) server that I configured a RAID-1 root partition on.

The second hard drive of the pair turned out to be faulty (SMART errors), so I replaced it, and added the new drive to the array. Unfortunately, instead of telling mdadm to add sdb2, I accidentally gave it sdb.

So, I failed the drive, repartitioned it with fdisk, and added sdb2 into the array. The array rebuilt with no errors and I thought all was well... until the next reboot.

The system booted successfully, but my RAID volume started in degraded mode because the second partition (sdb2) did not exist in /dev. I am able to get the partitions into /dev using partprobe, which is confusing. Why would the partitions not already be there?

Another person posted the exact same problem in the newbie forum, but the response didn't make much sense to me, and the OP did not indicate if/how his problem was fixed. I would post the link to that thread, but the forum will not let me until I've posted at least once. I'll provide it in a reply.

Could you please help me? Lots of details are below. If you need any others, please ask.

The kernel is seeing the partitions. Excerpt from dmesg:
Code:
sd 1:0:0:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1 sdb2
sd 1:0:0:0: [sdb] Attached SCSI disk
fdisk also sees the partitions:
Code:
# fdisk -l /dev/sdb

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0005e7eb

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          26      208813+  83  Linux
/dev/sdb2              27       60801   488175187+  fd  Linux raid autodetect
But for some reason, the device nodes are not being created:
Code:
# ls -l /dev/sdb*
brw-r----- 1 root disk 8, 16 May 31 16:28 /dev/sdb

# grep sdb /proc/partitions
   8    16  488386584 sdb
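
A quick way to confirm this symptom from a script is to ask which partitions of a disk the kernel currently lists. The following is only a minimal sketch: `kernel_parts` is a hypothetical helper name, and it assumes sdX-style naming where a partition is the disk name followed by digits.

```shell
#!/bin/sh
# Sketch: list the partitions of one disk that the running kernel knows about,
# by reading /proc/partitions (or a saved copy passed as the second argument).
# Assumption: sdX-style names, i.e. partitions are "<disk><digits>".
# kernel_parts is a hypothetical helper name, not a standard tool.
kernel_parts() {
    disk=$1
    table=${2:-/proc/partitions}
    # Column 4 holds the device name; keep sdb1, sdb2, ... but not sdb itself.
    awk -v d="$disk" '$4 ~ ("^" d "[0-9]+$") { print $4 }' "$table"
}

# Usage (prints nothing when the kernel lists no partitions for the disk):
# kernel_parts sdb
```

An empty result for a disk that fdisk reports as partitioned matches the situation shown above.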
And mdadm can't start the array properly: (from boot.msg)
Code:
Creating device nodes with udev
mdadm: failed to add /dev/sdb2 to /dev/md0: No such device or address
mdadm: /dev/md0 has been started with 1 drive (out of 2).
If I use partprobe to refresh the partition table on the live system, I can then add sdb2 to my RAID array and it will resync. This is good until the next reboot, at which point the partition is missing again, and I have to resync all over again.

Code:
# partprobe
# mdadm /dev/md0 -a /dev/sdb2
mdadm: re-added /dev/sdb2
# cat /proc/mdstat
Personalities : [raid1] [raid0] [raid6] [raid5] [raid4] [linear] 
md0 : active raid1 sdb2[2] sda2[0]
      488175048 blocks super 1.0 [2/1] [U_]
      [>....................]  recovery =  0.1% (621824/488175048) finish=156.7min speed=51818K/sec
      bitmap: 19/466 pages [76KB], 512KB chunk

unused devices: <none>
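
Because the array silently comes up degraded after each reboot, it may help to check /proc/mdstat automatically. This is a rough sketch, not a replacement for mdadm's own monitoring: `degraded_arrays` is a hypothetical helper name, and it relies only on the `[U_]`-style status field visible in the output above.

```shell
#!/bin/sh
# Sketch: print the names of md arrays whose status field contains "_",
# i.e. at least one member slot is missing (or still being rebuilt).
# degraded_arrays is a hypothetical helper name.
degraded_arrays() {
    mdstat=${1:-/proc/mdstat}
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # A status field such as [U_] or [_U] marks a missing member
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    ' "$mdstat"
}

# Usage:
# degraded_arrays
```

Note that an array that is still resyncing, as in the output above, is also reported, since its status field shows `[U_]` until recovery finishes.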
lspci:
Code:
# lspci
00:00.0 Host bridge: Intel Corporation 82Q35 Express DRAM Controller (rev 02)
00:02.0 VGA compatible controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 02)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 02)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 02)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 02)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 02)
00:1c.4 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 5 (rev 02)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
00:1f.6 Signal processing controller: Intel Corporation 82801I (ICH9 Family) Thermal Subsystem (rev 02)
0d:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet Controller (Copper) (rev 03)
0f:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
11:08.0 IDE interface: Integrated Technology Express, Inc. Unknown device 8213
11:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
11:0c.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
 
06-02-2008, 09:09 PM   #2
KingPong
LQ Newbie
 
Registered: Jun 2008
Location: Atlanta, GA
Posts: 3

Original Poster
Rep: Reputation: 0
Link for the other post:
http://www.linuxquestions.org/questi...nd-dev-622612/
 
06-15-2008, 09:57 AM   #3
willow_of_oz
LQ Newbie
 
Registered: Jun 2008
Posts: 3

Rep: Reputation: 0
Okay, I'm in much the same boat as you.
The difference is I'm doing raid 6 on 8 drives.
I built the system with 6 drives and everything seemed good. I then added two more and expanded the array onto those, and all still seemed good. I don't *think* I stuffed up the last two by using the disks rather than the partitions.

Ah, to recap: two of the partitions fail to show up in /dev.
Running fdisk on the drives shows that the partitions are there, though, looking all Linux-raid-like and just like the other six visible ones.
/proc/partitions does not list the two partitions.

Here's some interesting new information.
I was able to work around the failure by messing around with the mdadm.conf.
If I listed the devices as just being the partitions (configuration 1), the above failure occurred, i.e. two partitions go AWOL and mdadm can't start.

If I listed the devices explicitly as /dev/sda1, /dev/sdb1, etc. (configuration 2), then mdadm would take an awfully long time to start, as would Ubuntu (7.10). When I logged in, mdadm had not started the array and was also consuming memory. Left unattended, it would shortly bring the system to its knees by consuming everything. However, I did have all 8 partitions listed (not including the 9th drive with the system partition/swap/whatever). So if I killed the process and then changed the config back to use partitions for devices, mdadm could happily start. Of course, if I rebooted, I was back to the starting scenario where it would fail to start because the two partitions were AWOL. So my workaround was to boot in configuration 2, kill mdadm, change the config to configuration 1, restart mdadm, and then change the config back to configuration 2 for the next reboot.
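
For readers unfamiliar with mdadm.conf, the two configurations described above presumably correspond to the two forms of the DEVICE line. This is an assumption: the poster's actual file is not shown, and only one of the two lines would be active at a time.

```
# Configuration 1 (assumed): consider every device listed in /proc/partitions
DEVICE partitions

# Configuration 2 (assumed): name the member partitions explicitly
#DEVICE /dev/sda1 /dev/sdb1 ...
```

With the first form, mdadm can only consider what /proc/partitions lists, which would fit the failure when two partitions are absent from it at boot.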

So, that was my workaround, but I wasn't overly happy about it.
So recently (about now), after several months of this, I decided to get more hardcore about finding out what was going on. Looking in the syslog, I had a lot of messages about array md1 already having disks. A hell of a lot. Eventually it would stop and it would start doing some sort of mdadm unbind.
I'm a little unsure as to whether these last two partitions are somehow tagged a little differently than the others, or whether mdadm was getting enthused about starting the array a little too early.
Currently I think I've done *something* (sigh) because it's not quite acting the same and I'm actually having a little trouble starting the array at all.
Just a moment ago I had it starting up degraded (minus the two disks - it is raid 6) but now I might have wandered a little, for it's not even doing that, but complaining about a missing md1 superblock. Oh, and all my devices seem to have shuffled along a bit (mdadm -E /dev/sda1 tells me that it's sdb1, with /dev/sdf1 being blank).

I never thought about partprobing. Thanks.
So by partprobing and getting sdg1 and sdh1 back into /dev and /proc/partitions, I am able to start mdadm fine with 8 disks, no apparent issues...
But of course the issue is still not resolved, I've just got a workaround again for missing partitions on boot!
 
06-15-2008, 10:15 AM   #4
willow_of_oz
LQ Newbie
 
Registered: Jun 2008
Posts: 3

Rep: Reputation: 0
Long story short, with the sort of spontaneous 'inspiration' that I've had, combined with lack of knowledge, interesting snippets on random forums, and bits of "it's telling me I'm wrong but I'm pretty sure I'm right", I'm actually a little surprised I haven't lost the entire array yet!
It's probably time to stop tempting fate and get a little more methodical.

Incidentally, one time I logged in fast and dropped to a terminal and saw my two partitions there when I wasn't expecting them there ... and shortly after they were removed. Hmmmmm.
 
01-30-2009, 03:03 PM   #5
festr
LQ Newbie
 
Registered: Jan 2009
Posts: 4

Rep: Reputation: 0
I have the same issue: /dev/sdb1 and /dev/sdb2 disappear in combination with a RAID-1 md0. It started with accidentally adding /dev/sdb to /dev/md0 instead of /dev/sdb1 (mdadm --manage /dev/md0 --add /dev/sdb instead of --add /dev/sdb1).

In dmesg (kernel log):
Code:
md: md0 stopped.
md: bind<sdb>
md: could not open unknown-block(8,17).
md: md_import_device returned -6
md: bind<sda1>
md: kicking non-fresh sdb from array!
md: unbind<sdb>
md: export_rdev(sdb)

The only thing that helped was erasing the RAID superblock on /dev/sdb (mdadm --misc --zero-superblock /dev/sdb), rebooting, recreating the partitions, and adding /dev/sdb1 to the array again.
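
The sequence described above can be sketched as a dry run. This only prints the commands for review and does not execute anything, because zeroing a superblock is destructive. `stale_superblock_plan` is a hypothetical helper name, the device names are placeholders, and the `mdadm --examine` step is an added check that is not part of the original report.

```shell
#!/bin/sh
# Sketch only: print the recovery steps instead of running them.
# stale_superblock_plan is a hypothetical helper; its arguments are the whole
# disk that was added by mistake, the partition that belongs in the array,
# and the md array itself. Review every line before running anything by hand.
stale_superblock_plan() {
    disk=$1
    part=$2
    array=$3
    # Added check (assumption): see whether the whole disk carries an md superblock
    echo "mdadm --examine $disk"
    # Step from the post: erase the superblock on the whole disk
    echo "mdadm --misc --zero-superblock $disk"
    # Step from the post: reboot and recreate the partitions
    echo "# reboot, then recreate the partitions on $disk (e.g. with fdisk)"
    # Step from the post: add the partition, not the disk, back to the array
    echo "mdadm --manage $array --add $part"
}

# Usage:
# stale_superblock_plan /dev/sdb /dev/sdb1 /dev/md0
```

Printing rather than executing is a deliberate choice here: the exact order and device names depend on the system, and a mistake would repeat the original problem.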
 
01-30-2009, 09:54 PM   #6
KingPong
LQ Newbie
 
Registered: Jun 2008
Location: Atlanta, GA
Posts: 3

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by festr
The only thing that helped was erasing the RAID superblock on /dev/sdb (mdadm --misc --zero-superblock /dev/sdb), rebooting, recreating the partitions, and adding /dev/sdb1 to the array again.
Do you mean that this fixed the problem for you? I "solved" my problem by swapping my sdb with a spare drive.
 
01-31-2009, 04:06 AM   #7
willow_of_oz
LQ Newbie
 
Registered: Jun 2008
Posts: 3

Rep: Reputation: 0
Quote:
Originally Posted by KingPong
Do you mean that this fixed the problem for you? I "solved" my problem by swapping my sdb with a spare drive.
I had the same issue.
I also solved mine the same way - I zeroed the superblock on the device (well, two devices, since I did it to two on a raid 6 array).
 
  


Tags
mdadm, partition, udev



All times are GMT -5. The time now is 12:20 PM.
