LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   RAID - uninvited array will not go away (https://www.linuxquestions.org/questions/linux-software-2/raid-uninvited-array-will-not-go-away-725830/)

qajaq 05-13-2009 11:21 PM

RAID - uninvited array will not go away
 
I've been trying to set up a functional, unpartitioned, two-disk RAID-1 array on my Kubuntu 9.04 system. I can get the RAID configured (as /dev/md0) and (apparently) working, but as soon as I re-boot, it breaks.

It breaks because a second, partitioned array shows up, uninvited, and lays claim to one element of my intended RAID.

I have two identical 320 GB HDDs. One of them has seven partitions, and the other has three. Each disk has a 264-GB partition at the end, and it is these two partitions (sda7 and sdb3) that are intended for the array.

Starting from a clean slate, I can set up the RAID as follows:

Code:

mdadm -Cv /dev/md0 -l1 -n2 /dev/sda7 /dev/sdb3
mdadm -As /dev/md0
mount /dev/md0 /datalib

At this point, I can make directories and copy files into the /datalib directory, which is the mount-point for the array.
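
For anyone reproducing this, a quick way to confirm the mirror actually assembled and is syncing (standard mdadm/proc usage, not commands from the original post):
Code:

cat /proc/mdstat        # should show md0 as an active raid1 built from sda7 and sdb3
mdadm --detail /dev/md0 # shows the array state, member devices, and resync progress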

However, when I reboot and run
Code:

cat /proc/mdstat
the mdstat file tells me that sda7 is (as expected) inactive and associated with md0, and that sdb3 is also inactive but is associated with md_d0. That device seems to have arisen spontaneously--I did not create it. That device name appears to designate a partitionable RAID device. And when I list the contents of /dev, I find not only the md0 and md_d0 block devices, but also an md/ subdirectory and four symlinks pointing to block devices that apparently think they are partitions on this uninvited and unwanted array.

As long as sdb3 is seen as associated with this spurious array, it will not be included in the array I am trying to set up.
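
A handy way to see which array a member partition thinks it belongs to is to read its md superblock directly (a standard mdadm command, shown here for illustration rather than quoted from the post):
Code:

mdadm --examine /dev/sdb3   # prints the md superblock on the partition, including the array UUID it claims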

Now, I can stop both arrays
Code:

mdadm -S /dev/md0
mdadm -S /dev/md_d0

then re-run the mdadm create and assemble functions and get the RAID-1 working the way I want it. But as soon as I reboot, it breaks again.

I've tried to get rid of this intruder as follows:
Code:

mdadm --misc --zero-superblock /dev/sdb3
Then I used gParted to remove the sdb3 partition, and rebooted Kubuntu. The unwanted md_d0 and family were gone--hooray! Then I re-created the sdb3 partition (flagging it for RAID) and rebooted Kubuntu. The md_d0 family were still gone.

I set up my RAID-1 using mdadm's create and assemble--no problem.

Until I rebooted. Then the md_d0 and family were back, and sdb3 was again associated with it.

Any ideas on what's causing this unwanted md to show up, and how I might drive a stake in its heart?
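
For anyone following along, a consolidated sketch of the clean-up sequence described above (zeroing both members, not just sdb3, is an extra precaution and not something the poster reports running):
Code:

mdadm -S /dev/md_d0                          # stop the spurious array
mdadm -S /dev/md0                            # stop the intended array, if assembled
mdadm --zero-superblock /dev/sda7 /dev/sdb3  # wipe the md metadata from both members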

eco 05-14-2009 05:01 PM

This is a bit of a long shot but have you added your RAID to the mdadm config file?

The following is what I did under Gentoo:
Code:

# mdadm --detail --scan >> /etc/mdadm.conf
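
On Kubuntu/Debian the file usually lives at /etc/mdadm/mdadm.conf rather than /etc/mdadm.conf, and the copy embedded in the initramfs generally needs refreshing as well; a sketch, assuming the Debian-style layout:
Code:

# mdadm --detail --scan >> /etc/mdadm/mdadm.conf
# update-initramfs -u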

qajaq 05-14-2009 08:05 PM

Yes, I realized after I posted that long query that I'd neglected to add anything about the mdadm.conf file. I did create one, as follows:
Code:

DEVICE    /dev/sda7 /dev/sdb3

ARRAY     /dev/md0 devices=/dev/sda7,/dev/sdb3 level=1 auto=md num-devices=2

CREATE    owner=root group=disk mode=660 auto=yes

HOMEHOST  <system>

The first two lines are my creation; the latter two came with the default configuration file. (I've left out several commented lines in the interest of brevity.)
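
A more robust variant is to let mdadm generate the ARRAY line itself (as eco suggests above) and key it on the array's UUID rather than on device names that can move around between boots; the generated line looks roughly like this (the UUID is a placeholder):
Code:

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=<uuid reported by mdadm --detail --scan>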

btexas108 05-20-2009 09:49 PM

I've been having the same problem as well (Ubuntu 9.04). I even did an all-zeros wipe of my eight 1-TB HDDs (this is a new RAID-6 setup) to see if that would get rid of md_d0, and it did not.
If anyone has any suggestions, or if you figured it out, I would love to find out what you had to do.
Thanks

qajaq 05-21-2009 12:09 AM

I'm still struggling with it. For various reasons, I re-formatted the two HDDs and re-installed the OS. I formatted the sda7 and sdb3 partitions as ext3 with the RAID flag set. Then I re-installed mdadm. The automatically-generated mdadm.conf file did not have any ARRAY defined, and when I ran
Code:

mdadm -E /dev/sda7
I was told that there was no raid superblock to be found on that device.

I was able to run
Code:

mdadm -Cv /dev/md0 -l1 -n2 /dev/sda7 /dev/sdb3
and get a working RAID up and running. I let it resync, then re-booted, and the md_d0 showed up again.
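
Two standard ways to be sure the resync has fully completed before rebooting (ordinary mdadm/proc usage, not taken from the post):
Code:

watch cat /proc/mdstat   # live view of the resync progress
mdadm --wait /dev/md0    # returns only once resync/recovery has finished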

archtoad6 06-12-2009 09:38 AM

A question & a long shot suggestion:

Question
What searches have you done -- Google, Google Linux, & the various fora?

I especially want to know how widespread the problem is & if perchance it is confined to the 'buntu family.


Long shot Suggestion
/dev/sda7 is a logical partition, while /dev/sdb3 is a primary partition -- re-create /dev/sdb3 as an extended partition & give it over completely to a /dev/sdb5 logical partition; then build your array from /dev/sda7 & (the new) /dev/sdb5.

I have no reason to believe this will work, other than a thought that perhaps logical partitions & primary partitions are not compatible in a RAID array; if this is true, then you may have found a bug.
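
In case it helps, a very rough sketch of that repartitioning with parted (the start/end values are guesses for a 320 GB disk whose last ~264 GB becomes the array member; adjust them to the real layout before trying anything):
Code:

parted /dev/sdb print                       # check the current table first
parted /dev/sdb rm 3                        # remove the old primary sdb3
parted /dev/sdb mkpart extended 56GB 100%   # re-create that space as an extended partition
parted /dev/sdb mkpart logical 56.1GB 100%  # carve a logical partition (becomes sdb5) inside it
parted /dev/sdb set 5 raid on               # flag the new logical partition for RAID
partprobe /dev/sdb                          # make the kernel re-read the partition table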

qajaq 06-12-2009 10:16 AM

I would like to mark this thread as "RESOLVED" but I don't see how to do that.

At any rate, in response to your question, archtoad6, the only other forum on which I posted the problem was the Ubuntu forum (http://ubuntuforums.org/showthread.php?t=1162600). I also did a lot of searching through Google, but I don't recall exactly what search terms I used nor what pages I found useful to any extent. Sorry I can't be more help on that.

The issue of one partition being primary and one logical occurred to me, too, and I had that on my list of things to try. Before I did that, however, I completely wiped the two disk drives. I reformatted both disks to FAT32, then re-formatted again with the multiple-partition set-up in ext3 file-systems. This was to ensure that no residual data might linger to cause problems when the OS was re-installed. It might have been superfluous, but it didn't take all that long.

Then, with the disks cleaned and re-partitioned, I re-installed the Kubuntu OS and set up the RAID-1 as before, and it worked. I could see a RAID device (md0) with both partitions included on every boot-up. The spurious md_d0 device never reappeared. I don't know what was causing the problem, but the reformat and re-installation appear to have cleared it up. (FWIW, sda7 is still a logical partition and sdb3 is still a primary partition.)

As a side note, I'll add that after I put a line in my fstab file to mount the RAID automatically, I kept getting warning messages every time I'd boot up, telling me that the size of the RAID device described in the superblock and the actual physical size of the array were different. I tried running
Code:

e2fsck -f /dev/md0
to clear it up, but I got error messages to the effect that the superblock was inaccessible (sorry, I didn't record the exact wording) and the command would abort. Eventually, using
Code:

dumpe2fs /dev/md0
I got the locations of alternate superblocks and I added the block location of the last one to the command:
Code:

e2fsck -fb 23887872 /dev/md0
This ran, and it reported the same error, but it gave me the option to abort or continue. When I continued, it eventually reported a specific point of error and offered to fix it. I said "yes" to this and to subsequent similar prompts, and when the command was finished (this was a very lengthy process), the problem was gone.
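
For reference, the fstab line being described would look something like this (the mount point comes from earlier in the thread; the options are assumptions):
Code:

/dev/md0   /datalib   ext3   defaults   0   2

And two common ways to list backup superblock locations without guessing (standard e2fsprogs usage, not quoted from the post):
Code:

dumpe2fs /dev/md0 | grep -i 'backup superblock'   # lists the backup superblock block numbers
mke2fs -n /dev/md0   # -n = dry run: prints where backups would be placed (assuming default parameters), writes nothing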

btexas108 06-12-2009 11:29 PM

I never fully resolved the issue with my system; however, removing and reinstalling mdadm did seem to help. Unfortunately, it would still randomly come back up with md_d0 after a reboot, and I would have to stop that before I started md0.

Somewhat disappointed with all of the manual activity, I pulled the drives and the controllers and threw them into an older PC and installed Openfiler. (This is now my closet server.)
Openfiler has had no problems with the drives after it formatted them (even though it uses mdadm as well), and it will shut down and reboot without any of the problems I experienced with Ubuntu 9.04.

My initial plan was to have this RAID set up directly in my main computer for overall speed. Fortunately, even with the older machine I'm getting around 20 MB/s over the network (gigabit), and this has been fast enough for me.

archtoad6 06-14-2009 04:28 PM



Quote:

Originally Posted by qajaq (Post 3571813)
I would like to mark this thread as "RESOLVED" but I don't see how to do that.

I believe you can edit the title. Also you can tag the OP "solved" & your last post "solution".


Quote:

Originally Posted by qajaq (Post 3571813)
At any rate, in response to your question, archtoad6, the only other forum on which I posted the problem was...

I figured you had done your homework -- I was really interested in how widespread the problem seemed to you.


Quote:

Originally Posted by qajaq (Post 3571813)
The issue of one partition being primary and one logical occurred to me, too, and I had that on my list of things to try.

Too bad -- I'd like to know if that could have anything to do w/ it; perhaps in conjunction w/ something else that you cleaned out w/ the reformat.


Quote:

Originally Posted by qajaq (Post 3571813)
I tried running
Code:

e2fsck -f /dev/md0
to clear it up, but I got error messages to the effect that the superblock was inaccessible (sorry, I didn't record the exact wording) and the command would abort. Eventually, using
Code:

dumpe2fs /dev/md0
I got the locations of alternate superblocks

Interesting, I get:
Code:

# e2fsck -f /dev/md0
e2fsck 1.40-WIP (14-Nov-2006)
e2fsck: No such file or directory while trying to open /dev/md0

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

Note the very bad alternate superblock suggestion -- it's only valid w/ 1k-block file systems, & most FS's these days are big enough to need 4k blocks.

and:
Code:

# dumpe2fs /dev/md0
dumpe2fs 1.40-WIP (14-Nov-2006)
dumpe2fs: No such file or directory while trying to open /dev/md0
Couldn't find valid filesystem superblock.
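
Following up on the 1k/4k point above: for 4k-block file systems the first backup superblock normally sits at block 32768, so the equivalent rescue attempt would be:
Code:

e2fsck -b 32768 <device>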


qajaq 06-14-2009 04:59 PM

In all the research I did, I don't recall finding anyone having mentioned that same sort of problem. I pieced together various suggestions that I found for tangential issues -- and, as it turned out, none of them worked until I re-formatted and re-installed the OS.

When you ran the dumpe2fs command and got the "No such file or directory while trying to open /dev/md0. Couldn't find valid filesystem superblock" response, did you have a functioning md0 mounted?

archtoad6 06-14-2009 05:56 PM

Of course. Oops, I thought I did.

The trouble is that they are no longer named "mdn"; they are now "dm-n".

Here are some useful code snippets I just wrote:
Code:

lvmdiskscan  | grep -o /dev/dm-.
or more generalized:
Code:

lvmdiskscan  | egrep -o '/dev/[dm]{2}-?[0-9]+'
finds them, &

Code:

for X in `lvmdiskscan |grep -o /dev/dm-.`
do echo '======================='
  echo $X
  echo '------------'
  dumpe2fs $X 2>/dev/null  | grep '^Filesystem'
done  | less

gives a short report on each. It just helped me figure out where certain "You need to fsck a file system" boot complaints were coming from -- error msgs. that did not say which file system needs fsck.

To find the ones which are generating the error msgs. on boot:
Code:

for X in `lvmdiskscan |grep -o /dev/dm-.`
do echo -n "$X  "
  dumpe2fs $X 2>/dev/null  | grep -o 'needs_recovery' || echo
done  | less
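
One small refinement worth trying: dumpe2fs -h restricts the output to the superblock header, which keeps the loop quick on large file systems (same logic as above, just a suggested tweak):
Code:

for X in `lvmdiskscan |grep -o /dev/dm-.`
do echo -n "$X  "
  dumpe2fs -h $X 2>/dev/null  | grep -o 'needs_recovery' || echo
done  | less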


