I have a problem with our OCFS2 cluster that I couldn't solve by myself.
In short, I have an OCFS2 cluster with 3 nodes and a shared storage LUN. The LUN is mapped to all 3 nodes and split into 2 partitions, both formatted as OCFS2 filesystems and mounted successfully. The system had been running fine for nearly 2 years, but today partition 1 suddenly became inaccessible and I had to reboot one node. After the reboot, partition 2 mounts OK, but partition 1 cannot be mounted.
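For reference, the filesystems were originally created along these lines (reconstructed from memory; the labels and node-slot count are placeholders, not necessarily the exact values used):
Code:
# on one node: format each GPT partition as OCFS2
# (labels "data1"/"data2" and the 4 node slots are assumptions)
mkfs.ocfs2 -L data1 -N 4 /dev/mapper/mpath3p1
mkfs.ocfs2 -L data2 -N 4 /dev/mapper/mpath3p2
# on every node: mount through the o2cb cluster stack
mount -t ocfs2 /dev/mapper/mpath3p1 /data1
mount -t ocfs2 /dev/mapper/mpath3p2 /data2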
The error is below:
Code:
# mount -t ocfs2 /dev/mapper/mpath3p1 /test
mount.ocfs2: Bad magic number in inode while trying to determine heartbeat information
# fsck.ocfs2 /dev/mapper/mpath3p1
fsck.ocfs2 1.6.3
fsck.ocfs2: Bad magic number in inode while initializing the DLM
# fsck.ocfs2 -r 2 /dev/mapper/mpath3p1
fsck.ocfs2 1.6.3
[RECOVER_BACKUP_SUPERBLOCK] Recover superblock information from backup block#1048576? <n> y
fsck.ocfs2: Bad magic number in inode while initializing the DLM
# parted /dev/mapper/mpath3
GNU Parted 1.8.1
Using /dev/mapper/mpath3
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: Linux device-mapper (dm)
Disk /dev/mapper/mpath3: 20.0TB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  10.2TB  10.2TB               primary
 2      10.2TB  20.0TB  9749GB               primary
Usually a bad magic number means the superblock is corrupted. I have handled several similar cases before, and they could be fixed quickly by restoring a backup superblock. But this case is different: simply replacing the superblock does not fix it, so I'm out of ideas.
Please take a look and suggest how I can solve this. Recovering the data is the most important goal right now.
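For completeness, fsck.ocfs2 can be pointed at any of the six backup superblock slots, which sit at fixed offsets (1 GB, 4 GB, 16 GB, 64 GB, 256 GB and 1 TB), so on a 10.2 TB partition all six should exist. I have only tried slot 2 so far; the other slots could be tried the same way, though they may fail identically if the DLM cannot be initialized:
Code:
# try the remaining backup superblock slots (slot 2 already failed above)
for slot in 1 3 4 5 6; do
    fsck.ocfs2 -r $slot /dev/mapper/mpath3p1
done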
/dev/mapper/mpath3 implies you're using Linux multipathing with "friendly names".
If you run "multipath -l -v2" it should show you the component disks of that multipath device. Have you checked one or more of those components?
What are the underlying component disks? Typically this would be a disk array with multiple paths (usually Fibre Channel SCSI, but possibly iSCSI or something else). Is the underlying disk array OK? Is the underlying component a LUN (e.g. RAID1, RAID5, etc.) in a disk array? If so, what is the status of that LUN?
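For example, something along these lines would list the component paths and confirm each one actually reads (the sd device name is illustrative; substitute whatever multipath reports):
Code:
# show the component sd devices behind the multipath map
multipath -l -v2 mpath3
# read the first MB of one component path to prove it responds
# (replace sdX with a component listed above)
dd if=/dev/sdX of=/dev/null bs=1M count=1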
Hi MensaWater,
Thanks for your reply.
I checked the underlying storage first. It is an FC array with one 20 TB LUN, accessed by the server over 4 paths (2 FC ports on the server HBAs and 2 FC ports per storage controller). All paths are OK and the array reports no warnings. The LUN is built on a RAID5 disk group, and the second partition on the same LUN is still fine.
I think something in the OCFS2 filesystem itself has gone wrong. The error seems to be at the filesystem level and affects only the first partition; the partition table still prints fine, as the parted output in my first post shows.
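One check that avoids the DLM entirely is debugfs.ocfs2, which as far as I know opens the device read-only and does not join the cluster, so it can show whether the superblock is readable at all (a diagnostic sketch only):
Code:
# dump the superblock without joining the cluster (read-only)
debugfs.ocfs2 -R "stats" /dev/mapper/mpath3p1
# compare against the still-working second partition
debugfs.ocfs2 -R "stats" /dev/mapper/mpath3p2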
I think, were it me, I'd do a web search for:
"fsck.ocfs2: Bad magic number in inode while initializing the DLM"
There seem to be a fair number of hits for that.
I haven't seen this error and we haven't done OCFS2 in a while (we use ASM these days). One thought that occurred to me, though: generally speaking, you can't run fsck on a mounted filesystem. Since this is a cluster, and I assume the volume is mounted on your other 2 servers, I wonder if those other mounts are simply preventing the fsck. I don't know whether fsck.ocfs2 allows a check while the volume is mounted on other nodes.
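If that is what's happening, something like this (a sketch, untested here) should confirm that no node still has the volume mounted and that the cluster stack is up before retrying the fsck:
Code:
# ask the o2cb heartbeat which nodes have the volume mounted
mounted.ocfs2 -f /dev/mapper/mpath3p1
# check that the local cluster stack is online
service o2cb status
# only once no node holds it mounted, force a full check
fsck.ocfs2 -fy /dev/mapper/mpath3p1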