ext3 fs goes ro after a day or three; nfs sharing issues
I have built three SAN partitions on my EMC DMX800 and attached them to a Linux server (Dell PowerEdge 2650, Linux 2.6.9-42.ELsmp #1 SMP) with the intent to share them out via NFS.
One of them is a home directory file system, auto-mounting to other unix systems. The other two are just NFS-shared file systems.
I keep running into problems with these file systems: they go read-only or, in the case of the home directory file system, end up with corrupted files. I have fsck'ed them and gotten them back to a usable state, only to have them become corrupted or go read-only again.
I have disabled the home directory system so I can concentrate on one of the others which is a critical file system for our network.
Errors I keep seeing in /var/log/messages:
Jul 28 19:30:01 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_journal_start_sb: Detected aborted journal
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_free_blocks_sb: bit already cleared for block 15107529
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_free_blocks_sb: bit already cleared for block 15107530
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_free_blocks_sb: bit already cleared for block 15107531
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_free_blocks_sb: bit already cleared for block 15107533
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_journal_start_sb: Detected aborted journal
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_orphan_del: Journal has aborted
Jul 29 10:25:25 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_truncate: Journal has aborted
Jul 29 12:59:48 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_check_descriptors: Block bitmap for group 16 not in group (block 33554432)!
Jul 29 12:59:48 kcnfsp01 kernel: EXT3-fs: group descriptors corrupted !
Jul 29 13:00:04 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_check_descriptors: Block bitmap for group 16 not in group (block 33554432)!
Jul 29 13:00:04 kcnfsp01 kernel: EXT3-fs: group descriptors corrupted !
Jul 29 13:00:55 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_check_descriptors: Block bitmap for group 16 not in group (block 33554432)!
Jul 29 13:00:55 kcnfsp01 kernel: EXT3-fs: group descriptors corrupted !
Jul 29 13:01:01 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_check_descriptors: Block bitmap for group 16 not in group (block 33554432)!
Jul 29 13:01:01 kcnfsp01 kernel: EXT3-fs: group descriptors corrupted !
Jul 29 13:01:39 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_check_descriptors: Block bitmap for group 16 not in group (block 33554432)!
Jul 29 13:01:39 kcnfsp01 kernel: EXT3-fs: group descriptors corrupted !
Jul 29 13:03:08 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_check_descriptors: Block bitmap for group 16 not in group (block 33554432)!
Jul 29 13:03:08 kcnfsp01 kernel: EXT3-fs: group descriptors corrupted !
Jul 29 13:13:39 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:13:51 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:21:29 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:38:36 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:43:53 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:44:11 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:44:21 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:45:32 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 13:48:28 kcnfsp01 kernel: EXT3-fs error (device sdd1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 14:00:05 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 29 14:00:05 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 29 14:00:05 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 29 14:02:44 kcnfsp01 kernel: EXT3-fs: recovery complete.
Jul 29 14:02:44 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 29 14:08:31 kcnfsp01 kernel: EXT3-fs: journal inode is deleted.
Jul 29 14:29:53 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 29 22:09:54 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 29 22:09:54 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_journal_start_sb: Detected aborted journal
Jul 29 22:32:34 kcnfsp01 kernel: EXT3-fs: journal inode is deleted.
Jul 30 00:32:26 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 30 08:58:16 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 30 08:59:39 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_free_blocks_sb: bit already cleared for block 50985247
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_truncate: Journal has aborted
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_orphan_del: Journal has aborted
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_reserve_inode_write: Journal has aborted
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1) in ext3_delete_inode: Journal has aborted
Jul 30 09:00:58 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_journal_start_sb: Detected aborted journal
Jul 30 09:03:14 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 30 09:05:02 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 30 09:21:39 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Jul 30 09:23:45 kcnfsp01 kernel: EXT3-fs: recovery complete.
Jul 30 09:23:45 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 30 11:13:34 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_new_block: Allocating block in system zone - block = 16547840
Jul 30 11:13:34 kcnfsp01 kernel: EXT3-fs error (device sde1) in ext3_reserve_inode_write: Journal has aborted
Jul 30 11:13:34 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_journal_start_sb: Detected aborted journal
Jul 30 11:13:34 kcnfsp01 kernel: EXT3-fs error (device sde1) in ext3_ordered_commit_write: Journal has aborted
Jul 30 12:54:57 kcnfsp01 kernel: EXT3-fs warning (device sde1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Jul 30 12:54:57 kcnfsp01 kernel: EXT3-fs warning (device sde1): ext3_clear_journal_err: Marking fs in need of filesystem check.
Jul 30 12:54:57 kcnfsp01 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Jul 30 12:54:57 kcnfsp01 kernel: EXT3-fs: recovery complete.
Jul 30 12:54:57 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 1 12:55:01 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_new_block: Allocating block in system zone - block = 16547841
Aug 1 12:55:01 kcnfsp01 kernel: EXT3-fs error (device sde1) in ext3_reserve_inode_write: Journal has aborted
Aug 1 12:55:01 kcnfsp01 kernel: EXT3-fs error (device sde1): ext3_journal_start_sb: Detected aborted journal
Aug 1 12:55:01 kcnfsp01 kernel: EXT3-fs error (device sde1) in ext3_ordered_commit_write: Journal has aborted
Aug 2 21:00:08 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_readdir: bad entry in directory #25116673: rec_len % 4 != 0 - offset=0, inode=93754411, rec_len=21073, name_len=237
Aug 2 21:00:08 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_readdir: bad entry in directory #8142849: rec_len % 4 != 0 - offset=0, inode=3395865643, rec_len=15878, name_len=180
Aug 3 21:00:05 kcnfsp01 kernel: EXT3-fs error (device sdc1): ext3_readdir: bad entry in directory #2: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
Aug 4 10:23:05 kcnfsp01 kernel: EXT3-fs warning (device sde1): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Aug 4 10:23:05 kcnfsp01 kernel: EXT3-fs warning (device sde1): ext3_clear_journal_err: Marking fs in need of filesystem check.
Aug 4 10:23:05 kcnfsp01 kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Aug 4 10:23:05 kcnfsp01 kernel: EXT3-fs: recovery complete.
Aug 4 10:23:05 kcnfsp01 kernel: EXT3-fs: mounted filesystem with ordered data mode.
/dev/sdc is the home directory file system (big-time errors); sdd and sde are the others. /dev/sde is the one I am working on now, hoping its resolution will be applicable to the other two as well.
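To make the log above easier to compare across the three devices, here is a rough sketch of a script that tallies the EXT3-fs error lines by device and by the kernel function that reported them. The function name `tally_ext3_errors` and the regular expression are my own and are only written against the two line shapes shown in this post ("(device X): func:" and "(device X) in func:"); other kernels may format these messages differently.

```python
import re
from collections import Counter

# Matches both shapes seen in the log above:
#   EXT3-fs error (device sdc1): ext3_free_blocks_sb: ...
#   EXT3-fs error (device sdc1) in ext3_truncate: ...
ERROR_RE = re.compile(r"EXT3-fs error \(device (\w+)\)(?::| in) (\w+):")


def tally_ext3_errors(lines):
    """Count EXT3-fs error lines per (device, reporting function)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

It can be fed an open file, for example `tally_ext3_errors(open("/var/log/messages"))`, and the resulting counter sorted to see which device and which code path account for most of the errors.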
When this happens, I unmount the file system on all client hosts, remove it from /etc/exports, run 'exportfs -r', and then unmount it on the server. When I re-mount it (no fsck), it's fine for another day or so, then goes ro again.
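Roughly, the server-side part of that cycle looks like the sketch below. It only builds the command list and runs nothing. The function name `recovery_commands` and the mount point in the usage line are placeholders, not my real setup, and the optional e2fsck step is not something I currently do; it is there only because the kernel log recommends running e2fsck.

```python
def recovery_commands(device, mount_point, with_fsck=False):
    """Return, in order, the server-side commands for one recovery cycle.

    Assumes the clients have already unmounted the share and the entry
    has already been removed from /etc/exports by hand.
    """
    commands = [
        ["exportfs", "-r"],       # re-sync the export table with /etc/exports
        ["umount", mount_point],  # unmount on the NFS server itself
    ]
    if with_fsck:
        # Optional step, per the "running e2fsck is recommended" warning.
        commands.append(["e2fsck", "-f", device])
    commands.append(["mount", "-t", "ext3", device, mount_point])
    return commands
```

For example, `recovery_commands("/dev/sde1", "/mnt/example")` gives the three commands I run today, in order.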
The first time I ran fsck on the home directory file system, it reported journaling errors. While it fixed about a zillion errors, it also removed the journal, leaving the fs as ext2, and when I remounted it, it was empty. I restored from tape but have since left it offline.
I have also checked with EMC, and there are no disk errors on any of the devices in this SAN system. It is used by a few dozen other servers, has been in place for years, and has no other issues.
As all three file systems I have put on this server are showing the same or similar problems, I assume the issue is with the server and not the SAN.
I do also see this issue on boot, which I am not sure is related:
kernel: nfs warning: mount version older than kernel
amd[2614]: mount_nfs_fh: NFS version 3
In addition, NFS services are failing to start on normal reboot, although I have placed the scripts after all other network service scripts in /etc/rc2.d. When I log in after a reboot and start NFS, it starts fine.
I'm about at wits' end. Any suggestions?