LinuxQuestions.org
Linux - Enterprise This forum is for all items relating to using Linux in the Enterprise.

09-11-2008, 09:45 AM   #1
srinivasvenu
Journal Aborting and Kernel Panic in RHEL 4 AS


EXT3 file system remounting read only (Journal Aborting)


Hi Admins,
For the past year I have been facing a strange problem with our IBM x3950 server, which is connected directly over Fibre Channel to a small IBM DS400 storage unit. We have two such setups, each an IBM x3950 server with its own IBM DS400 storage, and both are getting the same error.

The issue is as follows.

When we delete large files (4 GB each) from the storage, the file system (/dev/sdc14) becomes read-only (journal aborted).

It does not happen every time; it occurs roughly once every one or two months.

The first time it happened, we contacted Red Hat, who suspected a hardware or firmware issue on the server or storage side.

The hardware vendor came onsite, said no errors were found in the hardware, and updated the firmware to the latest version.

A month later the same issue occurred again on the same server.

When we contacted Red Hat again, they said it was a kernel bug and recommended updating to the latest kernel, which we did.

The same issue then occurred again on the same server. When we contacted Red Hat once more, they again said it was a kernel bug and that they would send a test kernel update so the issue could be reproduced.

When we deliberately try to trigger the problem, it does not occur.
Our application generates 10,000 files (reports) every day. Every day our script creates a directory (named in date-month format) and moves all of those files into it.
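
For readers trying to reproduce the workload, the daily job described above might look roughly like the sketch below. The function name, the two directory arguments, and the `%d%m` directory name are assumptions for illustration; the actual script was not posted.

```shell
#!/bin/sh
# Sketch only: move the day's report files into a directory named after the date.
# archive_reports, its arguments, and the "%d%m" name format are hypothetical.
archive_reports() {
    src_dir=$1
    dest_root=$2
    day_dir="$dest_root/$(date +%d%m)"

    mkdir -p "$day_dir" || return 1

    # Moving one file per mv call avoids "argument list too long"
    # when thousands of files are present.
    find "$src_dir" -maxdepth 1 -type f -exec mv {} "$day_dir"/ \;

    # Report where the files went.
    echo "$day_dir"
}
```

It would be called as `archive_reports /path/to/reports /path/to/archive`, with both paths being whatever the real job uses.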

OS: Red Hat Enterprise Linux 4 AS, kernel 2.6.9-42.0.3
Server: IBM x3950
Storage: IBM DS400, connected through Fibre Channel (SAN)
Both servers and both storage units are the same models.


Every time this happens, I repair the file system with the following steps:
1. Unmount the file system (/disk1)
2. Nuke the ext3 journal
tune2fs -O ^has_journal /dev/<rootfs>
3. e2fsck -fy /dev/sdc14
4. Rebuild the journal
tune2fs -j /dev/sdc14
5. Mount the file system (/disk1)
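
The five steps above could be wrapped in a small script along these lines. This is a sketch, not a tested recovery tool: the `run` and `repair_ext3` helpers and the `DRY_RUN` switch are hypothetical, the device and mount point are passed in by the caller, and by default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the repair steps. DRY_RUN defaults to 1, so commands are only
# printed unless the caller explicitly sets DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_ext3() {
    device=$1
    mountpoint=$2

    run umount "$mountpoint" || return 1
    # Quote the caret so no shell treats it specially.
    run tune2fs -O '^has_journal' "$device" || return 1

    # e2fsck exits 1 or 2 when it corrected errors; 4 and above mean
    # errors were left uncorrected or the check itself failed.
    run e2fsck -fy "$device"
    status=$?
    [ "$status" -lt 4 ] || return "$status"

    run tune2fs -j "$device" || return 1
    run mount "$device" "$mountpoint"
}
```

Running `repair_ext3 /dev/sdc14 /disk1` as-is prints the five commands for review; setting `DRY_RUN=0` first would execute them, which requires root and an otherwise idle file system.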

The following are the error logs from Server 1 (TOMMY):


[root@TOMMY ~]# cat /var/log/messages

MAY 20 11:00:01 TOMMY crond(pam_unix)[27182]: session closed for user root
MAY 20 11:00:05 TOMMY crond(pam_unix)[27164]: session closed for user monitor
MAY 20 11:00:05 TOMMY kernel: EXT3-fs error (device sdc14): ext3_free_blocks_sb: bit already cleared for block 37028354
MAY 20 11:00:05 TOMMY kernel: Aborting journal on device sdc14.
MAY 20 11:00:05 TOMMY kernel: ext3_abort called.
MAY 20 11:00:05 TOMMY kernel: EXT3-fs error (device sdc14): ext3_journal_start_sb: Detected aborted journal
MAY 20 11:00:05 TOMMY kernel: Remounting filesystem read-only
MAY 20 11:00:05 TOMMY kernel: EXT3-fs error (device sdc14) in start_transaction: Journal has aborted
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14): ext3_free_blocks_sb: bit already cleared for block 37028355
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14): ext3_free_blocks_sb: bit already cleared for block 37028356
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14): ext3_free_blocks_sb: bit already cleared for block 37028357
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14): ext3_free_blocks_sb: bit already cleared for block 37028358
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14): ext3_free_blocks_sb: bit already cleared for block 37028359
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14) in ext3_free_blocks_sb: Journal has aborted
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14) in ext3_reserve_inode_write: Journal has aborted
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14) in ext3_truncate: Journal has aborted
MAY 20 11:00:06 TOMMY kernel: EXT3-fs error (device sdc14) in ext3_reserve_inode_write: Journal has aborted
The following are the error logs from Server 2:

[root@TOMMY ~]# dmesg
EXT3-fs error (device sdc14) in start_transaction: Journal has aborted
EXT3-fs error (device sdc14) in start_transaction: Journal has aborted
EXT3-fs error (device sdc14) in start_transaction: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
kjournald starting. C
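
When sending logs like these to support, it can help to pull out the first EXT3 error for the device, since in the Server 1 log above the "bit already cleared" line comes before the journal abort and the later "Journal has aborted" lines follow from it. The two helpers below are a minimal sketch; their names are made up for illustration, and they only assume syslog-style text on standard input.

```shell
#!/bin/sh
# Sketch: triage ext3 errors for one device from syslog-style text on stdin.
# first_ext3_error and count_ext3_errors are hypothetical helper names.

# Print the earliest "EXT3-fs error" line for the given device.
first_ext3_error() {
    grep -F "EXT3-fs error (device $1)" | head -n 1
}

# Print how many "EXT3-fs error" lines mention the given device.
count_ext3_errors() {
    grep -cF "EXT3-fs error (device $1)"
}
```

For example, `first_ext3_error sdc14 < /var/log/messages` would print the earliest matching line in that file.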


Recently we upgraded to a customized kernel provided by Red Hat. The system was fine for two months after that, but yesterday it hung due to a kernel panic. Red Hat could not find the reason for the hang in the logs because we had not enabled netdump.

Please suggest a solution.


Thanks
Srinu
Singapore
 
09-11-2008, 09:56 AM   #2
TB0ne
Quote:
Originally Posted by srinivasvenu
EXT3 file system remounting read only (Journal Aborting)


Recently we upgraded to a customized kernel provided by Red Hat. The system was fine for two months after that, but yesterday it hung due to a kernel panic. Red Hat could not find the reason for the hang in the logs because we had not enabled netdump.

Please suggest a solution.

I'd suggest enabling netdump and going to the kernel RedHat sent you. Then get RedHat to analyze that dump file the next time it happens.

FWIW, I've seen IBM hardware be a bit twitchy with RedHat, and the problems aren't with Linux, but with the microcode/firmware. Depending on who you talk to at IBM, you might get different answers, or different suggestions for firmware levels.

From my past experience, the only way IBM will actually get a 'support professional' to look at it (i.e. escalate past level 1 tech support), is to definitively prove that it's a hardware issue. The only way you can do that is with the kernel panic dumps that RedHat has gone through. And if it IS a RedHat issue, they'll catch it and provide a fix. Either way, the dump is your best bet.
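
As a quick sanity check for the netdump suggestion above, something like the following could confirm that a dump target is actually configured before the next panic. It is a sketch under assumptions: it presumes the RHEL 4 netdump client reads its server address from a `NETDUMPADDR=` line in `/etc/sysconfig/netdump`, and `netdump_target` is a made-up helper name. Verify the file and variable against Red Hat's netdump documentation for your release.

```shell
#!/bin/sh
# Sketch: print the configured netdump server address, if any.
# Assumes the client config is /etc/sysconfig/netdump with a NETDUMPADDR= line;
# netdump_target is a hypothetical helper.
netdump_target() {
    config=${1:-/etc/sysconfig/netdump}
    # Only uncommented lines with a non-empty value are printed;
    # if several are present, the last one is reported.
    sed -n 's/^NETDUMPADDR=\(..*\)$/\1/p' "$config" | tail -n 1
}
```

An empty result means no address is set, in which case a panic would leave nothing for RedHat to analyze.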
 
  

