LinuxQuestions.org

EXT3-fs error (device md0) in start_transaction: Journal has aborted (https://www.linuxquestions.org/questions/linux-software-2/ext3-fs-error-device-md0-in-start_transaction-journal-has-aborted-499344/)

pilot11 11-07-2006 02:58 AM

EXT3-fs error (device md0) in start_transaction: Journal has aborted
 
EXT3-fs error (device md0) in start_transaction: Journal has aborted

I get this message spat out over and over again, and the only remedy is a reboot.

There seem to be no other errors, and the server works normally afterwards. The error mostly happens overnight, and no data is lost.

I have no idea what is causing this. I have searched Google and found several other instances of this error, none with a successful conclusion. There are some articles here:
http://lkml.org/lkml/2003/8/6/24
http://comments.gmane.org/gmane.linux.kernel/78181
http://forums.techguy.org/unix-linux...device-dm.html


I have so far reloaded all the software (distribution: CentOS version 4 x86_64; Dell PowerEdge 2900; 2 GB RAM; RAID 5 array) and this is still happening every night. Is this a kernel bug?

Can anyone give me a lead on what to do next? This is our first time using Linux on our servers, and I really need to get over this problem, so any help will be greatly appreciated.

netsecsvc 12-05-2006 11:54 AM

I'm having the exact same problem on Debian! What hardware are you using? I'm using an IBM x360 server with a ServeRAID4 controller connected to a Compaq external array.

My only theory at this point is that the Linux drivers for my ServeRAID 4 do not match the BIOS. I posted a thread about this before and, as far as I recall, got no solution. If you are also using a ServeRAID controller, that would validate my theory a bit.

I've got two new servers that have been sitting here for three months waiting for this to be resolved before I can go into production with my Linux/VMware solution...

pilot11 12-05-2006 12:14 PM

The drives are three Maxtor 139392 MB physical disks in a brand-new PowerEdge 2900. We have a PERC 5 RAID controller and are running CentOS x86_64 version 4.4.

We thought we had got over the problem, but it came back last week, and the error now appears every night. Shutting down every night saves the server from crashing until a solution can be found.

We are going to run the driver updates from Dell for the RAID controller, but we are still a bit stumped!

netsecsvc 12-06-2006 09:10 AM

Actually, I'm glad to hear that it is probably not my hardware. Are you using VMware or anything else that stores very large files in the filesystem? I have four 18.2 GB virtual disk files. It could be that ext3 doesn't handle large files well.

pilot11 12-06-2006 09:34 AM

I don't have VMware, and no huge files either. It's a new machine and hasn't been loaded yet.
This machine has produced this error only twice during the day; I would say 99% of the errors happen in the early hours of the morning, around 3 or 4 AM. I suspect that some network activity (smbd) upsets the drivers for the PERC 5, so an update from Dell looks to be on the cards.
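
To put a number on the "mostly 3 or 4 AM" observation, the journal-abort messages can be tallied by hour of day. The following is only a rough sketch: it assumes the kernel messages land in a traditional syslog file with `Mon DD HH:MM:SS` timestamps, and the function name `aborts_by_hour` is made up for this illustration.

```python
import re
from collections import Counter

# Assumed line format (traditional syslog), for example:
#   Dec  6 03:14:07 host kernel: EXT3-fs error (device md0) in
#   start_transaction: Journal has aborted
# The single capture group is the hour of day.
LINE_RE = re.compile(
    r"^[A-Z][a-z]{2}\s+\d{1,2}\s+(\d{2}):\d{2}:\d{2}\s.*"
    r"EXT3-fs error .*Journal has aborted"
)


def aborts_by_hour(lines):
    """Count 'Journal has aborted' messages per hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open log file (for instance `/var/log/messages`, which is an assumed location and varies by distribution) to `aborts_by_hour` and printing the sorted counts would show whether the errors really cluster around the time the nightly jobs run.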

netsecsvc 12-06-2006 11:28 AM

It seems the only logical conclusion is that the problem is in the ext3 filesystem journaling itself (a bug). I'm probably going to reload and use ext2.

pilot11 12-07-2006 02:25 AM

Appreciate it if you would let me know if you make progress.
Thanks

netsecsvc 12-26-2006 01:23 PM

ext2
 
Okay, I've just reloaded with ext2 instead of ext3. On the VMware volume I set the inodes to large file size :).

I'm running Debian Linux, kernel 2.6.8, with SMP on a P4 Xeon with hyperthreading (recompiled kernel). We'll see if the error goes away now. Since it was an ext3 journaling error, it wouldn't surprise me if the error does go away, or changes for that matter, but will it crap out after 72 hours or so? I'll try to post back and let you know. My fingers are crossed!

pilot11 01-02-2007 02:44 AM

I've gone through and updated all the drivers from Dell:
dkms-2.0.13-1.noarch.rpm
Bcom_LAN_NX2_26_Linux_DKMS_A00.tar
from:
http://support.dell.com/support/down...s=LIN4&osl=EN#
and also made sure that no users are logged in when not needed. So far, since the 18th of December, the problem has not reared its head.

The next test is to log in as a user and leave the session running, to see if that could be a cause of the problem.

Still using ext3.

netsecsvc 01-02-2007 08:20 AM

It worked!
 
Ext2 seems to have fixed the problem. Not a single lockup.

FYI, if it has anything to do with a logged-in user, then having vnc4server loaded may have worsened my situation. It did seem to always occur overnight, when some journal or system cleanup jobs were running. Anyway, with ext2 I can have VNC running and everything is working as it should.

:D

migz 01-08-2007 02:57 PM

Journal abort issue?
 
I currently have a Dell PowerEdge 2650 with a PERC 4 (LSI) RAID controller running Red Hat 4 ES. I too have experienced this problem. I cannot use the ext2 filesystem because it has a 2 GB file limit.
I do have ext3 on other file servers with no problems, so I am wondering: is there a problem with the ext3 filesystem? Can tune2fs help?

netsecsvc 01-09-2007 09:01 AM

A 2 GB file limit? That's not true. I have 7+ GB virtual machine disk files on mine. You can even set the inode sizes for large files. I haven't looked it up, but I'm guessing you could have files up to a terabyte on ext2. Definitely larger than 2 GB...

netsecsvc 01-09-2007 09:04 AM

I think you are thinking of an older implementation of ext2.
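
For a rough sense of where the real ceiling sits, the classic ext2 block-mapping scheme can be worked through by hand. The sketch below assumes the usual layout (12 direct pointers plus single, double and triple indirect blocks, 4-byte block pointers) and a 32-bit count of 512-byte sectors per inode as the overall cap. Both are assumptions about kernels of that era rather than anything stated in this thread, and the function name is made up for the example.

```python
SECTOR_SIZE = 512    # assumed: the inode's block count is kept in 512-byte units
POINTER_SIZE = 4     # assumed: 32-bit block pointers
DIRECT_POINTERS = 12 # assumed: direct block pointers held in the inode


def ext2_max_file_size(block_size):
    """Approximate largest ext2 file, in bytes, for a given block size."""
    per_block = block_size // POINTER_SIZE  # pointers per indirect block
    mappable_blocks = (
        DIRECT_POINTERS      # direct blocks
        + per_block          # single indirect
        + per_block ** 2     # double indirect
        + per_block ** 3     # triple indirect
    )
    limit_by_pointers = mappable_blocks * block_size
    limit_by_sector_count = 2 ** 32 * SECTOR_SIZE  # roughly 2 TiB
    return min(limit_by_pointers, limit_by_sector_count)
```

Under these assumptions a 1 KiB block size gives roughly 16 GiB and a 4 KiB block size gives roughly 2 TiB, so a 2 GB ceiling more likely comes from an old kernel or tools built without large-file support than from ext2 itself.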

migz 01-09-2007 10:39 AM

Nice, I think I'm going to go to ext2 then. I really don't want to see any more journal abort errors. Thanks.

ChrisDN 02-27-2007 04:14 AM

Sorry if I'm reviving an old thread, but I have been getting this error a lot lately too. Could someone in the know tell me how exactly I would reload with ext2 instead of ext3?

I'm running 2.4.22 Gentoo r7.

Any help would be much appreciated. Thanks in advance.


All times are GMT -5.