LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

Old 05-20-2015, 10:16 AM   #1
moisespedro
Senior Member
 
Registered: Nov 2013
Location: Brazil
Distribution: Slackware
Posts: 1,223

Rep: Reputation: 195
Critical SSD/RAID0 (?) bug under kernel 4.0.0, 4.0.1, 4.0.2 and 3.18.14 (?)


It seems those versions have a major bug that causes ext4 data corruption, resulting in data loss.

4.0.3 fixes this issue.

EDIT: After following up on new posts I've had to change the title; please read the whole thread. Previously the title stated it was only an ext4 bug and only under 4.0.0, 4.0.1 and 4.0.2.

Last edited by moisespedro; 05-23-2015 at 09:15 AM.
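
For anyone who wants a quick check of a machine, here is a minimal Python sketch that compares the running kernel release against the versions named in this thread's title. The version list is an assumption taken from the title (which itself carries question marks), and the helper names `parse_release` and `is_suspect` are made up for illustration.

```python
import platform
import re

# Versions named in this thread's title. This is an assumption for
# illustration, not an authoritative list of affected kernels.
SUSPECT_VERSIONS = {(4, 0, 0), (4, 0, 1), (4, 0, 2), (3, 18, 14)}


def parse_release(release):
    """Turn a release string such as '4.0.2-smp' into a tuple like (4, 0, 2)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '4.0') is treated as patch 0.
    return (int(major), int(minor), int(patch or 0))


def is_suspect(release):
    """Return True if the release is one of the versions listed above."""
    return parse_release(release) in SUSPECT_VERSIONS


if __name__ == "__main__":
    running = platform.release()
    try:
        verdict = "listed in the thread title" if is_suspect(running) else "not listed"
    except ValueError:
        verdict = "could not be parsed"
    print(f"{running}: {verdict}")
```

A "not listed" result only means the release is absent from the title's list; it says nothing about whether a given stable branch has picked up the problematic commit or its fix.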
 
Old 05-20-2015, 11:33 AM   #2
drmozes
Slackware Contributor
 
Registered: Apr 2008
Distribution: Slackware
Posts: 1,531

Rep: Reputation: 1274
Quote:
Originally Posted by moisespedro
It seems those versions have a major bug that causes ext4 data corruption, resulting in data loss.

4.0.3 fixes this issue.
Yep I can testify - it destroyed one of my ARM build machines. I thought it was perhaps the SSD but suspected otherwise.
 
Old 05-20-2015, 04:12 PM   #3
GazL
LQ Veteran
 
Registered: May 2008
Posts: 6,882

Rep: Reputation: 4988
Anyone got any hard details on this? Most of the articles seem to be linking back to the reports on the Arch and Phoronix forums.

The ext4 fix in 4.0.3 was also included in the 3.10.78 patch, and the issue seems to date back quite a way, so I don't know why it only seems to be reported for 4.0.y. Judging by this post by Ted Ts'o, he seems to be suggesting that this ext4 fix isn't the likely culprit. What I did notice from a quick scan of the forum posts where people are reporting problems is that most appear to be using SSDs, or RAID (or both). Someone on the Arch forum has suggested that this TRIM-related fix might have something to do with the issue, but who knows. It's all a little vague right now.

I've used all the kernels from 4.0.0 -> 4.0.4 and not had any issues here, but then I'm not using an SSD and I'm not doing anything fancy that would likely trigger the issue Ted was talking about. Hopefully things will become clearer.
 
Old 05-21-2015, 06:59 AM   #4
GazL
LQ Veteran
 
Registered: May 2008
Posts: 6,882

Rep: Reputation: 4988
This one seems likely:
https://bugzilla.kernel.org/show_bug.cgi?id=98501

The original commit for "md/raid0: fix bug with chunksize not a power of 2." which looks to be at fault also made it into at least 3.14.41 and 3.19.7, in addition to 4.0.2. This issue has not been addressed in any stable branches as of yet.

I couldn't see any signs of this making it into 3.10.y so, anyone still following that branch ought to be safe (at least, as far as you can be given that 'stable' doesn't seem to mean a great deal when talking about linux kernel branches). Grabbing 3.10.79 for the ext4 corruption bug fix might be worth considering if you're still on 3.10.y.


Stu, were you using raid0 by any chance?
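
To answer the raid0 question on your own box, a rough Python sketch like the one below can list md arrays whose level is raid0 by reading `/proc/mdstat`. The function name `raid0_arrays` is hypothetical, and the parsing assumes the usual `md0 : active raid0 sdb1[1] sda1[0]` line layout; treat it as a sketch rather than a complete mdstat parser.

```python
import re

# Matches lines such as "md0 : active raid0 sdb1[1] sda1[0]" and captures the
# array name and its RAID level. The optional group skips a parenthesised
# state such as "(auto-read-only)" that can follow "active".
ARRAY_LINE = re.compile(r"^(md\S*)\s*:\s*active\s+(?:\([^)]*\)\s+)?(\S+)")


def raid0_arrays(mdstat_text):
    """Return the names of active md arrays whose level is raid0."""
    found = []
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if match and match.group(2) == "raid0":
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as handle:
            arrays = raid0_arrays(handle.read())
    except OSError:
        # No md driver loaded, or not a Linux system.
        arrays = []
    print("raid0 arrays:", ", ".join(arrays) if arrays else "none found")
```

An empty result only rules out Linux software raid0 via md; it says nothing about other striping setups.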
 
Old 05-21-2015, 07:10 AM   #5
drmozes
Slackware Contributor
 
Registered: Apr 2008
Distribution: Slackware
Posts: 1,531

Rep: Reputation: 1274
Quote:
Originally Posted by GazL

Stu, were you using raid0 by any chance?
Nope, just a regular ext4 FS on an SSD. It caused data corruption twice, which made me suspect the SSD (since it's brand new), but the SMART check reports all suggest that it is healthy, so I suspected it must be a kernel issue (it's not the first time I've seen FS corruption line up with a recent kernel upgrade and lots of FS activity).
I have other build machines running Linux 4.0.2, but they were mostly idle and there were no issues on those.
 
Old 05-21-2015, 09:13 AM   #6
mlslk31
Member
 
Registered: Mar 2013
Location: Florida, USA
Distribution: Slackware, FreeBSD
Posts: 210

Rep: Reputation: 76
There's a fix noted in the ChangeLog for kernel 3.18.14. It's best to search for "ext4" and read what's there. Any summary by me would get something wrong.
 
Old 05-21-2015, 03:21 PM   #7
moesasji
Member
 
Registered: May 2008
Distribution: Slackware Current / OpenBSD
Posts: 322

Rep: Reputation: 104
Quote:
Originally Posted by mlslk31
There's a fix noted in the ChangeLog for kernel 3.18.14. It's best to search for "ext4" and read what's there. Any summary by me would get something wrong.
If you look at the explanation from the main kernel maintainer for ext4, it doesn't appear to be a bug that is easy to hit. See http://www.gossamer-threads.com/list...176274#2176274. So the changelog sounds a lot worse than it is in reality.
 
Old 05-21-2015, 03:53 PM   #8
moisespedro
Senior Member
 
Registered: Nov 2013
Location: Brazil
Distribution: Slackware
Posts: 1,223

Original Poster
Rep: Reputation: 195
It seems the problem affects RAID0 more.
 
Old 05-21-2015, 06:21 PM   #9
j_v
Member
 
Registered: Oct 2011
Distribution: Slackware64
Posts: 364

Rep: Reputation: 67
@moisespedro
Thanks for highlighting this issue. This is definitely worth investigating.

Also thanks to everyone who has given input here. It's nice to have some starting points and initial reading to get familiar with the problem(s).
 
Old 05-22-2015, 04:49 AM   #10
veerain
Senior Member
 
Registered: Mar 2005
Location: Earth bound to Helios
Distribution: Custom
Posts: 2,524

Rep: Reputation: 319
The corruption occurs if ext4 is used with RAID0. It has been fixed by the upstream developer, and the fix should be available with 4.1 or 4.0.4 and other LTS kernels.
 
Old 05-22-2015, 04:51 AM   #11
GazL
LQ Veteran
 
Registered: May 2008
Posts: 6,882

Rep: Reputation: 4988
Except Stuart wasn't using raid0 and he was hit by corruption, as posted above. This is why I'm inclined to believe that there are multiple issues at play. Most people affected do seem to be using SSDs, though.

BTW, 4.0.4 has been out a few days. I think you probably mean 4.0.5.

Last edited by GazL; 05-22-2015 at 04:53 AM.
 
Old 05-23-2015, 02:38 AM   #12
moesasji
Member
 
Registered: May 2008
Distribution: Slackware Current / OpenBSD
Posts: 322

Rep: Reputation: 104
Quote:
Originally Posted by veerain
The corruption occurs if ext4 is used with RAID0. It has been fixed by the upstream developer, and the fix should be available with 4.1 or 4.0.4 and other LTS kernels.
To be precise: according to the kernel bug report at https://bugzilla.kernel.org/show_bug.cgi?id=98501, this appears to be a bug in raid0 that surfaces during a trim operation, for any filesystem that supports trim. That is why it primarily shows up on ext4 combined with an SSD.

Anyway it looks like multiple ext4 bugs are getting fixed. This one doesn't appear to have been fixed in the LTS branch yet if I search for the commit-id in the bugreport.
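
Since the bug report points at trim, it may help to see which filesystems are mounted with online discard. The following is a minimal Python sketch that scans `/proc/mounts` for the `discard` option. The function name `discard_mounts` is hypothetical, and the sketch only covers the `discard` mount option; whether other ways of issuing trim matter here is not something this thread settles.

```python
def discard_mounts(mounts_text):
    """Return (device, mountpoint, fstype) for mounts using the discard option.

    Expects text in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        # Options are comma-separated; match the exact word "discard".
        if "discard" in options.split(","):
            hits.append((device, mountpoint, fstype))
    return hits


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            found = discard_mounts(handle.read())
    except OSError:
        found = []
    for device, mountpoint, fstype in found:
        print(f"{device} on {mountpoint} ({fstype}) is mounted with discard")
    if not found:
        print("no mounts with the discard option found")
```

Any device reported here that is an md raid0 array would be the combination described in the bug report.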
 
Old 05-23-2015, 03:02 AM   #13
Didier Spaier
LQ Addict
 
Registered: Nov 2008
Location: Paris, France
Distribution: Slint64-15.0
Posts: 11,048

Rep: Reputation: Disabled
@moisespedro: assuming this is the same bug, I suggest that you change this thread's title, which could be:
Quote:
WARNING: Software Raid 0 on SSD's and discard corrupts data
as it probably doesn't affect only ext4 file systems.
I am referring to this message from Holger Kiehl on LKML.

Also, at the time of writing, according to this comment from Eric Work, the fix for this bug has been merged into Linus' tree as commit a81157768a00e8cf8a7b43b5ea5cac931262374f, but that doesn't mean that the 4.0 branch has been fixed.

But if it's not actually the same problem, then being aware of both is a good thing...

Last edited by Didier Spaier; 05-23-2015 at 03:31 AM. Reason: s/to be/that could be/
 
Old 05-23-2015, 05:14 AM   #14
GazL
LQ Veteran
 
Registered: May 2008
Posts: 6,882

Rep: Reputation: 4988
Softpedia have just posted an item saying that 3.18.14 LTS is out with fixes for these issues. Unless I'm mistaken, 3.18.14 is actually introducing the April 10th md/raid0 commit that appears to be behind much of the trouble.

I don't know if any slackers are following the 3.18 branch, but if you are, be very, very wary.
 
Old 05-23-2015, 05:34 AM   #15
moesasji
Member
 
Registered: May 2008
Distribution: Slackware Current / OpenBSD
Posts: 322

Rep: Reputation: 104
Quote:
Originally Posted by GazL
Softpedia have just posted an item saying that 3.18.14 LTS is out with fixes for these issues. Unless I'm mistaken, 3.18.14 is actually introducing the April 10th md/raid0 commit that appears to be behind much of the trouble.

I don't know if any slackers are following the 3.18 branch, but if you are, be very, very wary.
For clarity: the commit that is supposed to fix this issue is a81157768a00e8cf8a7b43b5ea5cac931262374f. As far as I can tell, that commit has not yet appeared in the 3.18 support branch. So Softpedia appears to be wrong here.

BTW: anyone on -current would have this problem, as the -current branch runs the 3.18 kernel.
 
  


All times are GMT -5. The time now is 05:58 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration