Critical SSD/RAID0 (?) bug under kernel 4.0.0, 4.0.1, 4.0.2 and 3.18.14 (?)
EDIT: After following up on new posts I've had to change the title; please read the whole thread. Previously the title stated it was only an ext4 bug and only under 4.0.0, 4.0.1, 4.0.2.
Last edited by moisespedro; 05-23-2015 at 09:15 AM.
Anyone got any hard details on this? Most of the articles seem to link back to the reports on the Arch and Phoronix forums.
The ext4 fix in 4.0.3 was also included in the 3.10.78 patch, and the issue seems to date back quite a way, so I don't know why this only seems to be reported for 4.0.y. Judging by this post by Ted Ts'o, he seems to be suggesting that this ext4 fix isn't the likely culprit. What I did notice from a quick scan of the forum posts where people are reporting problems is that most appear to be using SSDs or RAID (or both). Someone on the Arch forum has suggested that this TRIM-related fix might have something to do with the issue, but who knows. It's all a little vague right now.
I've used all the kernels from 4.0.0 -> 4.0.4 and not had any issues here, but then I'm not using an SSD and I'm not doing anything fancy that would likely trigger the issue Ted was talking about. Hopefully things will become clearer.
The original commit for "md/raid0: fix bug with chunksize not a power of 2." which looks to be at fault also made it into at least 3.14.41 and 3.19.7, in addition to 4.0.2. This issue has not been addressed in any stable branches as of yet.
I couldn't see any signs of this making it into 3.10.y, so anyone still following that branch ought to be safe (at least, as far as you can be, given that 'stable' doesn't seem to mean a great deal when talking about Linux kernel branches). Grabbing 3.10.79 for the ext4 corruption bug fix might be worth considering if you're still on 3.10.y.
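To make the version list above easier to check against a running box, here is a minimal sketch in Python. It only encodes the three releases named in the post above as the first in their branch to carry the raid0 commit; the table and function names are made up for illustration, the list is not authoritative, and the sketch knows nothing about which later releases carry a fix.

```python
import re

# First release per stable branch that the post above reports as carrying
# "md/raid0: fix bug with chunksize not a power of 2."
# Assumption: copied from this thread only, not from the kernel changelogs.
FIRST_REPORTED = {(3, 14): 41, (3, 19): 7, (4, 0): 2}


def parse_release(release):
    """Turn a string such as '4.0.2' or '3.14.41-smp' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("not a kernel release string: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def reported_as_carrying_commit(release):
    """True if the release is at or past the first one named for its branch.

    Branches the thread does not mention (3.10.y, for example) return False,
    which means "not reported here", not "proven safe".
    """
    major, minor, patch = parse_release(release)
    first = FIRST_REPORTED.get((major, minor))
    return first is not None and patch >= first
```

On a live system the input could come from platform.release(). A True result only means the kernel is worth a closer look if you also run raid0 with discard.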
Nope, just a regular ext4 FS on an SSD. It caused data corruption twice, which made me suspect the SSD (since it's brand new), but the SMART reports all suggested that it was healthy, so I suspected it must be a kernel issue (it's not the first time I've seen an FS corruption line up with a recent kernel upgrade and lots of FS activity).
I have other build machines running Linux 4.0.2, but they were mostly idle and there were no issues on those.
There's a fix noted in the ChangeLog for kernel 3.18.14. It's best to search for "ext4" and read what's there. Any summary by me would get something wrong.
If you look at the explanation from the main kernel maintainer for ext4, it doesn't appear to be a bug that is easy to hit; see http://www.gossamer-threads.com/list...176274#2176274. So the changelog sounds a lot worse than it is in reality.
The corruption occurs if ext4 is used with RAID0. It has been fixed by the upstream developer, and the fix should be available in 4.1 or 4.0.4 and in other LTS kernels.
Except Stuart wasn't using raid0 and he was hit by corruption, as posted above. This is why I'm inclined to believe that there are multiple issues at play. Most people affected do seem to be using SSDs, though.
BTW, 4.0.4 has been out a few days. I think you probably mean 4.0.5.
To be precise: according to the kernel bug report at https://bugzilla.kernel.org/show_bug.cgi?id=98501, this appears to be a bug in raid0 that surfaces during a TRIM operation, for any filesystem that supports TRIM. That is why it primarily surfaces on ext4 combined with an SSD.
Anyway, it looks like multiple ext4 bugs are getting fixed. This one doesn't appear to have been fixed in the LTS branch yet, going by a search for the commit ID from the bug report.
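For anyone wanting to check whether their own setup matches that combination (md raid0 plus discard), a rough sketch in Python follows. It assumes the usual layout of /proc/mdstat and /proc/mounts; the sample texts and function names are invented for illustration. It only sees filesystems mounted directly on an md device with the discard option, so it would miss LVM or dm-crypt layered on top of the array, and it would miss TRIM issued by a periodic fstrim run.

```python
# Made-up sample of /proc/mdstat content, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid0] [raid1]
md0 : active raid0 sdb1[1] sda1[0]
      1953260544 blocks super 1.2 512k chunks

md1 : active raid1 sdd1[1] sdc1[0]
      976630336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

# Made-up sample of /proc/mounts content, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/md0 /data ext4 rw,noatime,discard 0 0
/dev/md1 /home ext4 rw,relatime,discard 0 0
"""


def raid0_arrays(mdstat_text):
    """Return the names of md arrays whose status line mentions raid0."""
    arrays = set()
    for line in mdstat_text.splitlines():
        fields = line.split()
        # Array lines look like: "md0 : active raid0 sdb1[1] sda1[0]"
        if (len(fields) >= 4 and fields[0].startswith("md")
                and fields[1] == ":" and "raid0" in fields[2:]):
            arrays.add(fields[0])
    return arrays


def discard_mounts_on(arrays, mounts_text):
    """Return (device, mountpoint, fstype) for discard mounts on the given arrays."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mountpoint, fstype, options = fields[:4]
        name = device.rsplit("/", 1)[-1]
        if name in arrays and "discard" in options.split(","):
            hits.append((device, mountpoint, fstype))
    return hits
```

On a real machine the two texts would be read from /proc/mdstat and /proc/mounts instead of the samples. An empty result does not prove you are unaffected, for the reasons given above.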
@moisespedro: assuming this is the same bug, I suggest that you modify this thread's title that could be:
Quote:
WARNING: Software Raid 0 on SSD's and discard corrupts data
as it probably doesn't affect only ext4 file systems.
I am referring to this message from Holger Kiehl on LKML.
Also, at the time of writing, according to this comment from Eric Work, the fix for this bug has been merged into Linus' tree as commit a81157768a00e8cf8a7b43b5ea5cac931262374f, but that doesn't mean that the 4.0 branch has been fixed.
But if it's not actually the same problem, then being aware of both is a good thing...
Last edited by Didier Spaier; 05-23-2015 at 03:31 AM.
Reason: s/to be/that could be/
I don't know if any slackers are following the 3.18 branch, but if you are, be very, very wary.
For clarity: the commit that is supposed to fix this issue is a81157768a00e8cf8a7b43b5ea5cac931262374f. As far as I can tell, that commit has not yet appeared in the 3.18 stable branch. So Softpedia appears to be wrong here.
BTW: anyone on current would have this problem, as the current branch runs with the 3.18 kernel.
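One way to do that kind of search yourself is sketched below in Python. Stable backports get a new commit ID of their own, so the fix has to be found by the reference to the mainline commit that stable changelog entries normally carry ("commit <id> upstream." or "[ Upstream commit <id> ]"). The function name and the sample changelog text are invented for illustration; feed it the real ChangeLog for the release you care about.

```python
import re

# Mainline commit ID quoted in this thread as the raid0 fix.
FIX_COMMIT = "a81157768a00e8cf8a7b43b5ea5cac931262374f"

# Matches "commit <hex>" with 12 to 40 hex digits, in any letter case.
COMMIT_REF = re.compile(r"commit\s+([0-9a-f]{12,40})\b", re.IGNORECASE)


def mentions_upstream_commit(changelog_text, sha=FIX_COMMIT):
    """True if the changelog references the given mainline commit.

    Abbreviated IDs of 12 or more hex digits count as a match when they
    are a prefix of the full ID.
    """
    full = sha.lower()
    for match in COMMIT_REF.finditer(changelog_text):
        if full.startswith(match.group(1).lower()):
            return True
    return False


# Made-up changelog fragment, for illustration only.
SAMPLE_ENTRY = """\
    md/raid0: example subject line

    commit a81157768a00e8cf8a7b43b5ea5cac931262374f upstream.
"""
```

A False result only says the reference was not found in the text you supplied; a backport whose message omits the upstream ID would slip through.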