Critical SSD/RAID0 (?) bug under kernel 4.0.0, 4.0.1, 4.0.2 and 3.18.14 (?)
It seems those versions have a major bug that causes ext4 data corruption, resulting in data loss.
4.0.3 fixes this issue. EDIT: After following up on new posts I've had to change the title; please read the whole thread. Previously the title stated it was only an ext4 bug and only under 4.0.0, 4.0.1, 4.0.2.
Anyone got any hard details on this? Most of the articles seem to be linking back to the reports on the Arch and Phoronix forums.
The ext4 fix in 4.0.3 was also included in the 3.10.78 patch, and the issue seems to date back quite a way, so I don't know why it only seems to be reported for 4.0.y. Judging by this post by Ted Ts'o, he seems to be suggesting that this ext4 fix isn't the likely culprit.
What I did notice from a quick scan of the forum posts where people are reporting problems is that most appear to be using SSDs or RAID (or both). Someone on the Arch forum has suggested that this TRIM-related fix might have something to do with the issue, but who knows. It's all a little vague right now.
I've used all the kernels from 4.0.0 to 4.0.4 and not had any issues here, but then I'm not using an SSD and I'm not doing anything fancy that would be likely to trigger the issue Ted was talking about. Hopefully things will become clearer.
This one seems likely:
https://bugzilla.kernel.org/show_bug.cgi?id=98501
The original commit for "md/raid0: fix bug with chunksize not a power of 2.", which looks to be at fault, also made it into at least 3.14.41 and 3.19.7, in addition to 4.0.2. This issue has not yet been addressed in any stable branch. I couldn't see any sign of the commit making it into 3.10.y, so anyone still following that branch ought to be safe (at least, as far as you can be, given that 'stable' doesn't seem to mean a great deal when talking about Linux kernel branches). Grabbing 3.10.79 for the ext4 corruption bug fix might be worth considering if you're still on 3.10.y.
Stu, were you using raid0 by any chance?
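On the raid0 question, one quick way to see whether a box has any md raid0 arrays is to look at /proc/mdstat. Below is a rough Python sketch, assuming the usual mdstat layout of "mdN : active raid0 ..." lines. The function name raid0_arrays is made up for illustration. It only covers Linux software RAID, and it says nothing about whether a given array is actually hit by the bug.

```python
def raid0_arrays(mdstat_text):
    """Return the names of md arrays whose mdstat line mentions raid0.

    Assumes lines shaped like 'md0 : active raid0 sdb1[1] sda1[0]'.
    """
    names = []
    for line in mdstat_text.splitlines():
        name, sep, rest = line.partition(" : ")
        # Skip the 'Personalities : ...' header and anything that is not an md device.
        if sep and name.startswith("md") and "raid0" in rest.split():
            names.append(name)
    return names


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as handle:
            print(raid0_arrays(handle.read()))
    except OSError:
        # No md driver loaded, or not a Linux system.
        print("could not read /proc/mdstat")
```

An empty list only means no md raid0 arrays were found. Given the report further down of corruption without raid0, that is not an all-clear.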
Quote:
I have other build machines with Linux 4.0.2 that were mostly idle - there were no issues on those.
There's a fix noted in the ChangeLog for kernel 3.18.14. It's best to search for "ext4" and read what's there. Any summary by me would get something wrong.
It seems the problem affects RAID0 more.
@moisespedro
Thanks for highlighting this issue. This is definitely worth investigating. Also thanks to everyone who has given input here. It's nice to have some starting points and initial reading to get familiar with the problem(s).
The corruption occurs if ext4 is used with RAID0. It has been fixed by the upstream developer, and the fix should be available in 4.1 or 4.0.4 and in other LTS kernels.
Except Stuart wasn't using raid0 and he was hit by corruption, as posted above. This is why I'm inclined to believe that there are multiple issues at play. Most people affected do seem to be using SSDs though.
BTW, 4.0.4 has been out a few days. I think you probably mean 4.0.5.
Quote:
Anyway, it looks like multiple ext4 bugs are getting fixed. This one doesn't appear to have been fixed in the LTS branch yet, going by a search for the commit ID from the bug report.
@moisespedro: assuming this is the same bug, I suggest that you change this thread's title to something like:
Quote:
I am referring to this message from Holger Kiehl on LKML. Also, at the time of writing, according to this comment from Eric Work, the fix for this bug has been merged into Linus' tree as commit a81157768a00e8cf8a7b43b5ea5cac931262374f, but that doesn't mean that the 4.0 branch has been fixed. But if it's not actually the same problem, then being aware of both is a good thing...
Softpedia have just posted an item saying that 3.18.14 LTS is out with fixes for these issues. Unless I'm mistaken, 3.18.14 is actually introducing the April 10th md/raid0 commit that appears to be behind much of the trouble.
I don't know if any slackers are following the 3.18 branch, but if you are, be very, very wary.
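For anyone who wants a quick check, here's a rough Python sketch that compares a kernel release string against the stable releases named in this thread as picking up the md/raid0 commit (3.14.41, 3.19.7, 4.0.2 and, going by the post above, 3.18.14). The table is only what has been mentioned here, not an official list, and the names FIRST_REPORTED, parse_release and may_carry_raid0_commit are made up for illustration. No upper bound is applied, because the thread doesn't say which stable releases carry the fix, so later releases in those branches get flagged too.

```python
import platform
import re

# First release in each stable branch reported in this thread to carry the
# md/raid0 commit. Assumption: taken from forum posts, not an official list.
FIRST_REPORTED = {
    (3, 14): 41,
    (3, 18): 14,
    (3, 19): 7,
    (4, 0): 2,
}


def parse_release(release):
    """Return (major, minor, patch) from a string such as '4.0.2-smp'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def may_carry_raid0_commit(release):
    """True if the release is at or after the first reported one in its branch."""
    major, minor, patch = parse_release(release)
    first = FIRST_REPORTED.get((major, minor))
    return first is not None and patch >= first


if __name__ == "__main__":
    running = platform.release()
    try:
        print(running, may_carry_raid0_commit(running))
    except ValueError as err:
        print(err)
```

A True result is only a prompt to go and read the changelog for that release. It doesn't cover the separate ext4 fix discussed earlier in the thread.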
Quote:
BTW, anyone on current would have this problem, as the current branch runs the 3.18 kernel.