[SOLVED] Partition Errors and Remounts Read-Only when Accessing Specific File
Whenever I try to access any of 3-4 files in a specific directory on the `/home` partition (the folder causing the issues is `/home/path/to/broken/folder`), the `/home` partition throws errors and remounts read-only. `dmesg` shows the following errors:
Code:
EXT4-fs error (device sda2): ext4_ext_check_inode:497: inode #1415: comm rm: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
Aborting journal on device sda2-8.
EXT4-fs (sda2): Remounting filesystem read-only
EXT4-fs error (device sda2): ext4_ext_check_inode:497: inode #1417: comm rm: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
EXT4-fs error (device sda2): ext4_ext_check_inode:497: inode #1416: comm rm: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
So I understand what is going on...some bad block is causing an error and is remounting the drive read-only to prevent further corruption. I know it is these specific files because I can undo the error by
1. Logging in as root
2. Running `sync`
3. Stopping `lightdm` (and all sub-processes)
4. Stopping any remaining processes that hold files open on `/home`, found with `lsof | grep /home`
5. Unmounting `/home`
6. Running `fsck /home` (fixing the errors)
7. Remounting `/home`
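The steps above can be sketched as a small script. This is only a sketch under assumptions: it assumes `lightdm` is managed by systemd, that `/home` has an `/etc/fstab` entry, and it adds the `-f` flag to `fsck`, which the original steps did not use. The `run` helper and the `DRY_RUN` and `MOUNTPOINT` variables are names made up for this example. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual recovery sequence. DRY_RUN=1 (the default here) only
# prints each command; set DRY_RUN=0 and run as root to really execute them.
DRY_RUN="${DRY_RUN:-1}"
MOUNTPOINT="${MOUNTPOINT:-/home}"    # assumed to have an /etc/fstab entry

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_home() {
    run sync                           # flush pending writes
    run systemctl stop lightdm         # assumption: lightdm is a systemd service
    run lsof "$MOUNTPOINT" || true     # list what still holds files open; stop those by hand
    run umount "$MOUNTPOINT" || return 1
    run fsck -f "$MOUNTPOINT"          # -f (not in the original steps) forces a full check
    run mount "$MOUNTPOINT"
}

repair_home
```

If `umount` fails because something still has files open, the function stops there instead of checking a mounted filesystem.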
Everything is fine again, read and write, *until I try to access the same files again*, then this entire process is repeated to fix it again.
The way I've tried to access the files is by running `ls /home/path/to/broken/folder` and `rm -r /home/path/to/broken/folder`, so it seems any kind of HDD operation on that part of the drive triggers the error and throws the partition into read-only again.
I honestly don't care about the files, I just want them gone. I am willing to remove the entire `/home/path/to/broken/folder` folder, but every time I try this, it fails and throws into read-only.
I ran `badblocks -v /dev/sda2` on my hard drive, but it came out clean, no bad blocks. Any help would still be greatly appreciated.
I have bolded what I believe are the key differences here. I looked at other non-corrupted inodes and they display something similar to the 1410 that has a non-zero size and an extent.
Bad header/extent makes sense here...it has no extent...
How can I fix this without re-formatting the entire `/home` drive?
I really feel like I've handed this question to someone smarter than me on a silver platter, I just don't know what the meal (answer) is!
That is all completely normal for a zero-length file. Nothing in the dmesg output you posted indicates a hardware error. That should have shown up as an "ata[n]" error prior to the first EXT4-fs error.
Quote:
How can I fix this without re-formatting the entire `/home` drive?
One way would be to use "debugfs -w /dev/sda2" and use its clri command to zero out the affected inodes. You would then need to run "fsck -f /dev/sda2" to clean up the resulting filesystem inconsistencies.
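A minimal sketch of that suggestion, using the inode numbers from the `dmesg` output in this thread. The `run` helper, `clear_inodes`, `DRY_RUN`, and `DEVICE` are names invented for this example, and the filesystem is assumed to be unmounted before anything is run for real. `debugfs -R` runs a single request non-interactively, and `<N>` is how debugfs names an inode by number. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: zero out specific corrupt inodes, then force a full check.
# The filesystem must be unmounted first. DRY_RUN=1 only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
DEVICE="${DEVICE:-/dev/sda2}"

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clear_inodes() {
    for ino in "$@"; do
        # clri clears the contents of the inode; <N> selects it by number
        run debugfs -w -R "clri <$ino>" "$DEVICE"
    done
    # clean up the inconsistencies that clri leaves behind
    run fsck -f "$DEVICE"
}

clear_inodes 1415 1416 1417
```

Note that `clri` discards whatever the inode pointed to, so this only makes sense for files that are not needed, as in this thread.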
First (this may be false for older versions of fsck), `fsck /home` is the same as `fsck /dev/sda2` as long as it is in "/etc/fstab".
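To see which device a mount point maps to in fstab, something like the following could be used. It is a sketch: `fstab_device` is a made-up helper name, and the sample fstab lines are hypothetical (a real fstab may well use `UUID=...` instead of a device path).

```shell
# Print the device that fstab-formatted text on stdin associates with a mount point.
fstab_device() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $1 }'
}

# Hypothetical fstab lines, for illustration only.
# On a real system: fstab_device /home < /etc/fstab
printf '%s\n' \
    '# <file system> <mount point> <type> <options> <dump> <pass>' \
    '/dev/sda1 /     ext4 errors=remount-ro 0 1' \
    '/dev/sda2 /home ext4 errors=remount-ro 0 2' |
    fstab_device /home
```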
The only error fsck came up with was "Data contains a file system with errors, check forced."
Then it goes through the five passes and finishes. Any run after that without accessing the problem files will say the drive is clean.
...and it worked! The bad inodes are gone and my system is fixed! Thanks so much!
For anybody else having this issue, I found my bad inodes (1415-1417) by running `find` on the bad mounted partition and then reading `dmesg` for the errors on the bad inodes.
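Pulling the inode numbers out of the kernel log can be scripted roughly like this. `bad_inodes` is a made-up helper name, and the pattern assumes the error lines look like the ones posted in this thread.

```shell
# Extract the inode numbers from EXT4-fs error lines on stdin.
bad_inodes() {
    sed -n 's/.*EXT4-fs error.*inode #\([0-9][0-9]*\):.*/\1/p' | sort -un
}

# On the affected machine, after walking the tree: dmesg | bad_inodes
# Here the lines from this thread serve as sample input:
bad_inodes <<'EOF'
EXT4-fs error (device sda2): ext4_ext_check_inode:497: inode #1415: comm rm: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
Aborting journal on device sda2-8.
EXT4-fs (sda2): Remounting filesystem read-only
EXT4-fs error (device sda2): ext4_ext_check_inode:497: inode #1417: comm rm: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
EXT4-fs error (device sda2): ext4_ext_check_inode:497: inode #1416: comm rm: pblk 0 bad header/extent: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
EOF
```

With that sample input it prints 1415, 1416 and 1417, one per line.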
Quote:
First (this may be false for older versions of fsck), `fsck /home` is the same as `fsck /dev/sda2` as long as it is in "/etc/fstab".
Apparently it's been that way for quite a while. I just never knew about it.
Quote:
The only error fsck came up with was "Data contains a file system with errors, check forced."
Then it goes through the five passes and finishes. Any run after that without accessing the problem files will say the drive is clean.
Unless the flags in the super block indicate the filesystem was not cleanly unmounted, you need to use the "-f" option to make fsck actually do anything.
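The state recorded in the super block can be inspected before deciding whether `-f` is needed. A small sketch, assuming the "Filesystem state:" line that `tune2fs -l` prints; `fs_state` is a made-up helper name, and the sample input is illustrative rather than output from this system.

```shell
# Extract the "Filesystem state" field from `tune2fs -l` style output on stdin.
fs_state() {
    sed -n 's/^Filesystem state:[[:space:]]*//p'
}

# Illustrative input only. On a real system (as root):
#   tune2fs -l /dev/sda2 | fs_state
printf '%s\n' \
    'Filesystem volume name:   Data' \
    'Filesystem state:         clean' |
    fs_state
```

If this reports a clean state, a plain `fsck /dev/sda2` is likely to return without running the five passes, and `fsck -f /dev/sda2` forces them.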
Quote:
...and it worked! The bad inodes are gone and my system is fixed!
A bit of a shame to lose that example of an error condition that fsck.ext4 fails to detect, really. The authors might have been interested to know just what was wrong. Collecting the data needed for the bug report would have been a bit of a problem, though.
I believe I might have tried the "-f" option before, it may have fixed some errors, but the issue still didn't resolve. I would have been grateful to send someone a disk image of the partition, but the machine it is on was high-use, and I needed a fix as soon as possible. Thanks again for the help!
rknichols, I've experienced the same issue again with another file I didn't find before. Bad inodes again, here is the output of `fsck -fy /home`:
Code:
e2fsck 1.42.13 (17-May-2015)
fsck from util-linux 2.27.1
[/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -fy /dev/sda2
Data: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 19136782 has zero dtime. Fix? yes
Inodes that were part of a corrupted orphan linked list found. Fix? yes
Inode 19137402 was part of the orphaned inode list. FIXED.
Inode 19137647 was part of the orphaned inode list. FIXED.
Inode 19137648 was part of the orphaned inode list. FIXED.
Inode 19137907 was part of the orphaned inode list. FIXED.
Inode 19138044 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(76579640--76579644) -(76579787--76579796) -(78687758--78687762) -(78933505--78933509)
Fix? yes
Free blocks count wrong for group #2337 (4783, counted=4798).
Fix? yes
Free blocks count wrong for group #2401 (870, counted=875).
Fix? yes
Free blocks count wrong for group #2408 (1420, counted=1425).
Fix? yes
Inode bitmap differences: -19136782 -19137402 -(19137647--19137648) -19137907 -19138044
Fix? yes
Free inodes count wrong for group #2336 (6631, counted=6637).
Fix? yes
Data: ***** FILE SYSTEM WAS MODIFIED *****
Data: 472231/47423488 files (0.5% non-contiguous), 122481385/189664000 blocks
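The block and inode numbers in this output can be tied to the group numbers fsck reports with a little arithmetic. This sketch assumes 32768 blocks and 8192 inodes per group with the first data block at 0. Those are typical values for 4 KiB blocks and are consistent with the totals printed above, but `dumpe2fs -h /dev/sda2` would show the real ones. The function names are made up for this example.

```shell
# Assumed geometry (check with `dumpe2fs -h`): typical for ext4 with 4 KiB blocks.
BLOCKS_PER_GROUP=32768
INODES_PER_GROUP=8192

# Block numbers start at 0 here (first data block assumed to be 0).
block_group() { echo $(( $1 / BLOCKS_PER_GROUP )); }

# Inode numbers start at 1, hence the "- 1".
inode_group() { echo $(( ($1 - 1) / INODES_PER_GROUP )); }

block_group 76579640    # first range in "Block bitmap differences"
block_group 78687758
block_group 78933505
inode_group 19136782    # first inode in "Inode bitmap differences"
```

Under those assumptions this prints 2337, 2401, 2408 and 2336, which matches the groups whose free counts fsck corrected. The two ranges that fall in group 2337 cover 5 + 10 = 15 blocks, the same as the change from 4783 to 4798.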
Here's another interesting problem. I'm running into these files by running baobab (disk utilization tool) in the /home directory. I tried doing it again to see if the fsck output is different, and it is! Here is another output from fsck after trying baobab again, running into the same troublesome file:
Code:
fsck from util-linux 2.27.1
[/sbin/fsck.ext4 (1) -- /home] fsck.ext4 -fy /dev/sda2
Data: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? yes
Inode 19136782 was part of the orphaned inode list. FIXED.
Inode 19137402 was part of the orphaned inode list. FIXED.
Inode 19137647 was part of the orphaned inode list. FIXED.
Inode 19137648 was part of the orphaned inode list. FIXED.
Deleted inode 19137895 has zero dtime. Fix? yes
Inode 19137907 was part of the orphaned inode list. FIXED.
Inode 19138044 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(76579640--76579644) -(76579781--76579785) -(76579787--76579791) -(78758406--78758410) -(79299589--79299593) -(79506945--79506949)
Fix? yes
Free blocks count wrong for group #2337 (4784, counted=4799).
Fix? yes
Free blocks count wrong for group #2403 (2028, counted=2033).
Fix? yes
Free blocks count wrong for group #2420 (1668, counted=1673).
Fix? yes
Free blocks count wrong for group #2426 (1406, counted=1411).
Fix? yes
Inode bitmap differences: -19136782 -19137402 -(19137647--19137648) -19137895 -19137907 -19138044
Fix? yes
Free inodes count wrong for group #2336 (6630, counted=6637).
Fix? yes
Data: ***** FILE SYSTEM WAS MODIFIED *****
Data: 472231/47423488 files (0.5% non-contiguous), 122481385/189664000 blocks
What is going on here, and is there any way I can reliably get you some info / maybe a copy of my disk for you to look at?
Now, fsck is finding and correcting errors. That is quite different from the previous cases, where fsck just cleared the error flag in the super block and didn't find anything else wrong.
Did the filesystem spontaneously go read-only again? That "recovering journal" message indicates that the filesystem was still dirty. Assuming that the filesystem had always previously been unmounted properly, the most likely cause for repeated corruption like this would be hardware issues. Have you run an overnight memory test on this system recently? There don't seem to be any reported errors from the disk drive, but a "smartctl -t long" might be appropriate.
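The drive self-test mentioned above could be run along these lines. This is a sketch: it assumes smartmontools is installed and targets the whole disk `/dev/sda` rather than a partition. The `run` helper, `smart_check`, `DRY_RUN`, and `DISK` are names invented for this example. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: start an extended SMART self-test and show where to read the result.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"
DISK="${DISK:-/dev/sda}"    # assumption: the whole disk, not a partition

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

smart_check() {
    run smartctl -t long "$DISK"        # starts the test; it runs inside the drive
    run smartctl -l selftest "$DISK"    # read the self-test log once the test has finished
    run smartctl -H -A "$DISK"          # overall health verdict and attribute table
}

smart_check
```

The long test runs in the background and can take hours, so the log only shows the result when it is read after the test has completed.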
Beyond that, this is way above my pay scale. I'm really not familiar with the internals of ext4. Even if a compressed QCOW2 e2image file of the metadata (see the manpage for e2image) were small enough to send to me, I doubt I could tell anything from it beyond what fsck already reported.
Yes, the filesystem went read-only. I disabled "errors=remount-ro" for the time being, so I can continue working on the drive even when it complains about the bad inodes. I am running "smartctl -t long /dev/sda2" now.
Sounds good, though, I'll see what I can do. I may end up having to reformat the drive or even get a new one. I'll post the results of smartctl here when it is finished.
It's a Seagate drive, and the raw values in those parameters are not simple error counts. See http://www.users.on.net/~fzabkar/HDD..._RRER_HEC.html for more info. In any event, those are internal events in the drive, unrelated to the quality of the SATA cable.
Yeah, I changed the SATA cable, no difference. By memory test, do you mean the program you can boot into from grub before selecting Ubuntu? Or is there another program / method of memory testing I should try?