LinuxQuestions.org

gregmcc 11-03-2013 03:49 AM

FS gets marked as read only when I copy lots of files?
 
I'm running openSUSE 12.3 (it also happened on 12.1 and 11.x) and have a weird problem: if I copy lots of files to a hard drive, the filesystem eventually gets remounted read-only. I have this problem across all my HDDs (2 x Seagate 2TB and 2 x WD 1TB).

I have an rsync script that backs up my Raspberry Pis, so it copies all their files to the openSUSE server.

The weird thing is that the server runs fine for weeks. I host my music and TV series on the server and can watch series and play music with no problems (via Samba shares).

The issue only presents itself when I copy lots of files to the server. If I copy, say, 10, it's fine. If I copy 100+, it falls over.

I get this error:

Code:

[2320468.963758] jbd2_journal_bmap: journal block not found at offset 32763 on sdd1-8
[2320468.963763] Aborting journal on device sdd1-8.
[2320468.965040] EXT4-fs error (device sdd1): ext4_journal_start_sb:350: Detected aborted journal
[2320468.965047] EXT4-fs (sdd1): Remounting filesystem read-only


After rebooting the server, all is fine again.

Any ideas? I've had this problem for ages now and I've had enough!

It happens if I copy lots of files to any disk - they can't all be failing, can they?

unSpawn 11-03-2013 05:18 AM

Quote:

Originally Posted by gregmcc (Post 5057516)
Code:

[2320468.963758] jbd2_journal_bmap: journal block not found at offset 32763 on sdd1-8
[2320468.963763] Aborting journal on device sdd1-8.
[2320468.965040] EXT4-fs error (device sdd1): ext4_journal_start_sb:350: Detected aborted journal
[2320468.965047] EXT4-fs (sdd1): Remounting filesystem read-only


More info is always better IMHO, and there are likely more related lines leading up to these error messages. Provided the machine or disks didn't show any (other) errors before:
- provided these aren't external drives with their own power supply, do check your PSU, as it may not be powerful enough,
- check that the partition doesn't have a "barrier=0" (0 == disable) mount option,
- if you're only dumping files, try running a partition w/o journaling,
- way lame, but check if using rsync with --bwlimit= at least helps ease things.
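
The barrier check from the list above could be sketched like this, assuming a Linux system where /proc/mounts shows the active mount options. The function name barriers_disabled is made up for illustration:

```shell
# List ext3/ext4 mounts whose options explicitly disable write barriers.
# Takes an optional file in /proc/mounts format
# (device, mount point, type, options, ...); defaults to /proc/mounts.
barriers_disabled() {
    awk '$3 ~ /^ext[34]$/ && ("," $4 ",") ~ /,(barrier=0|nobarrier),/ { print $1, $2, $4 }' "${1:-/proc/mounts}"
}
```

Running barriers_disabled with no argument prints nothing when no ext3/ext4 filesystem is mounted with barrier=0 or nobarrier.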

gregmcc 11-03-2013 10:40 AM

Some more info:

These are all internal SATA disks. I've checked the messages log and there are no errors of any kind beforehand. I have also tried reformatting one of the disks to ext3. The same thing happens after copying a few hundred files.

Here is a copy of the fstab:
Quote:

/dev/disk/by-id/ata-SAMSUNG_HD154UI_S1XWJ90ZC08914-part1 /home/disk4 ext4 defaults 1 2
/dev/disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BB210467-part1 /home/disk1 ext4 defaults 1 2
/dev/disk/by-id/ata-SAMSUNG_HD204UI_S2H7J9BB210746-part1 /home/disk2 ext4 defaults 1 2
/dev/disk/by-id/ata-SAMSUNG_HD103SI_S1VSJ90Z306400-part1 /home/disk5 ext4 defaults 1 2

Should it have barrier=0 set?

What are the pros/cons of turning off journalling? I use the machine as a file server, so it's not just for dumping backup files.

I'll try the rsync option and see if things improve, but I don't think so, as I'm copying the files from my R-Pi over my 54Mbps wifi, so it's hardly stressing the disks.

I've also checked all the disks with smartctl to see if they reported any errors:

Quote:

SMART Error Log Version: 1
No Errors Logged

unSpawn 11-03-2013 10:57 AM

Quote:

Originally Posted by gregmcc (Post 5057670)
I have also tried reformatting one of the disks to ext3. The same thing happens after copying a few hundred files.

What happens if you locally copy the same number of files from one disk to another?
*BTW I mean Ext2 or a partition mounted w/o journaling.


Quote:

Originally Posted by gregmcc (Post 5057670)
Should it have the barrier=0 set?

No.


Quote:

Originally Posted by gregmcc (Post 5057670)
What are the pros/cons of turning off journalling?

Pro: less write ops, not having to check and write the journal basically.
Con: degraded integrity (but only) on system crash or power failure.


Quote:

Originally Posted by gregmcc (Post 5057670)
I use the machine as a file server, so it's not just for dumping backup files.

I'm talking about a partition, not the complete machine.


Quote:

Originally Posted by gregmcc (Post 5057670)
I'll try the rsync option and see if things improve, but I don't think so

The theory is that by making rsync slow down, files are accessed / written at a lower rate. This may give whatever is causing this situation more time to recuperate.
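
A rough sketch of such a throttled run, assuming the backup is a plain rsync -a pull. The wrapper name throttled_backup, the DRY_RUN switch, the host name and the paths are all hypothetical; --bwlimit takes a rate in KiB/s:

```shell
# Run (or, with DRY_RUN=1, just print) an rsync backup capped at a given rate.
# rate is in KiB/s as understood by rsync's --bwlimit option.
throttled_backup() {
    rate=$1 src=$2 dest=$3
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rsync -a --bwlimit=$rate $src $dest"
    else
        rsync -a --bwlimit="$rate" "$src" "$dest"
    fi
}

# Preview a backup of one Pi throttled to roughly 500 KiB/s
# (hypothetical host and paths).
DRY_RUN=1 throttled_backup 500 pi@raspberrypi.local:/home/pi/ /home/disk1/pi-backup/
```

Dropping DRY_RUN=1 performs the actual transfer. Lowering the rate further is a cheap way to see whether the failure depends on write rate at all.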

btmiller 11-03-2013 12:09 PM

In cases like this, it's usually a good idea to check that the hardware itself is OK. Your disk could have an issue that only shows up under heavy load. I'd suggest running a quick test using smartctl if your disk has SMART functionality. To do so, you can run:

Code:

smartctl -t short /dev/sdd # run test
smartctl -l selftest /dev/sdd # view results

Note that the test is non-destructive, but it might slow down performance, so best to do it under a light load. There are other longer tests that can be run too (some destructive, some not so), see the man page for smartctl for more details.
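
As a possible follow-up to the short test, here is a sketch that starts the long (extended) self-test on several drives and filters the self-test log for anything that did not finish cleanly. The function names are made up for illustration, and the device names you pass in are your own:

```shell
# Start a long (extended) SMART self-test on each given drive. Needs root.
start_long_tests() {
    for dev in "$@"; do
        smartctl -t long "$dev"
    done
}

# Filter `smartctl -l selftest` output on stdin: print only log entries
# whose status is something other than "Completed without error".
selftest_problems() {
    grep '^#' | grep -v 'Completed without error'
}
```

For example, run start_long_tests /dev/sdd, wait for the duration smartctl reports, then run smartctl -l selftest /dev/sdd | selftest_problems. No output means every logged test completed without error.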

gregmcc 11-03-2013 01:08 PM

Thanks - I'll run the test and report back.

Interestingly I copied about 3000 files from one drive to the other a few times and experienced no problems.

Next I put the limit on rsync and everything resynced successfully, albeit slowly.

Getting more and more confusing. Hopefully the smart test will show some results.

Update: Mmmm....all drives test ok.

Quote:

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     22089         -

syg00 11-03-2013 03:26 PM

No need to change filesystem - on ext4 you can disable the journal with tune2fs.
Another (better ?) option might be to try the journal on a different device (a quiet one preferably).
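
Both options might look roughly like the sketch below. The function names are made up for illustration, the device arguments are whatever your own partitions are, and the filesystem has to be unmounted (or mounted read-only) before its journal is touched. Treat this as a starting point to check against the tune2fs and mke2fs man pages, not as a tested procedure:

```shell
# Succeeds if `tune2fs -l` output on stdin lists the has_journal feature.
has_journal() {
    grep '^Filesystem features:' | grep -qw has_journal
}

# Remove the journal from an ext3/ext4 filesystem, then force a check.
# Run as root on an unmounted filesystem.
disable_journal() {
    dev=$1
    tune2fs -O '^has_journal' "$dev" && e2fsck -f "$dev"
}

# Move the journal to a separate device. This ERASES jdev.
# The journal device must use the same block size as the filesystem.
use_external_journal() {
    dev=$1 jdev=$2
    bs=$(tune2fs -l "$dev" | awk -F: '/^Block size/ { gsub(/ /, "", $2); print $2 }')
    mke2fs -O journal_dev -b "$bs" "$jdev" &&
        tune2fs -O '^has_journal' "$dev" &&
        tune2fs -J device="$jdev" "$dev"
}
```

To see the current state first, tune2fs -l on the partition piped into has_journal tells you whether a journal is present.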

suicidaleggroll 11-03-2013 03:44 PM

So this happens on multiple drives, multiple versions of OpenSUSE.

I recommend booting a live disk of some other distro to see if you run into the problem there as well. If so, it could be the controller on your motherboard.

Doing big copies between local disks to see if the problem still shows up is also a good idea.
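
One way to script such a local copy test is sketched below. The function names and the 4 KiB file size are arbitrary choices for illustration, and the destination directory must not exist yet:

```shell
# Create `count` small files of random data in `dir`.
make_test_files() {
    dir=$1 count=$2
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/urandom of="$dir/file_$i" bs=4096 count=1 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Copy a directory tree to a new location (dest must not exist yet)
# and verify that the copy matches the source.
copy_and_verify() {
    src=$1 dest=$2
    cp -a "$src" "$dest" && diff -r "$src" "$dest" >/dev/null
}
```

For instance, make_test_files /home/disk1/copytest 3000 followed by copy_and_verify /home/disk1/copytest /home/disk2/copytest (the copytest directory is hypothetical), then check dmesg for journal errors afterwards.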

gregmcc 11-08-2013 11:31 AM

The problem came back even with the limit on rsync.

I've now removed the journaling and it's been running without a hitch for a few days now. I think the problem has been solved.


Thanks all.

unSpawn 11-08-2013 12:24 PM

Thanks for the feedback, good to know.

syg00 11-08-2013 03:28 PM

IMHO turning off journalling is not a long-term "solution". It exposes you to corruption you may never hear about.
There is (was anyway) an ext[34] mailing list - Ted may like to offer an opinion on what happened, and what you can do to avoid it. He will want that oops backtrace, I suspect.
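
To gather the lines leading up to the abort for a report like that, something along these lines might do. The function name is made up, /var/log/messages is an assumed default log location, and the 30/10 line context is an arbitrary choice:

```shell
# Print kernel log lines around each journal abort, with preceding context.
# Takes an optional log file; /var/log/messages is an assumed default.
journal_abort_context() {
    grep -B 30 -A 10 -E 'Aborting journal|jbd2_journal_bmap|EXT4-fs error' "${1:-/var/log/messages}"
}
```

Whatever the kernel logged just before "Aborting journal" (controller resets, I/O errors, a backtrace) ends up in the output.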

