LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
11-13-2012, 06:46 PM   #1
ichrispa
Member
 
Registered: Mar 2005
Location: Dresden, Germany
Distribution: OpenSuse 11.2/3, Debian 5.0 , Debian 1.3.1, OpenBSD
Posts: 270

Rep: Reputation: 31
USB based root filesystem getting corrupted


Hello everyone,

I have a problem concerning a Linux installation on a USB stick. I installed Debian on an 8 GB Intenso USB stick to free all the IDE connectors for data drives (RAID 5). The stick is bootable and has three partitions (boot, root, swap). Both boot and root are ext3 and are equally affected by the problem.

The system runs without problems after startup. It takes about 3 hours before the following appears in the logs:

Code:
[65218.818785] attempt to access beyond end of device
[65218.818810] sde2: rw=1217, want=21569394000, limit=11718656
[65218.818830] Buffer I/O error on device sde2, logical block 2696174249
[65218.818849] lost page write due to I/O error on sde2
[65218.818868] Aborting journal on device sde2.
[65218.820095] ext3_abort called.
[65218.820110] EXT3-fs error (device sde2): ext3_journal_start_sb: Detected aborted journal
[65218.820142] Remounting filesystem read-only
However, problems tend to arise before that. The filesystem gets massively corrupted, mostly beyond what fsck can repair. I am well aware that USB flash devices come with some problems regarding data persistence, but this behavior is well beyond what is to be expected. The system cannot be rebooted; fsck can be forced to fix things but ends up corrupting the filesystem beyond recognition (/lib/ld-linux.so getting mangled, init being unreadable, binaries and directories getting lost without ending up in lost+found). I checked the stick for bad sectors but have not found anything that would explain this behavior.

The filesystem was originally ext4. After the problem first arose, I downgraded it to ext3 (mkfs -t ext3, then copied a sound backup of the root filesystem back to the stick), but that did not solve the problem. In any case, I doubt that the error is related to the filesystem itself, or that switching to JFS or something similar would solve the problem when a well-tested FS like ext3 fails so massively.
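The numbers in the log above can be cross-checked with a little arithmetic. This sketch assumes two things the thread does not state: that the kernel prints `want` and `limit` in 512-byte sectors, and that the filesystem uses 4 KiB blocks. Under those assumptions, logical block 2696174249 starts at sector 21569393992 and ends exactly at the reported want=21569394000, while sde2 itself is only 11718656 sectors (about 6 GB) long. The write therefore targeted a block number roughly 1,840 times beyond the end of the partition, which looks more like a garbage block number than a boundary that is off by a few sectors.

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are reported in 512-byte sectors

# Two lines copied from the kernel log in the first post.
LOG = """\
[65218.818810] sde2: rw=1217, want=21569394000, limit=11718656
[65218.818830] Buffer I/O error on device sde2, logical block 2696174249
"""

def parse(log: str) -> dict:
    """Extract want/limit (in sectors) and the failing logical block number."""
    want, limit = map(int, re.search(r"want=(\d+), limit=(\d+)", log).groups())
    block = int(re.search(r"logical block (\d+)", log).group(1))
    return {"want": want, "limit": limit, "block": block}

def check(info: dict, block_size: int = 4096) -> dict:
    """Convert the filesystem block to sectors (assumed 4 KiB block size)."""
    per_block = block_size // SECTOR_BYTES   # 8 sectors per 4 KiB block
    start = info["block"] * per_block        # first sector of the block
    end = start + per_block                  # one past the last sector of the block
    return {
        "start_sector": start,
        "end_sector": end,
        "matches_want": end == info["want"],
        "partition_bytes": info["limit"] * SECTOR_BYTES,
        "overshoot_factor": info["want"] / info["limit"],
    }

if __name__ == "__main__":
    for key, value in check(parse(LOG)).items():
        print(f"{key}: {value}")
```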

I have searched for a reason why a page write would extend beyond the end of the device and found the following problem in the partition tables.

Code:
Partition 2 has different physical/logical beginnings (non-Linux?):
     phys=(15, 140, 62) logical=(16, 77, 59)
Partition 2 has different physical/logical endings:
     phys=(745, 1, 24) logical=(781, 133, 32)
Partition 2 does not end on cylinder boundary.
Partition 2: previous sectors 11968511 disagrees with total 11414612
All partitions on the USB stick show this problem, even after a fresh fdisk. I assume that is because the device is not a physical disk to which the concept of a physical sector applies (?), but it would explain why the filesystem botches page writes by writing beyond the end of the flash device.
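The fdisk warnings concern the legacy cylinder/head/sector (CHS) fields of the partition entry; the Linux kernel locates partitions by their LBA start and size fields instead. As a cross-check, this sketch converts the "phys" tuples for partition 2 to sector numbers, assuming a 255-head, 63-sectors-per-track translation (a common geometry, but an assumption here, since the thread does not say which geometry applies to this stick). The end sector comes out as 11968511, the same value fdisk printed as "previous sectors", and the partition spans 11718656 sectors, matching the `limit` the kernel reported for sde2. If that assumption holds, the partition extent is self-consistent, which suggests the CHS mismatch alone does not account for the out-of-range write.

```python
HEADS = 255   # assumed translation geometry (not stated in the thread)
SECTORS = 63  # sectors per track; CHS sector numbers are 1-based

def chs_to_lba(c: int, h: int, s: int, heads: int = HEADS, sectors: int = SECTORS) -> int:
    """Convert a cylinder/head/sector triple to a 0-based LBA sector number."""
    return (c * heads + h) * sectors + (s - 1)

def lba_to_chs(lba: int, heads: int = HEADS, sectors: int = SECTORS) -> tuple:
    """Inverse conversion, as a partitioning tool would compute it."""
    c, rem = divmod(lba, heads * sectors)
    h, s0 = divmod(rem, sectors)
    return (c, h, s0 + 1)

# "phys" tuples for partition 2, copied from the fdisk output above.
start = chs_to_lba(15, 140, 62)
end = chs_to_lba(745, 1, 24)
size = end - start + 1

print("start sector:", start)  # first sector of the partition
print("end sector:", end)      # compare with fdisk's "previous sectors"
print("size:", size)           # compare with limit= in the kernel log
```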

Please do not suggest switching the filesystem or using a "more capable" partitioner. I have full trust in both fdisk and ext3, having used these tools for more than a decade (ext2 before ext3). Other suggestions on how to overcome this problem are more than welcome.

Kind regards,

ichrispa
 
11-14-2012, 07:09 AM   #2
RockDoctor
Senior Member
 
Registered: Nov 2003
Location: Minnesota, US
Distribution: Fedora, Ubuntu
Posts: 1,210

Rep: Reputation: 238
Do you see the same corruption if you run without an active swap partition?
 
11-14-2012, 02:33 PM   #3
jefro
Guru
 
Registered: Mar 2008
Posts: 11,105

Rep: Reputation: 1362
Get a new flash drive and try again. Be sure to test it on a USB 2.0 port and not a 3.0 one.

Since you mentioned it, ext3 along with swap is a poor choice. There is too much going on through that chipset. For a simple test, use ext2 and no swap if you can.

I suspect that it is a combination of the drive, chipset, and distro to some degree. Try the new flash drive first. I have had some fail like this. Finally threw them away.
 
11-15-2012, 09:47 AM   #4
ichrispa
Member
 
Registered: Mar 2005
Location: Dresden, Germany
Distribution: OpenSuse 11.2/3, Debian 5.0 , Debian 1.3.1, OpenBSD
Posts: 270

Original Poster
Rep: Reputation: 31
Hello jefro and RockDoctor, thank you both for your replies.

I grabbed a new USB flash drive (this time from Transcend) and created a new partition table, which verified OK. Then I disabled swap by declaring the partition as Linux (type 83), formatting it as ext3, and removing its entry from fstab. Just to be sure, I added noswap to the extlinux boot parameters.
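Whether any swap is still active after such changes can be confirmed from /proc/swaps, which lists the active swap areas with their sizes in KiB (the same information `swapon -s` prints). A minimal parsing sketch:

```python
from pathlib import Path

def active_swaps(text: str) -> list:
    """Parse /proc/swaps into (name, type, size_kib, used_kib) tuples."""
    entries = []
    for line in text.splitlines()[1:]:  # skip the column header line
        fields = line.split()
        if len(fields) >= 4:
            entries.append((fields[0], fields[1], int(fields[2]), int(fields[3])))
    return entries

if __name__ == "__main__":
    swaps_file = Path("/proc/swaps")
    swaps = active_swaps(swaps_file.read_text()) if swaps_file.exists() else []
    if not swaps:
        print("no swap active")
    for name, kind, size, used in swaps:
        print(f"{name} ({kind}): {used} of {size} KiB in use")
```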

I also rechecked the new drive for bad sectors/blocks; fsck -c did not find any.
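Note that `fsck -c` has e2fsck run badblocks in read-only mode, so it only finds sectors that fail to read; it would not notice a device that accepts a write and later returns different data. A complementary check is to write blocks that each record their own index and read them back. This sketch does that on an ordinary test file; the file name and block count are hypothetical, and because reads may be served from the page cache, a thorough test would re-read the file after unmounting the stick or dropping caches.

```python
import os
import struct

BLOCK = 4096  # bytes per test block (arbitrary choice for this sketch)

def pattern(index: int) -> bytes:
    """A block filled with its own index, so misplaced data can be recognized."""
    header = struct.pack("<Q", index)
    return header * (BLOCK // len(header))

def write_blocks(path: str, count: int) -> None:
    with open(path, "wb") as f:
        for i in range(count):
            f.write(pattern(i))
        f.flush()
        os.fsync(f.fileno())  # push the data to the device, not just the page cache

def verify_blocks(path: str, count: int) -> list:
    """Return the indices of blocks whose contents differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(count):
            if f.read(BLOCK) != pattern(i):
                bad.append(i)
    return bad

if __name__ == "__main__":
    test_file = "flash_test.bin"  # hypothetical; place it on the stick under test
    n = 256                       # 256 * 4 KiB = 1 MiB; raise to cover more of the device
    write_blocks(test_file, n)
    print("mismatched blocks:", verify_blocks(test_file, n))
    os.remove(test_file)
```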

I ran into the very same error condition. Even more interesting, this time it did not even take 3 hours; it happened at boot. Just after "Waiting for root filesystem" came an entire batch of "attempt to access beyond end of device" messages. Of course the system did not boot after that.

The computer's chipset is VIA, with which I have had very bad experiences in general. There is no USB 3.0 support, so I guess that aspect is taken care of. However, I will try running in USB 1.1 mode and see if that fixes the problem. I do suspect that the USB controller has some problem with USB High Speed devices. It is quite strange that the USB mode is configurable in the BIOS... I will get back to you after I have played around with the USB settings a bit.
 
11-15-2012, 03:17 PM   #5
jefro
Guru
 
Registered: Mar 2008
Posts: 11,105

Rep: Reputation: 1362
Unless this is a very odd distro, USB 1.x will not be much of an improvement for a test.

How about a quick test with a different distro? OpenSuse 12.3 seems to have fixed some USB booting support for one of my systems. Try some other, non-Debian-based distro just for a test.
 
11-15-2012, 07:03 PM   #6
ichrispa
Member
 
Registered: Mar 2005
Location: Dresden, Germany
Distribution: OpenSuse 11.2/3, Debian 5.0 , Debian 1.3.1, OpenBSD
Posts: 270

Original Poster
Rep: Reputation: 31
Hello jefro,

the distro is Debian 6. I am sorry for not being clear about this: I am not forcing USB 1.1 support within Linux (nor do I plan to force the uhci drivers and the like). The chipset configuration in the BIOS has fairly detailed options concerning USB. It can enable/disable USB support and enable USB 2.0 support, and it has an option for choosing between Full Speed and High Speed USB 2.0 (12 Mbit/s and 480 Mbit/s, respectively). This distinction between Full Speed and High Speed is something I have only seen implemented on VIA chipsets so far, so I am not really sure what to make of it. The system was using High Speed until now.

I do not believe that Linux and the associated I/O and filesystem modules have a problem (the kernel in question is 2.6.32-5, by the way), and I agree that this is a hardware or chipset issue. So far I have taken the following actions:
(1) I restored a sound backup of the partition tables, bootloader and partition contents onto the USB flash drive, including a newly created ext3 filesystem.
(2) I removed all swap partitions from fstab and blkid.
(3) I did not initialize the swap partition at all. My OpenSUSE 11.3, at least, was unable to use it as swap, so any attempt by Debian to enable that partition as swap should fail as well.
(4) I disabled USB 1.1 support in the BIOS altogether.

So far I have not seen any of the related error messages on the running system, though the uptime is just above 6 hours. I have written a script that writes a 100 MB file to the flash drive every couple of hours and then deletes it again, just to produce some heavy I/O activity. As I said, no problems so far.
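The script itself is not shown in the thread. A minimal sketch of such a write-and-delete loop follows; the target path, file size, interval and number of rounds are all hypothetical parameters. `os.fsync` forces the data out to the device, so each round exercises the flash drive rather than only the page cache.

```python
import os
import time

def write_and_delete(path: str, size_bytes: int, chunk: int = 1 << 20) -> None:
    """Write size_bytes of zeros to path, force them to disk, then delete the file."""
    zeros = bytes(chunk)
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(chunk, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the drive
    os.remove(path)

def stress_loop(path: str, size_bytes: int, interval_s: float, rounds: int) -> None:
    """Repeat write_and_delete, sleeping interval_s seconds between rounds."""
    for i in range(rounds):
        write_and_delete(path, size_bytes)
        print(f"round {i + 1}: wrote and removed {size_bytes} bytes")
        if i + 1 < rounds:
            time.sleep(interval_s)
```

For example, `stress_loop("/var/tmp/io_stress.bin", 100 * 10**6, 2 * 3600, 12)` would write a 100 MB file every two hours for about a day, assuming /var/tmp lives on the stick.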

If there are no filesystem errors by tomorrow, I will test USB 2.0 in Full Speed mode.

As I said, by now I suspect the USB 2.0 BIOS settings to be the problem. Thank you, jefro, for the valuable hint concerning the chipset.

Kind regards,

ichrispa
 
11-15-2012, 07:26 PM   #7
jefro
Guru
 
Registered: Mar 2008
Posts: 11,105

Rep: Reputation: 1362
I have seen many BIOSes with the option to enable 2.0. It does seem to affect some odd programs in Windows, dunno why. The slower USB speed will take forever to load and run, but it may solve timing issues.
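To put "forever" in numbers: the signalling rates alone bound how long a transfer takes. A back-of-the-envelope sketch for a 100 MB write (the size of the test file used earlier in the thread), ignoring protocol overhead and the flash drive's own write speed, both of which make real transfers slower:

```python
FULL_SPEED_BPS = 12_000_000    # USB Full Speed: 12 Mbit/s signalling rate
HIGH_SPEED_BPS = 480_000_000   # USB High Speed: 480 Mbit/s signalling rate

def min_transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Lower bound on transfer time; real throughput is always lower than the bus rate."""
    return size_bytes * 8 / bits_per_second

size = 100 * 10**6  # 100 MB
for name, rate in (("full speed", FULL_SPEED_BPS), ("high speed", HIGH_SPEED_BPS)):
    print(f"{name}: at least {min_transfer_seconds(size, rate):.1f} s")
```

This prints at least 66.7 s for Full Speed and at least 1.7 s for High Speed, a factor of 40 between the two modes.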

Never hurts to run memtest for a few days.
 
11-16-2012, 11:56 AM   #8
ichrispa
Member
 
Registered: Mar 2005
Location: Dresden, Germany
Distribution: OpenSuse 11.2/3, Debian 5.0 , Debian 1.3.1, OpenBSD
Posts: 270

Original Poster
Rep: Reputation: 31
Hello again jefro and RockDoctor,

after running the system for 14 hours without any hiccups, I decided to test RockDoctor's theory and turned swap back on. 90 minutes later I got an ext3 error again:

Code:
[54348.505319] Adding 1424192k swap on /dev/sde3.  Priority:-1 extents:1 across:1424192k 
[59316.857018] EXT3-fs error (device sde2): ext3_lookup: deleted inode referenced: 332972
[59316.857059] Aborting journal on device sde2.
[59316.860106] ext3_abort called.
[59316.860124] EXT3-fs error (device sde2): ext3_journal_start_sb: Detected aborted journal
[59316.860155] Remounting filesystem read-only
So the 14 hours of error-free runtime should probably be attributed to both the Full Speed USB mode and the disabled swap.

And obviously swap is related to the problem... but I have to admit that I do not know why. RockDoctor zeroed in on this issue pretty directly. Is there a history of swap on flash drives corrupting neighboring partitions?
 
11-16-2012, 05:44 PM   #9
jefro
Guru
 
Registered: Mar 2008
Posts: 11,105

Rep: Reputation: 1362
Swap is, for the most part, like any other disk access. On some systems it can consume a lot of time, processor and I/O resources.
 
11-16-2012, 06:09 PM   #10
RockDoctor
Senior Member
 
Registered: Nov 2003
Location: Minnesota, US
Distribution: Fedora, Ubuntu
Posts: 1,210

Rep: Reputation: 238
I've not known swap to corrupt another filesystem. I was just running out of ideas. I have a tendency to do full installs on my fastest 4GB flash drive. Without a swap, I actually have room for some personal files, so I just run without a swap partition or file. In my case, running without a swap partition doesn't seem to slow things down significantly - obviously, YMMV.

My only problem with data corruption on flash drives is with persistence files when using live CD images, and it's only been the persistence file that gets clobbered.
 
11-18-2012, 10:36 AM   #11
ichrispa
Member
 
Registered: Mar 2005
Location: Dresden, Germany
Distribution: OpenSuse 11.2/3, Debian 5.0 , Debian 1.3.1, OpenBSD
Posts: 270

Original Poster
Rep: Reputation: 31
I believe I might have a reasonable explanation for the problem. Obviously there is no point in blaming Linux or the filesystem modules for the corruption; I believe both have been well tested and have existed for far too long to have such a major bug. I could blame the VIA chipset, but that cannot really explain why enabling swap makes the filesystem on the adjacent partition fail.

So I started looking at what was using swap at the time. It's actually a bit painstaking, given that the system fails shortly after that error, but I traced the problem to VMware Server. The corruption occurs when the hypervisor attempts to move running virtual machines to swap. In particular, the error occurs once the hostd process begins allocating swap memory (seen in /proc/<pid>/smaps).
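A sketch of how such tracing might be automated: sum the per-mapping `Swap:` fields of /proc/<pid>/smaps for each process and rank the results. It assumes the running kernel exposes that field (older kernels may not), takes process names from /proc/<pid>/comm where available, and needs root to read other users' processes.

```python
from pathlib import Path

def swap_kib(smaps_text: str) -> int:
    """Sum the per-mapping 'Swap:' fields (in kB) of one process's smaps file."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Swap:"):  # deliberately does not match 'SwapPss:'
            total += int(line.split()[1])
    return total

def swap_by_process() -> dict:
    """Map 'pid (name)' to swapped-out kB for every process whose smaps is readable."""
    usage = {}
    proc_root = Path("/proc")
    if not proc_root.is_dir():
        return usage
    for proc in proc_root.iterdir():
        if not proc.name.isdigit():
            continue
        try:
            kib = swap_kib((proc / "smaps").read_text(errors="replace"))
        except OSError:  # process exited, or permission denied
            continue
        try:
            name = (proc / "comm").read_text(errors="replace").strip()
        except OSError:  # comm may be missing on some older kernels
            name = "?"
        if kib:
            usage[f"{proc.name} ({name})"] = kib
    return usage

if __name__ == "__main__":
    ranked = sorted(swap_by_process().items(), key=lambda kv: kv[1], reverse=True)
    for label, kib in ranked:
        print(f"{kib:>10} kB  {label}")
```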

This makes somewhat more sense, as VMware does use low-level access to manage memory allocation. An error in its kernel module would explain the catastrophic effects on the filesystem. It would also explain why the partition sde2 keeps getting more and more corrupted over time, even after sde2 is automatically remounted read-only, since the kernel module handling filesystem access is practically no longer in charge of the erroneous I/O requests.

Though turning off swapping of machine memory is an option, there are two reasons I do not want to go that way:
- For one, I only have 2 GB of RAM. I want the most active virtual machine to be able to allocate physical memory without restricting what the host system can allocate.
- I don't really feel comfortable running Linux without swap.

I will try the following workaround and see if it fixes things:
I will reformat the sde3 partition as ext3 and create a 1 GB image file on it, which I will attach through a loop device and use as swap. Since it is not a real device, it should not be affected by a process trying to access it beyond its "physical" extent.
 
11-20-2012, 04:59 PM   #12
ichrispa
Member
 
Registered: Mar 2005
Location: Dresden, Germany
Distribution: OpenSuse 11.2/3, Debian 5.0 , Debian 1.3.1, OpenBSD
Posts: 270

Original Poster
Rep: Reputation: 31
Nope. That did not work either.

Though there were no explicit fs warnings this time, a reboot showed the system to be utterly shot. It went as far as fsck not being able to recover the partition at all.

I am completely out of ideas.
 
11-21-2012, 02:42 PM   #13
jefro
Guru
 
Registered: Mar 2008
Posts: 11,105

Rep: Reputation: 1362
Get a new system. Plenty of people run native Linux from USB. I have had a few USB sticks that stunk, but not one system that stunk. There are only so many things to test. I assume you have checked the md5/sha1 of this ISO.
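Checking an image against a published md5/sha1 sum is easy to script. A minimal sketch using Python's hashlib that reads the file in chunks; the image path and expected digest are placeholders to be taken from the checksum list published alongside the ISO.

```python
import hashlib
import sys

def file_digest(path: str, algorithm: str = "sha1", chunk: int = 1 << 20) -> str:
    """Hash a file in chunks so a large ISO never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def matches(path: str, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()

if __name__ == "__main__":
    if len(sys.argv) == 3:
        print("OK" if matches(sys.argv[1], sys.argv[2]) else "MISMATCH")
    else:
        print("usage: verify_iso.py IMAGE EXPECTED_SHA1")
```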
 
11-21-2012, 03:44 PM   #14
ichrispa
Member
 
Registered: Mar 2005
Location: Dresden, Germany
Distribution: OpenSuse 11.2/3, Debian 5.0 , Debian 1.3.1, OpenBSD
Posts: 270

Original Poster
Rep: Reputation: 31
It's not the system either. Debian is ok, the installation was network based and the installer iso was verified.

I'm giving up on the stick and moving the installation onto the raid array. The stick will be used for booting and backups.

I'm flagging this thread as closed/solved. Here's a quick recap in case anyone should stumble in here:

What?
A USB flash drive-based Debian 6.0.6 installation using 3 partitions (boot/ext3, root/ext3, swap) kept corrupting the filesystem of the root partition during/after large I/O operations. The stick was mainly used to run the VMware Server hypervisor. After running for 3 hours, the filesystem would be completely unusable (not restorable using fsck). No unreadable sectors could be found on the USB flash drive, and it passed all read/write tests.

Tried:
changing the VIA chipset parameters for the USB host (USB 1.1, USB 2.0 Full/High Speed);
mounting ext3 with nobarrier and sync;
disabling the VMware hypervisor and associated kernel modules;
disabling the swap partition, and placing swap in a loop-mounted image file on that partition instead;

Solution:
None found. Workaround:
The Debian root partition was simply copied from a healthy backup to a free/new 6 GB RAID partition, and the root partition's UUIDs were changed in the bootloader configuration and fstab. The USB drive now serves as the boot partition for extlinux. No further problems have been detected since.
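The fstab half of that workaround can be sketched as a text rewrite: find the entry mounted on / and replace its device field with the new UUID (as reported by `blkid` for the RAID partition). The UUIDs used here are made up, and the bootloader's root=UUID=... parameter would need the same substitution. Keep a backup of the original file before writing the result back.

```python
def set_root_uuid(fstab_text: str, new_uuid: str) -> str:
    """Point the fstab entry mounted on '/' at UUID=new_uuid; leave all else as-is."""
    out = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#") and fields[1] == "/":
            # Keep the mount point, type, options and trailing newline untouched.
            line = f"UUID={new_uuid} {line.split(None, 1)[1]}"
        out.append(line)
    return "".join(out)

def replace_uuid(text: str, old_uuid: str, new_uuid: str) -> str:
    """Swap one UUID for another, e.g. in the bootloader's root=UUID=... line."""
    return text.replace(old_uuid, new_uuid)

if __name__ == "__main__":
    # Hypothetical fstab with made-up UUIDs.
    sample = (
        "# <file system> <mount point> <type> <options> <dump> <pass>\n"
        "UUID=0000-old-root / ext3 errors=remount-ro 0 1\n"
        "UUID=1111-boot /boot ext3 defaults 0 2\n"
    )
    print(set_root_uuid(sample, "2222-new-root"), end="")
```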


Thank you again for your help jefro and RockDoctor.

Last edited by ichrispa; 11-21-2012 at 03:46 PM.
 
  



All times are GMT -5.
