LinuxQuestions.org


dear mr. trout 10-15-2008 04:53 PM

Troubleshooting large partition "attempt to access beyond end of device"
 
I apologize if this seems long, but I wanted to show exactly what happened and what I have done. In some cases, I might have unwittingly exacerbated the problem, so please let me know if I might have made the problem worse.

In short, I have a 7.5TB partition with data I cannot access, and the kernel logs the infamous "attempt to access beyond end of device" message.

Here's what happened:

There is a server with 24 drives. Twelve drives were used to make up a 7.5TB RAID-5, and the other twelve drives were not raided yet and held valuable data. I created a partition on the RAID-5 using the parted that shipped with Ubuntu 8.04 Hardy, put ext3 on it using mkfs.ext3, and copied the data over.

So /dev/sdb1 was the single 7.5TB partition with a copy of the data. (I could have put a filesystem directly on /dev/sdb, but I was acting consistently with other setups on many servers.) After copying data to this partition, I checked that it was all there and the correct size (using diff). I then wiped out the non-raided drives and raided them. (The problems I am having are with the first raided drive, not the second, so I believe we can forget about the latter raid drive.)

After restarting the server, I mounted the first 7.5TB drive, which is the one having problems. Here's the entry from df -h:


--------------------------START------------------------------
...
/dev/sdb1 7.4T 2.1T 5.0T 29% /media/sdb1
---------------------------END-------------------------------


It mounts very quickly and looks good. But here's what happens when I ls the directory:


--------------------------START------------------------------
ls: cannot access sdb1/dir3: Input/output error
ls: cannot access sdb1/dir1: Input/output error
ls: cannot access sdb1/dir2: Input/output error
lost+found dir1 dir2 dir3
---------------------------END-------------------------------


And output from dmesg | tail:


--------------------------START------------------------------
SELinux: initialized (dev sdb1, type ext3), uses xattr
attempt to access beyond end of device
sdb1: rw=32, want=4195876888, limit=3228132399
EXT3-fs error (device sdb1): ext3_get_inode_loc: unable to read inode block - inode=262242305, block=524484610
attempt to access beyond end of device
sdb1: rw=32, want=4218945560, limit=3228132399
EXT3-fs error (device sdb1): ext3_get_inode_loc: unable to read inode block - inode=263684097, block=527368194
attempt to access beyond end of device
sdb1: rw=32, want=12908757016, limit=3228132399
EXT3-fs error (device sdb1): ext3_get_inode_loc: unable to read inode block - inode=806797313, block=1613594626
---------------------------END-------------------------------
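
The numbers in these messages can be cross-checked with a little arithmetic. This is only a minimal sketch, assuming the kernel reports want and limit in 512-byte sectors and that the filesystem uses 4096-byte blocks (the tune2fs output further down shows "Block size: 4096"). The function name is made up for illustration.

```python
SECTOR_SIZE = 512      # assumed: kernel "want"/"limit" are in 512-byte sectors
BLOCK_SIZE = 4096      # assumed: ext3 block size, as reported by tune2fs
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE

LIMIT = 3228132399     # "limit=" from dmesg: size of sdb1 in sectors


def end_sector_of_block(block):
    """Return the sector just past the end of the given filesystem block."""
    return (block + 1) * SECTORS_PER_BLOCK


# (ext3 block, "want=" value) pairs copied from the dmesg output
reads = [
    (524484610, 4195876888),
    (527368194, 4218945560),
    (1613594626, 12908757016),
]

for block, want in reads:
    computed = end_sector_of_block(block)
    # Columns: block, computed end sector, matches "want"?, beyond "limit"?
    print(block, computed, computed == want, computed > LIMIT)
```

Under those assumptions each "want" value is exactly the end of the inode block named in the EXT3-fs error that follows it, and every one of them lies past the "limit" of the partition as the kernel sees it.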


I can cd into the mounted partition and access lost+found and its contents directly, but not the other three directories.

From here on, I will show the three steps I took, in chronological order. (I repeated some steps, as I was muddling my way towards a better understanding.)
(1) Run fsck
(2) Try to resize partition using parted
(3) Try to temporarily remove some filesystem features so I can use parted


(STEP 1) I decided to run fsck /dev/sdb1:


--------------------------START------------------------------
SELinux: initialized (dev sdb1, type ext3), uses xattr
attempt to access beyond end of device
sdb1: rw=32, want=4195876888, limit=3228132399
EXT3-fs error (device sdb1): ext3_get_inode_loc: unable to read inode block - inode=262242305, block=524484610
attempt to access beyond end of device
sdb1: rw=32, want=4218945560, limit=3228132399
EXT3-fs error (device sdb1): ext3_get_inode_loc: unable to read inode block - inode=263684097, block=527368194
attempt to access beyond end of device
sdb1: rw=32, want=12908757016, limit=3228132399
EXT3-fs error (device sdb1): ext3_get_inode_loc: unable to read inode block - inode=806797313, block=1613594626
[root@localhost media]# umount /dev/sdb1
[root@localhost media]# fsck /dev/sdb1
fsck 1.40.8 (13-Mar-2008)
e2fsck 1.40.8 (13-Mar-2008)
The filesystem size (according to the superblock) is 2014129285 blocks
The physical size of the device is 403516549 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? no

/dev/sdb1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Error reading block 403537922 (Invalid argument) while getting next inode from scan. Ignore error<y>? yes

Force rewrite<y>? no

Error reading block 403537923 (Invalid argument) while getting next inode from scan. Ignore error<y>? no

Error while scanning inodes (201768960): Can't read next inode
e2fsck: aborted

---------------------------END-------------------------------


The above continues for every consecutive block number following 403537922, i.e., for a very long time. I cannot use the -y option (and wonder whether I should anyhow), because the first question it would answer is "Abort<y>?".

(In the past, I answered 'y' to Ignore/Force rewrite, but it was taking entirely too long.)

I also tried repairing with several alternate superblocks, but that didn't change anything.

I got ahead of myself and rationalized that I could expand the partition size and that (hopefully) the data might reside on the disk.

(STEP 2) So my parted session:


--------------------------START------------------------------
> parted /dev/sdb
GNU Parted 1.8.8
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: AMCC 9550SX-12M DISK (scsi)
Disk /dev/sdb: 8250GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number Start End Size Type File system Flags
1 32.3kB 1653GB 1653GB primary ext3

(parted) resize 1 32.3kB 8250GB
Error: The file system is bigger than its volume!
Ignore/Cancel? Ignore
Warning: File system has errors! You should run e2fsck.
Ignore/Cancel? Ignore
Error: File system has an incompatible feature enabled.
(parted)
---------------------------END-------------------------------


Note that this puts the size of the partition at ~1.6TB, not the 7.4TB reported by df.
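
The two sizes can be reproduced from the block counts e2fsck printed in STEP 1. A rough sketch, assuming 4096-byte blocks (taken from the tune2fs output below); the helper name is hypothetical.

```python
BLOCK_SIZE = 4096  # assumed from the tune2fs output ("Block size: 4096")

fs_blocks = 2014129285    # filesystem size according to the superblock (e2fsck)
dev_blocks = 403516549    # physical size of the device (e2fsck)


def to_units(blocks):
    """Return (decimal gigabytes, binary tebibytes) for a count of blocks."""
    size = blocks * BLOCK_SIZE
    return size / 10**9, size / 2**40


fs_gb, fs_tib = to_units(fs_blocks)
dev_gb, dev_tib = to_units(dev_blocks)

print(f"filesystem: {fs_gb:.0f} GB = {fs_tib:.2f} TiB")
print(f"partition:  {dev_gb:.0f} GB = {dev_tib:.2f} TiB")
```

With these figures the filesystem comes out at roughly the 8250GB that parted prints for the whole disk, and the partition at roughly the 1653GB that parted prints for partition 1. df presumably reports the superblock's view of the filesystem, which would explain why it still shows 7.4T.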

(STEP 3) I wanted to try to figure out what features were incompatible. I searched around online and then I ran tune2fs -l /dev/sdb1:


--------------------------START------------------------------
tune2fs 1.40.8 (13-Mar-2008)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: eaf4f430-1380-4125-9498-65bbc94a4d07
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal resize_inode dir_index filetype sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean with errors
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1007075328
Block count: 2014129285
Reserved block count: 100706464
Free blocks: 1441093558
Free inodes: 1007009890
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 543
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Filesystem created: Thu Sep 18 18:41:54 2008
Last mount time: Wed Oct 15 17:04:44 2008
Last write time: Wed Oct 15 17:10:06 2008
Mount count: 15
Maximum mount count: 26
Last checked: Thu Sep 18 18:41:54 2008
Check interval: 15552000 (6 months)
Next check after: Tue Mar 17 18:41:54 2009
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: c4650f98-ce1d-4305-ae22-36cbb05a095a
Journal backup: inode blocks
---------------------------END-------------------------------
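
The per-group values in this output make it possible to see where the failing inodes live. A sketch only, using the standard ext2/ext3 rule that inode numbers start at 1 and are split evenly across block groups; the function names are invented for illustration.

```python
BLOCKS_PER_GROUP = 32768   # from tune2fs
INODES_PER_GROUP = 16384   # from tune2fs
DEVICE_BLOCKS = 403516549  # physical device size reported by e2fsck


def group_of_inode(inode):
    """Return the block group holding the inode (inode numbers start at 1)."""
    return (inode - 1) // INODES_PER_GROUP


def first_block_of_group(group):
    """Return the first block of a group (the filesystem's first block is 0)."""
    return group * BLOCKS_PER_GROUP


# Last block group that fits entirely on the truncated device
last_complete_group = DEVICE_BLOCKS // BLOCKS_PER_GROUP - 1

# Inode numbers from the EXT3-fs errors in dmesg
for inode in (262242305, 263684097, 806797313):
    group = group_of_inode(inode)
    # Columns: inode, group, first block of group, group beyond the device?
    print(inode, group, first_block_of_group(group), group > last_complete_group)
```

Each failing inode maps to a block group whose first block is two less than the block number in the matching dmesg error, which would be consistent with the inode table sitting right after the two bitmaps. All three groups lie far beyond the last group that fits on the device as currently sized.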


So I tried to temporarily turn off some features I had read about on various forums as possibly being the culprit:


--------------------------START------------------------------
> tune2fs -O ^resize_inode /dev/sdb1
tune2fs 1.40.8 (13-Mar-2008)

Please run e2fsck on the filesystem.

> tune2fs -O ^dir_index /dev/sdb1
tune2fs 1.40.8 (13-Mar-2008)
> tune2fs -l /dev/sdb1

(Same output as before, i.e., the features were not disabled.)
---------------------------END-------------------------------


This is where I am stuck: I cannot repair using fsck/e2fsck, and any attempt to resize the partition (which has an underlying assumption that it might help, which could be wrong) fails.

My questions:

1. Am I missing anything obvious? I have been searching through forums for solutions, but this is a report of all the knowledge I have gained in the process.

2. If I cannot solve this myself, should I consider a data recovery service? It would be a recommendation, as the server is not mine (I'm helping to figure out the problem as part of a collaboration). I am part of a publicly-funded group, so you can imagine that in our economy money is tight and highly regulated.

I really appreciate any wisdom or insights. This is far from my primary work responsibility, but there is no one with the clearly-defined role of caring for this particular server.

I really want to step up and resolve the issue, and learn a little more about what happened in the process. I cannot thank you enough for any feedback.

dear mr. trout 10-16-2008 10:22 AM

Perhaps I should ask if anyone has had a similar problem:
* Create a partition (large or small, used parted), put EXT3 on it and mount it
* Copy data over, verify it
* After restart (or somewhere down the road), mounted partition says it is 7.4TB but is actually ~1.6TB according to parted

In other words, the partition is not the correct size.

If not, does anyone believe that resizing the partition will help? If so, perhaps there is something I am missing. If resizing the partition doesn't seem like it would help, then I'm barking up the wrong tree.

Lastly, does anyone think that a professional service would help? I'd love to solve it with any help I can find online.

I really appreciate any help.

twfey 10-20-2008 05:16 PM

Found your post in my own search for a solution to the same problem ("attempt to access beyond end of device"). The solution that worked for us may work for you and so I thought it was worth posting a reply. YMMV

It appears from your "parted" output you used "msdos" disk label instead of the "gpt" disk label and I suspect you may not have the "CONFIG_EFI_PARTITION" option set in your kernel either.
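
To illustrate why the disk label matters: an msdos partition entry stores the start and length of a partition as 32-bit sector counts. The sketch below uses the figures from the first post and assumes 512-byte sectors and 4096-byte filesystem blocks; the search function is hypothetical, and the idea that the length wrapped around is an inference from the numbers, not something the logs state.

```python
SECTOR_SIZE = 512            # logical sector size reported by parted
FIELD_MAX = 2**32            # msdos label: 32-bit sector counts
SECTORS_PER_BLOCK = 4096 // SECTOR_SIZE

limit = 3228132399           # partition size in sectors, from dmesg "limit="
fs_blocks = 2014129285       # filesystem size in 4096-byte blocks, from e2fsck

# Largest partition an msdos label can describe with 512-byte sectors
max_bytes = FIELD_MAX * SECTOR_SIZE
print(max_bytes / 2**40, "TiB")


def find_wrap_count(limit, fs_blocks, max_k=8):
    """Find k such that limit + k * 2**32 sectors holds exactly fs_blocks blocks."""
    for k in range(max_k):
        sectors = limit + k * FIELD_MAX
        if sectors // SECTORS_PER_BLOCK == fs_blocks:
            return k, sectors
    return None


print(find_wrap_count(limit, fs_blocks))
```

With these inputs the search succeeds at k = 3, which suggests the partition was created with its full length but only the low 32 bits of that length survived in the partition table that was read back after the reboot.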

Once we configured and recompiled our kernel with the CONFIG_EFI_PARTITION option set (Slackware 12.0 Kernel vers. 2.6.21.5), re-labeled our 3TB partition from "msdos" to "gpt" using parted, and then re-made the filesystem (ext3) all was well.
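
Whether a kernel was built with that option can usually be read from its config file. A small sketch, assuming the config text is available as a string; the function name and the sample excerpts are made up, and the location of the real file (often /boot/config-<kernel version>) varies by distribution.

```python
def has_gpt_support(config_text):
    """Return True if CONFIG_EFI_PARTITION is built in (=y) in kernel config text."""
    for line in config_text.splitlines():
        if line.strip() == "CONFIG_EFI_PARTITION=y":
            return True
    return False


# Hypothetical excerpts for illustration, not taken from a real machine.
enabled = "CONFIG_MSDOS_PARTITION=y\nCONFIG_EFI_PARTITION=y\n"
disabled = "CONFIG_MSDOS_PARTITION=y\n# CONFIG_EFI_PARTITION is not set\n"

print(has_gpt_support(enabled))
print(has_gpt_support(disabled))
```

On a real system the string would be read from the config file that matches the running kernel.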

Caution: You'll lose your data when you re-label the disk using parted.

Check out these links:

http://www.cyberciti.biz/tips/fdisk-...eater-2tb.html

http://en.wikipedia.org/wiki/GUID_Partition_Table

Cheers,

Troy

dear mr. trout 10-20-2008 07:12 PM

From http://www.cyberciti.biz/tips/fdisk-...eater-2tb.html:

Quote:

If you don't include GPT support in the Linux kernel, after rebooting the server, the file system will no longer be mountable or the GPT table will get corrupted. By default Redhat Enterprise Linux / CentOS comes with GPT kernel support. However, if you are using Debian or Ubuntu Linux, you need to recompile the kernel.

twfey: Thank you for your reply, which proved helpful by getting me on the right track to a different solution. (I was in a rut of sorts, and seeing the problem from a different angle suggested I try a different solution.)

I looked up more on msdos/gpt disk labels (I didn't even realize that was a reference to the partition table!), and saw that the msdos disk label cannot describe a partition larger than 2TB. I knew that 2TB was a limit of mkfs,

So I looked into data recovery tools, and discovered TestDisk (GPL, open source). I installed it through Yum, and it is copying over the files right now. My co-workers are going to be stoked!

If anyone has a similar problem and needs to recover the data, I suggest you look into testdisk to see whether the data can be copied off the disk.

After I copy the data over to my other 7.5TB mountable, I'm going to remove the partition and put a filesystem on the disk (i.e., /dev/sdb instead of /dev/sdb1), which worked for the other device (/dev/sda). Unless there's a reason to not do this.

Again, thank you twfey! I've learned a lot from this misadventure.

twfey 10-21-2008 09:02 AM

You're welcome. I'm happy to hear you resolved your problem.

