Linux - General
This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
I have several drives in an LVM VG/LV, and for some reason, on reboot a drive will get a corrupt GPT table. I have killed the entire VG and re-created it without the drive that was showing the problem, and then it just happens to another drive. It does not appear to be the same drive each time, either; I've confirmed this by using smartctl to check the serial number of the drive reporting a corrupted table.
I have swapped cables around between the two controllers to see if I could pinpoint which cable or port showed the problem, and, long story short, there was little consistency. This simply does not appear to be caused by any single cable, port, controller, or drive.
Code:
parted /dev/sdb print
Error: The primary GPT table is corrupt, but the backup appears OK, so that will be used.
OK/Cancel?
When I see that and select OK, it just shows the same error again. I can do a mklabel and mkpart, and then the LVM LV shows up under /dev as it should, without another vgscan. If I then mount that LV, I can see the data is there and it seems OK, despite mklabel's warning that it will destroy the data.
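One reason the mount can still work: the partition table occupies only the first few sectors of the disk, so rewriting it leaves the LV contents further in untouched, as long as the new partition starts at the same offset. A minimal sketch on a scratch image file (not a real device; offsets assume 512-byte sectors, where a standard 128-entry GPT occupies LBAs 0-33):

```shell
# Demonstration on a scratch image: clobbering the first 34 sectors (where the
# GPT lives) does not touch data stored further into the "disk".
img=$(mktemp)
truncate -s 8M "$img"
# pretend this is LV data, 2 MiB into the disk
printf 'backup payload' | dd of="$img" bs=1M seek=2 conv=notrunc 2>/dev/null
# wipe the partition-table region, as a corrupt or rewritten GPT would
dd if=/dev/zero of="$img" bs=512 count=34 conv=notrunc 2>/dev/null
# the data beyond the table is still intact
dd if="$img" bs=1M skip=2 count=1 2>/dev/null | head -c 14   # prints: backup payload
rm -f "$img"
```

The caveat is real, though: mkpart has to recreate the partition at the same starting LBA, or the PV will appear shifted and unreadable.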
Logs show no cause during boot.
So, what is causing this? Will doing the mklabel kill the data on it?
I just don't understand why Ubuntu is randomly corrupting GPT tables.
Code:
Ubuntu 10.10 x64
Mobo: ASUS A8N-SLI - On board NVIDIA nforce4-SLI controller has 4 ports connected to 3 drives in this LVM LV.
HighPoint Technologies, Inc. RocketRAID 230x 4 Port SATA-II Controller - Has 4 ports, 3 of which are used in the LVM LV. (Had 4, one is out with an RMA).
Linux teal 2.6.35-22-server #34-Ubuntu SMP Sun Oct 10 10:54:55 UTC 2010 x86_64 GNU/Linux
--- Volume group ---
VG Name vg-backup
System ID
Format lvm2
Metadata Areas 6
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 6
Act PV 6
VG Size 3.64 TiB
PE Size 4.00 MiB
Total PE 953868
Alloc PE / Size 953868 / 3.64 TiB
Free PE / Size 0 / 0
VG UUID iKFodI-VcUI-Aikr-N1v2-V6Fq-fXFX-6hhXmD
--- Logical volume ---
LV Name /dev/vg-backup/lv-backup
VG Name vg-backup
LV UUID yxDOVK-ep0Z-ODBT-LjdR-fQcS-72x8-Qu0fcI
LV Write Access read/write
LV Status available
# open 0
LV Size 3.64 TiB
Current LE 953868
Segments 6
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 251:4
04:00.0 SCSI storage controller: HighPoint Technologies, Inc. RocketRAID 230x 4 Port SATA-II Controller (rev 02)
Subsystem: Marvell Technology Group Ltd. Device 11ab
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at d5000000 (64-bit, non-prefetchable) [size=1M]
Region 2: I/O ports at a000 [size=256]
[virtual] Expansion ROM at d6200000 [disabled] [size=512K]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [60] Express (v1) Legacy Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <256ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #3, Speed 2.5GT/s, Width x4, ASPM L0s, Latency L0 <256ns, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 14, GenCap- CGenEn- ChkCap- ChkEn-
Kernel driver in use: sata_mv
Kernel modules: sata_mv
I've had some trouble with certain HighPoint products in the past, so I've switched all of my systems over to software-based RAID (mdadm is awesome).
A few questions first: a) Do you have a RAID configured by the HighPoint controller? b) Are you booting off of that RAID array? c) What kernel version are you using?
Now for some of my "lessons learned" with HighPoint: a) The manuals have some wacky stuff in there. Here's yours. b) Do you have "Staggered drive spin up" enabled?
I had problems with this, but I'm also using four 1TB 7200 RPM drives. By the time the fourth drive would spin up, something kept timing out and throwing a bus reset. c) Is "EBDA Reallocation" disabled?
I have a 2304 card, and this was disabled by default. Not only were there boot problems, but there were several issues with waking up the drives. d) Have you checked the SMART status of your disks?
Code:
smartctl -H /dev/sd#
I don't know if this helps, but I had to look up GPT (GUID Partition Table) on Wikipedia, not having moved to GRUB 2 myself and not having >2.2 TB of hard drives, but it appears to me that the GPT header is now stored at LBA 1 (rather than the old MBR, which is stored at LBA 0).
The Wikipedia article then goes on to say that most recent disks have 4096-byte sectors (whether or not they report 512-byte sectors), and as I understand it, if the GPT doesn't start at the right place you get corruption. In that event, the backup GPT header stored at the end of the disk (no, I don't know exactly where) is still good.
The article also mentions that the disk 'data' (LVs etc.) should not start until LBA 40.
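The layout described above can be sketched against a scratch image file: the primary header's "EFI PART" signature sits at LBA 1, with a backup copy at the last LBA (offsets assume 512-byte sectors; the image file stands in for a real disk and is purely illustrative).

```shell
# Sketch of the GPT on-disk layout, using a scratch image in place of a real
# disk (512-byte sectors assumed; a 1 MiB image = 2048 LBAs).
img=$(mktemp)
truncate -s 1M "$img"
# primary GPT header signature at LBA 1 (LBA 0 holds the protective MBR)
printf 'EFI PART' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null
# backup GPT header at the last LBA of the disk
printf 'EFI PART' | dd of="$img" bs=512 seek=2047 conv=notrunc 2>/dev/null
# read the primary signature back
dd if="$img" bs=512 skip=1 count=1 2>/dev/null | head -c 8   # prints: EFI PART
rm -f "$img"
```

This is why parted can offer to fall back to the backup table: both copies exist independently, one at each end of the device.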
Hope this helps, although I don't profess to know how to ensure exactly where the GPT header gets loaded.
Quote:
Originally Posted by xeleema
b) Do you have "Staggered drive spin up" enabled? I had problems with this, but I'm also using four 1TB 7200 RPM drives. By the time the fourth drive would spin up, something kept timing out and throwing a bus reset. c) Is "EBDA Reallocation" disabled? I have a 2304 card, and this was disabled by default. Not only were there boot problems, but there were several issues with waking up the drives. d) Have you checked the SMART status of your disks?
b) Nope. c) I don't see anything that mentions this in the BIOS config. The manual mentions it, but only briefly in the Windows section. d) Yep, I've run a long test on all of them and they all come back clean. I have also run extended tests from the drive manufacturers' boot media. All are reporting good.
One last option (if you have a backup of the data on that disk). You mentioned in your first post that you've swapped around a few disks and keep getting the GPT errors on different disks. I also noticed that you have a VG (vg-backup) with six PVs in it.
a) Is vg-backup basically concatenating, or are you doing anything for redundancy? b) Do you have a valid backup of this system (or is this system basically the backup server)? c) Can you trash the whole vg-backup (all six PVs)? (Basically, is there anything "worth it" in that VG?) d) You're not booting the OS from any of those PVs, right? e) Are all of those PVs "whole-disk", or are you using other partitions for something else (like the OS)? f) Have you considered using mdadm to set up mirroring (RAID10)?
The reason I ask is this: 1) If you don't have any redundancy, that should be addressed (six PVs is a *lot* of drives that could fail). 2) Something may be fishy with the way the VG was set up, specifically if those disks were used for something prior to this. 3) Perhaps "dd'ing" the first 1MB of the disks would clear up any craziness left over on the disks from their previous life. However, this would destroy data (hence the backup questions).
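The head-and-tail wipe in point 3 can be sketched like this, demonstrated on a scratch image rather than a real device (substitute your /dev/sdX only if the data on it is expendable; this destroys both the primary GPT at the start of the disk and the backup GPT at the end):

```shell
# Zero the first and last 1 MiB of a "disk" -- enough to cover the protective
# MBR, primary GPT, and the backup GPT at the end of the device.
img=$(mktemp)               # stand-in for /dev/sdX
truncate -s 8M "$img"
size=$(stat -c %s "$img")   # on a real disk: blockdev --getsize64 /dev/sdX
dd if=/dev/zero of="$img" bs=1M count=1 conv=notrunc 2>/dev/null
dd if=/dev/zero of="$img" bs=1M count=1 seek=$(( size / 1048576 - 1 )) conv=notrunc 2>/dev/null
rm -f "$img"
```

Wiping only the first megabyte leaves the backup GPT at the end of the disk behind, which is exactly the kind of leftover that can confuse tools later.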
Quote:
Originally Posted by xeleema
a) Is vg-backup basically concatenating, or are you doing anything for redundancy? b) Do you have a valid backup of this system (or is this system basically the backup server)? c) Can you trash the whole vg-backup (all six PVs)? (Basically, is there anything "worth it" in that VG?) d) You're not booting the OS from any of those PVs, right? e) Are all of those PVs "whole-disk", or are you using other partitions for something else (like the OS)? f) Have you considered using mdadm to set up mirroring (RAID10)?
a) No redundancy on this box. b) This is the backup. c) Already trashed it several times. d) Nope. e) All are whole disk. f) I did, but the drives are not all the same size and there is no decent way of doing it with my setup. I am aware of the danger, but I don't want to drop the money on a proper setup. This is an old desktop with a bunch of hard drives in it, nothing more.
Quote:
Originally Posted by xeleema
The reason I ask is this: 1) If you don't have any redundancy, that should be addressed (six PVs is a *lot* of drives that could fail). 2) Something may be fishy with the way the VG was set up, specifically if those disks were used for something prior to this. 3) Perhaps "dd'ing" the first 1MB of the disks would clear up any craziness left over on the disks from their previous life. However, this would destroy data (hence the backup questions).
a) I am aware of this and accept it. b) I doubt it. In my testing I've been doing: lvremove, vgremove, pvremove {eachdrive}, pvcreate {eachdrive}, vgcreate, lvcreate, mkfs. c) I'll try it later, but I have doubts that will solve the problem.
Quote:
Originally Posted by xeleema
3) Perhaps "dd'ing" the first 1MB of the disks would clear up any craziness left over on the disks from their previous life. However, this would destroy data (hence the backup questions).
I did a dd on the first 10 MB of the drives, verified the table was gone on all of them, rebooted, re-made the table and partition on each one, verified they showed up (parted /dev/sda print), rebooted, and yet again the table was lost on one of them.
Note that this was before LVM was involved, so that completely rules it out (I didn't suspect it earlier, though).
So, something is still causing the table to get corrupted. Replacing my kernel with the mainline one is the only other idea I have, and I've never done it before so it may take a while and I'll have to read through some documentation first.
Quote:
Originally Posted by gimpy530
I did a dd on the first 10 MB of the drives, verified the table was gone on all of them, rebooted, re-made the table and partition on each one, verified they showed up (parted /dev/sda print), rebooted, and yet again the table was lost on one of them.
This is the Twilight Zone of errors if I've ever seen one.
Quote:
Originally Posted by gimpy530
Note that this was before LVM so that completely rules it out (I didn't suspect it earlier though).
Good to know. I didn't suspect it either, but it's good you covered that base (just in case).
Quote:
Originally Posted by gimpy530
So, something is still causing the table to get corrupted. Replacing my kernel with the mainline one is the only other idea I have, and I've never done it before so it may take a while and I'll have to read through some documentation first.
Okay, before you do that, see if you have a /proc/config.gz file. If you do, that's a compressed copy of the running kernel's configuration. This will save you a lot of guesswork about what needs to be a module vs. compiled in.
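A sketch of reusing that config, assuming you're sitting in a kernel source tree (note: /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC; Ubuntu also keeps a copy of the config under /boot, so there's a fallback):

```shell
# Reuse the running kernel's configuration as the starting point for a build.
if [ -r /proc/config.gz ]; then
    zcat /proc/config.gz > .config            # decompress the in-kernel config
elif [ -r "/boot/config-$(uname -r)" ]; then
    cp "/boot/config-$(uname -r)" .config     # Ubuntu's on-disk copy
fi
# then: make oldconfig && make -j"$(nproc)" && sudo make modules_install install
```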
I'm going to pull down Ubuntu 10.10 x64, set it up as a VM, and see if I can reproduce things on my end. I have 8 x 8GB USB sticks and a 16-port USB 2.0 hub I can attach to the VM and experiment with...
By the way, are you using "Desktop" or "Server"?
Update #1: Ubuntu 10.10 x64 "Desktop" finished downloading. I've configured a VM, just have to install the OS.
Some more googling around has made me notice something...everyone with a GPT error has used "parted" to partition their drives rather than "fdisk", and no one seems to use mkfs.ext2. I wonder if that's a RedHat-ism or an Ubuntu-ism...
Update #2: Installing the OS now, just wanted to note my partition layout (in case it's relevant).
Update #3: OS is installed. Adding kernel sources for 2.6.35 & kernel tools, too. (Have to have my VMware Tools working...)
Update #4: Ready to do the test. A little pre-show diagnostic info;
Code:
luser@lhost:~$ cat /etc/lsb-release ; uname -a ; sudo fdisk -l /dev/sd[a-z]|grep -i dev| grep .
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=10.10
DISTRIB_CODENAME=maverick
DISTRIB_DESCRIPTION="Ubuntu 10.10"
Linux lhost 2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux
Disk /dev/sda: 10.7 GB, 10737418240 bytes
Device Boot Start End Blocks Id System
/dev/sda1 1 125 999424 82 Linux swap / Solaris
/dev/sda2 * 125 187 499712 83 Linux
/dev/sda3 187 1306 8984576 83 Linux
luser@lhost:~$
Update #5: 8 x 8GB USB sticks attached to the host system... blowing away the first 5 MB of each stick to destroy any existing filesystems (damn, I hate U3 disks, craziest partitioning scheme I've ever seen).
Update #6: Only 6 of the 8 USB sticks want to play nice. On with the show!
They've all been PV'd, and VG "vgusb" has been created.
Code:
luser@lhost:~# sudo vgdisplay /dev/vgusb
--- Volume group ---
VG Name vgusb
System ID
Format lvm2
Metadata Areas 6
Metadata Sequence No 1
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 0
Open LV 0
Max PV 0
Cur PV 6
Act PV 6
VG Size 44.84 GiB
PE Size 4.00 MiB
Total PE 11480
Alloc PE / Size 0 / 0
Free PE / Size 11480 / 44.84 GiB
VG UUID zjo6Iu-n78i-ym4T-p73g-Ocf2-Y3CK-sLgKid
luser@lhost:~# sudo vgdisplay -v /dev/vgusb|grep "PV Name"
Using volume group(s) on command line
Finding volume group "vgusb"
PV Name /dev/sdb1
PV Name /dev/sdc1
PV Name /dev/sdd1
PV Name /dev/sde1
PV Name /dev/sdf1
PV Name /dev/sdg1
luser@lhost:~#
I've also created "lvusb", one big fat logical volume (without any sort of striping or mirroring going on).
(I did enable "mount on reboot", too.)
You are far less lazy than me in doing all of that.
I was using GPT (hence the title of this thread), but I moved to msdos labels on all of these drives and I have not been able to replicate the problem. So how GPT stores its tables is part of the issue.
I could create a VM which matches the config of the physical machine and give it 8 or so virtual drives to try to emulate the problem on virtual hardware, which would (mostly) confirm whether the kernel itself is causing the problem.
Other random ideas I had:
The controller itself could be overwriting the table of a drive when it initializes. I have not seen any information to support this, but it seems the most likely. The strange part is that it is not always the same port which has the problem; maybe whichever drive is found first during the scan is the one that gets corrupted?
The kernel module in use (sata_mv) might not be playing nicely. It could be failing to write the information properly in the first place, which I could discover by looking at the raw data at the beginning of the disk... after researching what it should look like. Given that I can see the table after I create it, before a reboot, and that other drives are fine, I doubt this is the problem.
The kernel itself could be doing the above.
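One way to test the "not written properly in the first place" theory without memorizing the on-disk format: snapshot the raw header sectors right after partitioning, then diff them against a second snapshot taken after the reboot. A sketch on a scratch image with a simulated change (for a real PV, substitute /dev/sdX for the image file):

```shell
# Dump the first 34 sectors (protective MBR + GPT header + partition entries)
# before and after, then diff the hex dumps to see exactly which bytes changed.
img=$(mktemp)
truncate -s 4M "$img"
printf 'EFI PART' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null
dd if="$img" bs=512 count=34 2>/dev/null | od -A d -t x1 > /tmp/gpt.before
# ...the reboot would go here; simulate the corruption instead:
printf 'XXXXXXXX' | dd of="$img" bs=512 seek=1 conv=notrunc 2>/dev/null
dd if="$img" bs=512 count=34 2>/dev/null | od -A d -t x1 > /tmp/gpt.after
diff /tmp/gpt.before /tmp/gpt.after || true   # nonzero exit just means "they differ"
rm -f "$img" /tmp/gpt.before /tmp/gpt.after
```

If the "before" dump is already wrong, the table was never written correctly; if "before" is good and "after" is garbage, something is clobbering it across the reboot.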
Within a year I will have to consider redesigning my storage layout of both my servers, at which point I may have to ditch this controller and go with a much higher-end one.
I've been able to replicate the problem by using GPT EFI partitioning on the USB sticks, and *not* dd'ing the head and tail of each device!
You'll need to nuke the drives as I mentioned in my previous post, then use "fdisk" to create the partitions (I have a newfound hatred for parted).