
LinuxQuestions.org (/questions/)
-   Linux - Server (http://www.linuxquestions.org/questions/linux-server-73/)
-   -   JFS on large LVM-volume (> 35TB) fails (http://www.linuxquestions.org/questions/linux-server-73/jfs-on-large-lvm-volume-35tb-fails-804077/)

murmur101 04-25-2010 09:27 AM

JFS on large LVM-volume (> 35TB) fails
 
Hi all,

I have been struggling with this one the whole weekend and could use some help.

I am running Debian Lenny with kernel 2.6.26-2-amd64, and I have the following setup:

3 RAID6 volumes of 19 TB each (md0, md1, md2)

I would like to join the three of them into one large LV (about 57 TB) formatted with a JFS filesystem.
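Roughly, the plan is this (just a sketch of the intended sequence; the exact lvcreate sizes I used while testing are further down, and option spellings may differ slightly with this LVM version):
Code:

pvcreate /dev/md0 /dev/md1 /dev/md2      # turn each RAID6 array into an LVM physical volume
vgcreate pod /dev/md0 /dev/md1 /dev/md2  # pool them into one volume group
lvcreate -l 100%FREE -n lvol0 pod        # one LV over all free extents (or -L <size> for a fixed size)
jfs_mkfs /dev/pod/lvol0                  # put JFS on top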

The installed tool versions are:
jfs_mkfs version 1.1.14, 06-Apr-2009
lvm version
LVM version: 2.02.39 (2008-06-27)
Library version: 1.02.27 (2008-06-25)
Driver version: 4.13.0


Prerequisite: the md arrays are assembled and have finished syncing:

Code:

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sda[0] sdo[14] sdm[13] sdn[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      19046800448 blocks level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

md1 : active raid6 sdp[0] sdad[14] sdac[13] sdab[12] sdaa[11] sdz[10] sdy[9] sdx[8] sdw[7] sdv[6] sdu[5] sdt[4] sds[3] sdr[2] sdq[1]
      19046800448 blocks level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

md2 : active raid6 sdae[0] sdas[14] sdar[13] sdaq[12] sdap[11] sdao[10] sdan[9] sdam[8] sdal[7] sdak[6] sdaj[5] sdai[4] sdah[3] sdag[2] sdaf[1]
      19046800448 blocks level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

I ran jfs_mkfs on each of the three md arrays individually, mounted them and ran some checks, and they all worked without a problem, so I am pretty sure the drives/RAIDs are OK.
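For each array that check was roughly the following (the mount point is just an example):
Code:

jfs_mkfs /dev/md0                  # format the single array
jfs_fsck -f /dev/md0               # force a full check
mount -t jfs /dev/md0 /mnt/test    # mount it and write some test files
umount /mnt/test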

The details about one raid:

Code:

mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Fri Apr 23 12:52:16 2010
    Raid Level : raid6
    Array Size : 19046800448 (18164.44 GiB 19503.92 GB)
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
  Raid Devices : 15
  Total Devices : 15
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Apr 25 15:57:37 2010
          State : clean
 Active Devices : 15
Working Devices : 15
 Failed Devices : 0
  Spare Devices : 0

    Chunk Size : 16K

          UUID : 8e0040fe:fb148c89:4d482edb:0dad0e58
        Events : 0.8

I then created the PVs
Code:

pvscan
  PV /dev/md0  VG pod  lvm2 [17.74 TB / 0    free]
  PV /dev/md1  VG pod  lvm2 [17.74 TB / 0    free]
  PV /dev/md2  VG pod  lvm2 [17.74 TB / 13.22 TB free]

(mdadm reports about 19 TB per array; the smaller figure from pvscan is normal, since LVM measures the size in binary units)
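It is only a units difference: mdadm reports decimal TB, while LVM reports binary TiB (labelled "TB"). From the array size above:
Code:

19046800448 KiB * 1024   = 19503923658752 bytes  =~ 19.5 TB  (decimal, what mdadm calls 19503.92 GB)
19046800448 KiB / 1024^3 =~ 17.74 TiB                        (what pvscan prints as 17.74 TB)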

Then I created the VG "pod" with the three drives as members:
Code:

vgdisplay pod
  --- Volume group ---
  VG Name              pod
  System ID
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  2
  VG Access            read/write
  VG Status            resizable
  MAX LV                0
  Cur LV                1
  Open LV              0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size              53.22 TB
  PE Size              4.00 MB
  Total PE              13950291
  Alloc PE / Size      10485760 / 40.00 TB
  Free  PE / Size      3464531 / 13.22 TB
  VG UUID              3B0kpR-k9Pn-Xxeu-l0Qr-X4TM-Ha2k-LFNumI

Then I created a 40 TB LV for testing:
Code:

lvcreate -L 40T pod
which gave me this
Code:

lvscan
  ACTIVE            '/dev/pod/lvol0' [40.00 TB] inherit

Then I formatted the LV using:
Code:

jfs_mkfs /dev/pod/lvol0
Still everything fine.

But when I try to check the filesystem with jfs_fsck, I get:
Code:

ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod/lvol0.  CANNOT CONTINUE.

The really odd thing is that I don't get the error with a smaller volume; with a 25 TB LV everything looks fine:

Code:

cryptpod1:/# lvresize -L 25T /dev/pod/lvol0
  WARNING: Reducing active logical volume to 25.00 TB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lvol0? [y/n]: y
  Reducing logical volume lvol0 to 25.00 TB
  Logical volume lvol0 successfully resized
cryptpod1:/# jfs_mkfs /dev/pod/lvol0
jfs_mkfs version 1.1.14, 06-Apr-2009
Warning!  All data on device /dev/pod/lvol0 will be lost!

Continue? (Y/N) Y
  |

Format completed successfully.

26843545600 kilobytes total disk space.
cryptpod1:/# jfs_fsck /dev/pod/lvol0 -f
jfs_fsck version 1.1.14, 06-Apr-2009
processing started: 4/25/2010 16.19.29
The current device is:  /dev/pod/lvol0
Block size in bytes:  4096
Filesystem size in blocks:  6710886400
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
26843545600 kilobytes total disk space.
        0 kilobytes in 1 directories.
        0 kilobytes in 0 user files.
        0 kilobytes in extended attributes
  4230484 kilobytes reserved for system use.
26839315116 kilobytes are available for use.
Filesystem is clean.


but not with a 40TB LV:

Code:

lvresize -L 40T /dev/pod/lvol0
  Extending logical volume lvol0 to 40.00 TB
  Logical volume lvol0 successfully resized
cryptpod1:/# jfs_mkfs /dev/pod/lvol0
jfs_mkfs version 1.1.14, 06-Apr-2009
Warning!  All data on device /dev/pod/lvol0 will be lost!

Continue? (Y/N) Y
  |

Format completed successfully.

42949672960 kilobytes total disk space.
cryptpod1:/# jfs_fsck /dev/pod/lvol0 -f
jfs_fsck version 1.1.14, 06-Apr-2009
processing started: 4/25/2010 16.21.33
The current device is:  /dev/pod/lvol0
Block size in bytes:  4096
Filesystem size in blocks:  10737418240
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod/lvol0.  CANNOT CONTINUE.

There is nothing in syslog that points to the source of this error. Also, if I mount the 40 TB volume everything seems fine, but as soon as I try to write to it the whole system hangs and there is no way to recover.

Thank you very much for your help...

M

smoker 04-25-2010 12:27 PM

Going by my lvm setup, which admittedly is not as huge as yours, there seems to be a discrepancy in the reported metadata sequence number.

I have 4 physical volumes, you have 3 (accounting for raid)

my vgdisplay output is

Code:

[root@kids ~]# vgdisplay my_movies_group
  --- Volume group ---
  VG Name              my_movies_group
  System ID
  Format                lvm2
  Metadata Areas        4
  Metadata Sequence No  4

  VG Access            read/write
  VG Status            resizable
  MAX LV                0
  Cur LV                1
  Open LV              1
  Max PV                0
  Cur PV                4
  Act PV                4
  VG Size              1.55 TB
  PE Size              4.00 MB
  Total PE              405398
  Alloc PE / Size      405398 / 1.55 TB
  Free  PE / Size      0 / 0
  VG UUID              zsNLjr-7HKL-VFip-UWb7-8fLc-zOJF-xZ2zVl

Yours is
Code:

vgdisplay pod
  --- Volume group ---
  VG Name              pod
  System ID
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  2

  VG Access            read/write
  VG Status            resizable
  MAX LV                0
  Cur LV                1
  Open LV              0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size              53.22 TB
  PE Size              4.00 MB
  Total PE              13950291
  Alloc PE / Size      10485760 / 40.00 TB
  Free  PE / Size      3464531 / 13.22 TB
  VG UUID              3B0kpR-k9Pn-Xxeu-l0Qr-X4TM-Ha2k-LFNumI

This may be a cause of the problems.

murmur101 04-25-2010 01:39 PM

Quote:

Originally Posted by smoker (Post 3947251)
Going by my lvm setup, which admittedly is not as huge as yours, there seems to be a discrepancy in the reported metadata sequence number.

I have 4 physical volumes, you have 3 (accounting for raid)

my vgdisplay output is


This may be a cause of the problems.


Hi,

thank you for your help. It looked promising, but I think the Metadata Sequence No is simply incremented by 1 every time I perform an operation on the VG or LV. This time I added one drive to the VG and then extended it twice, and ended up with

Metadata Sequence No 4

- still the same problem.
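You can watch it go up with every metadata change, e.g. (the numbers here are just illustrative):
Code:

vgdisplay pod | grep 'Metadata Sequence'    # e.g. Metadata Sequence No  2
lvresize -L +1T /dev/pod/lvol0              # any VG/LV metadata operation...
vgdisplay pod | grep 'Metadata Sequence'    # ...bumps it: Metadata Sequence No  3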

I attached the vgcfgbackup output to this post:

Code:


cryptpod1:/etc/lvm/backup# cat pod
# Generated by LVM2 version 2.02.39 (2008-06-27): Sun Apr 25 20:52:52 2010

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'vgcfgbackup'"

creation_host = "cryptpod1"    # Linux cryptpod1 2.6.26-2-amd64 #1 SMP Tue Mar 9 22:29:32 UTC 2010 x86_64
creation_time = 1272221572      # Sun Apr 25 20:52:52 2010

pod {
        id = "fVbjR8-JvcA-9YtS-gkic-KBmW-31Ue-2wnYxw"
        seqno = 10
        status = ["RESIZEABLE", "READ", "WRITE"]
        extent_size = 8192              # 4 Megabytes
        max_lv = 0
        max_pv = 0

        physical_volumes {

                pv0 {
                        id = "8hWDiA-u7o8-ViLc-7nIs-9SRo-tnMe-OowR1t"
                        device = "/dev/md2"    # Hint only

                        status = ["ALLOCATABLE"]
                        dev_size = 38093600896  # 17.7387 Terabytes
                        pe_start = 384
                        pe_count = 4650097      # 17.7387 Terabytes
                }

                pv1 {
                        id = "PZeRTh-qtxW-u5k1-5CPJ-KSf0-i2fg-jK4oJl"
                        device = "/dev/md1"    # Hint only

                        status = ["ALLOCATABLE"]
                        dev_size = 38093600896  # 17.7387 Terabytes
                        pe_start = 384
                        pe_count = 4650097      # 17.7387 Terabytes
                }

                pv2 {
                        id = "20Cgl7-f6bW-fsgx-wOm1-skzW-4Skj-IRuIoO"
                        device = "/dev/md0"    # Hint only

                        status = ["ALLOCATABLE"]
                        dev_size = 38093600896  # 17.7387 Terabytes
                        pe_start = 384
                        pe_count = 4650097      # 17.7387 Terabytes
                }
        }

        logical_volumes {

                lvol0 {
                        id = "Mpe0zN-emfe-aReP-LQz7-sr9b-Rj2y-KFXUcw"
                        status = ["READ", "WRITE", "VISIBLE"]
                        segment_count = 3

                        segment1 {
                                start_extent = 0
                                extent_count = 4650097  # 17.7387 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 0
                                ]
                        }
                        segment2 {
                                start_extent = 4650097
                                extent_count = 4650097  # 17.7387 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv1", 0
                                ]
                        }
                        segment3 {
                                start_extent = 9300194
                                extent_count = 4648489  # 17.7326 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv2", 0
                                ]
                        }
                }
        }
}

M

smoker 04-25-2010 06:14 PM

I find it hard to follow your method, as you created filesystems before you created the actual logical volume. That shouldn't cause an issue, but I can't compare it against what I did. All I ever did was assign the disks to a volume group, then create a logical volume, and after that I formatted it and all was fine. You appear to have done all that too, but the order is not obvious, and you are getting errors.

I would suggest starting again, doing it in the proper order and assigning all the extents to the LV before making a filesystem on it. I suspect LVM is recording a drive as present, but the rest of the OS doesn't see it as accessible.
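Something like this, in order (just a sketch, using the Total PE figure from your vgdisplay):
Code:

pvcreate /dev/md0 /dev/md1 /dev/md2
vgcreate pod /dev/md0 /dev/md1 /dev/md2
lvcreate -l 13950291 -n lvol0 pod    # all 13950291 extents -> ~53 TB LV, nothing left unassigned
jfs_mkfs /dev/pod/lvol0              # filesystem only after the LV covers everything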

I did this from scratch; the OS had no part in the LVM setup whatsoever.

Oh, BTW, I have added 2 drives to this LVM over time, but the metadata sequence number is still at 4. I started with 2 drives. It appears to increase with the number of drives.

murmur101 04-26-2010 01:44 AM

Hi Smoker,

thanks for bearing with me.
I think I did it "by the book", but just to be sure I repeated the whole procedure this morning:

Code:

cryptpod1:~# pvremove /dev/md0
  Labels on physical volume "/dev/md0" successfully wiped
cryptpod1:~# pvremove /dev/md1
  Labels on physical volume "/dev/md1" successfully wiped
cryptpod1:~# pvremove /dev/md2
  Labels on physical volume "/dev/md2" successfully wiped
cryptpod1:~# dd if=/dev/zero of=/dev/mb0
dd: writing to `/dev/mb0': No space left on device
18505+0 records in
18504+0 records out
9474048 bytes (9.5 MB) copied, 0.0186135 s, 509 MB/s
cryptpod1:~# dd if=/dev/zero of=/dev/mb1
dd: writing to `/dev/mb1': No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000137518 s, 0.0 kB/s
cryptpod1:~# dd if=/dev/zero of=/dev/mb2
dd: writing to `/dev/mb2': No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000121664 s, 0.0 kB/s
cryptpod1:~# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created
cryptpod1:~# pvcreate /dev/md1
  Physical volume "/dev/md1" successfully created
cryptpod1:~# pvcreate /dev/md2
  Physical volume "/dev/md2" successfully created
cryptpod1:~# vgcreate pod0 /dev/md0 /dev/md1 /dev/md2
  Volume group "pod0" successfully created
cryptpod1:~# vgdisplay pod0
  --- Volume group ---
  VG Name              pod0
  System ID           
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  1

  VG Access            read/write
  VG Status            resizable
  MAX LV                0
  Cur LV                0
  Open LV              0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size              53.22 TB
  PE Size              4.00 MB
  Total PE              13950291
  Alloc PE / Size      0 / 0 
  Free  PE / Size      13950291 / 53.22 TB
  VG UUID              pWE9eu-tQNc-o17V-BkFp-nag5-QTh6-nSIdrf
cryptpod1:~# lvcreate -L 40T pod0
  Logical volume "lvol0" created
cryptpod1:~# lvdisplay
  --- Logical volume ---
  LV Name                /dev/pod0/lvol0
  VG Name                pod0
  LV UUID                CBvB6d-kTRP-v7OJ-Mvzb-rh5b-dVIN-aMl4cL
  LV Write Access        read/write
  LV Status              available
  # open                0
  LV Size                40.00 TB
  Current LE            10485760
  Segments              3
  Allocation            inherit
  Read ahead sectors    auto
  - currently set to    256
  Block device          253:0
cryptpod1:~# vgdisplay pod0
  --- Volume group ---
  VG Name              pod0
  System ID           
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  2

  VG Access            read/write
  VG Status            resizable
  MAX LV                0
  Cur LV                1
  Open LV              0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size              53.22 TB
  PE Size              4.00 MB
  Total PE              13950291
  Alloc PE / Size      10485760 / 40.00 TB
  Free  PE / Size      3464531 / 13.22 TB
  VG UUID              pWE9eu-tQNc-o17V-BkFp-nag5-QTh6-nSIdrf
cryptpod1:~# jfs_mkfs /dev/pod0/lvol0
jfs_mkfs version 1.1.14, 06-Apr-2009
Warning!  All data on device /dev/pod0/lvol0 will be lost!

Continue? (Y/N) Y
  |

Format completed successfully.

42949672960 kilobytes total disk space.
cryptpod1:~# jfs_fsck /dev/pod0/lvol0 -f
jfs_fsck version 1.1.14, 06-Apr-2009
processing started: 4/26/2010 8.33.56
The current device is:  /dev/pod0/lvol0
Block size in bytes:  4096
Filesystem size in blocks:  10737418240
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod0/lvol0.  CANNOT CONTINUE.

I was also able to identify the point where the error appears: it really seems to be when the volume, or rather the JFS filesystem, stretches onto the third disk. I also tried changing the order of the disks (adding /dev/md2 first and then extending), with the same result.
As you can see, the sequence number did increase after I created the LV. I will test this on another machine to see whether this behaviour is normal or specific to this server.
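For anyone retracing this, a size sweep along these lines reproduces it (sizes are only examples; every pass destroys the filesystem):
Code:

for SIZE in 30T 34T 36T 40T; do
    lvresize -f -L $SIZE /dev/pod0/lvol0           # -f skips the "reducing may destroy data" prompt
    jfs_mkfs -q /dev/pod0/lvol0                    # -q skips the "all data will be lost" prompt
    jfs_fsck -f /dev/pod0/lvol0 && echo "$SIZE OK" || echo "$SIZE fails"
done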

Thx
M

murmur101 04-26-2010 03:17 AM

Update:

seems to be JFS-related somehow. The same LV formatted with XFS came out OK (I can't run a check of the entire volume with XFS, as its check tool is an incredible memory hog).
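The XFS test was nothing fancy, roughly this (the mount point is just an example):
Code:

mkfs.xfs -f /dev/pod0/lvol0         # -f overwrites the old JFS signature
mount -t xfs /dev/pod0/lvol0 /mnt/pod
df -h /mnt/pod                      # shows the full ~40 TB
umount /mnt/pod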

I will focus my search on jfs-related problems now and if I don't find anything I will fall back to XFS.

M

smoker 04-26-2010 05:17 AM

Quote:

Originally Posted by murmur101 (Post 3947831)
Update:

seems to be JFS-related somehow. The same LV formatted with XFS came out OK (I can't run a check of the entire volume with XFS, as its check tool is an incredible memory hog).

I will focus my search on jfs-related problems now and if I don't find anything I will fall back to XFS.

M

Interesting; there shouldn't be any problems with JFS. I can't say much about it as a filesystem, except that it allocates files as extents.
Whether that conflicts with LVM-style extents might be an issue, although XFS does much the same thing in a slightly different way and that appears to be OK.

murmur101 04-27-2010 01:12 PM

Okay..

I do have another server with exactly the same hardware (yes, I have more than 110 TB across 2 servers, and no, I don't plan to mirror all the porn on the internet).

I did a full run from scratch using only the versions from Debian's stable repos and... lo and behold:

Code:

ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod/lvol0.  CANNOT CONTINUE.

Same thing. And with this box there was no fooling around: it was a start-to-end configuration on a vanilla system. I will try to file a bug about this one, as I think I have enough evidence (reproducible, and it works with another fs) that something is fishy about JFS.

Will come back to this thread if I find something.
Cheers and thank you

H_TeXMeX_H 04-27-2010 02:00 PM

Try a live CD of another distro, just to make sure it isn't Debian-specific (or specific to the version of the JFS utilities it ships). If it is a bug, report it to the developers.
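It's also worth comparing the jfsutils versions before and after, e.g.:
Code:

dpkg -l jfsutils                  # package version on the installed system
jfs_fsck -n /dev/pod0/lvol0       # -n = read-only check; the banner shows which version actually ran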

murmur101 04-27-2010 02:01 PM

Seems to be a known issue with large volumes:
https://sourceforge.net/tracker/?fun...02&atid=712756

and there seems to be a patch for the error... ah, so much time wasted...
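If it applies cleanly to 1.1.14, rebuilding jfsutils should be roughly this (a sketch; the patch file name here is made up):
Code:

apt-get source jfsutils                  # or grab the tarball from jfs.sourceforge.net
cd jfsutils-1.1.14
patch -p1 < jfs-large-volume.patch       # hypothetical name for the patch from the tracker
./configure && make && make install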

