LinuxQuestions.org
Old 04-25-2010, 10:27 AM   #1
murmur101
LQ Newbie
 
Registered: Apr 2010
Posts: 15

Rep: Reputation: 1
JFS on large LVM-volume (> 35TB) fails


Hi all,

I have been struggling with this one for the whole weekend and could use some help.

I am running Debian Lenny 2.6.26-2-amd64
and I have the following setup:

Three RAID6 volumes, each 19 TB in size (md0, md1, md2)

I would like to join the three of them into one large LV (57 TB) formatted with a JFS filesystem.

I installed:
jfs_mkfs version 1.1.14, 06-Apr-2009
lvm version
LVM version: 2.02.39 (2008-06-27)
Library version: 1.02.27 (2008-06-25)
Driver version: 4.13.0


Prerequisite: the md devices are assembled and have synced:

Code:
 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sda[0] sdo[14] sdm[13] sdn[12] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
      19046800448 blocks level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

md1 : active raid6 sdp[0] sdad[14] sdac[13] sdab[12] sdaa[11] sdz[10] sdy[9] sdx[8] sdw[7] sdv[6] sdu[5] sdt[4] sds[3] sdr[2] sdq[1]
      19046800448 blocks level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]

md2 : active raid6 sdae[0] sdas[14] sdar[13] sdaq[12] sdap[11] sdao[10] sdan[9] sdam[8] sdal[7] sdak[6] sdaj[5] sdai[4] sdah[3] sdag[2] sdaf[1]
      19046800448 blocks level 6, 16k chunk, algorithm 2 [15/15] [UUUUUUUUUUUUUUU]
I used jfs_mkfs on each of the three md devices and mounted them to run some checks; they worked without a problem. I am pretty sure the drives/RAIDs are OK.
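(Roughly, the check on each md device looked like the following; the mount point and the dd test write are illustrative, not the exact commands I ran:)

```shell
# Illustrative per-device sanity check; /mnt/check and the test write
# are assumptions, not commands taken from this thread.
for md in /dev/md0 /dev/md1 /dev/md2; do
    jfs_mkfs -q "$md"                            # -q skips the confirmation prompt
    jfs_fsck -f "$md"                            # full check of the fresh filesystem
    mount -t jfs "$md" /mnt/check
    dd if=/dev/zero of=/mnt/check/testfile bs=1M count=1024
    umount /mnt/check
done
```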

The details about one raid:

Code:
mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Fri Apr 23 12:52:16 2010
     Raid Level : raid6
     Array Size : 19046800448 (18164.44 GiB 19503.92 GB)
  Used Dev Size : 1465138496 (1397.26 GiB 1500.30 GB)
   Raid Devices : 15
  Total Devices : 15
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Apr 25 15:57:37 2010
          State : clean
 Active Devices : 15
Working Devices : 15
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 16K

           UUID : 8e0040fe:fb148c89:4d482edb:0dad0e58
         Events : 0.8
I then created the PVs
Code:
pvscan
  PV /dev/md0   VG pod   lvm2 [17.74 TB / 0    free]
  PV /dev/md1   VG pod   lvm2 [17.74 TB / 0    free]
  PV /dev/md2   VG pod   lvm2 [17.74 TB / 13.22 TB free]
(nominally there should be 19 TB per array, but this is normal: LVM reports sizes in binary units (TiB), while the 19 TB figure is decimal)
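The arithmetic behind that discrepancy, using the block count from /proc/mdstat above (this is just a unit conversion, not lvm output):

```shell
# Each array is 19046800448 1-KiB blocks (see /proc/mdstat above)
bytes=$((19046800448 * 1024))
# decimal terabytes (10^12 bytes) -- where the "19 TB" figure comes from
awk -v b="$bytes" 'BEGIN { printf "%.2f TB\n",  b / 1e12 }'   # prints 19.50 TB
# tebibytes (2^40 bytes) -- what pvscan shows as "17.74 TB"
awk -v b="$bytes" 'BEGIN { printf "%.2f TiB\n", b / 2^40 }'   # prints 17.74 TiB
```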

Then I created the VG "pod" with the three drives as members:
Code:
vgdisplay pod
  --- Volume group ---
  VG Name               pod
  System ID
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               53.22 TB
  PE Size               4.00 MB
  Total PE              13950291
  Alloc PE / Size       10485760 / 40.00 TB
  Free  PE / Size       3464531 / 13.22 TB
  VG UUID               3B0kpR-k9Pn-Xxeu-l0Qr-X4TM-Ha2k-LFNumI
Then I created a 40 TB LV for testing:
Code:
 lvcreate -L 40T pod
which gave me this
Code:
lvscan
  ACTIVE            '/dev/pod/lvol0' [40.00 TB] inherit
Then I went on to format the volume using
Code:
jfs_mkfs /dev/pod/lvol0
Still - everything fine.

When I try to check the partition with jfs_fsck, I get:
Code:
ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod/lvol0.  CANNOT CONTINUE.
The really funny thing is that I don't seem to get the error with a 25 TB LV; there, everything looks fine:

Code:
cryptpod1:/# lvresize -L 25T /dev/pod/lvol0
  WARNING: Reducing active logical volume to 25.00 TB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lvol0? [y/n]: y
  Reducing logical volume lvol0 to 25.00 TB
  Logical volume lvol0 successfully resized
cryptpod1:/# jfs_mkfs /dev/pod/lvol0
jfs_mkfs version 1.1.14, 06-Apr-2009
Warning!  All data on device /dev/pod/lvol0 will be lost!

Continue? (Y/N) Y
   |

Format completed successfully.

26843545600 kilobytes total disk space.
cryptpod1:/# jfs_fsck /dev/pod/lvol0 -f
jfs_fsck version 1.1.14, 06-Apr-2009
processing started: 4/25/2010 16.19.29
The current device is:  /dev/pod/lvol0
Block size in bytes:  4096
Filesystem size in blocks:  6710886400
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
26843545600 kilobytes total disk space.
        0 kilobytes in 1 directories.
        0 kilobytes in 0 user files.
        0 kilobytes in extended attributes
  4230484 kilobytes reserved for system use.
26839315116 kilobytes are available for use.
Filesystem is clean.

but not with a 40TB LV:

Code:
 lvresize -L 40T /dev/pod/lvol0
  Extending logical volume lvol0 to 40.00 TB
  Logical volume lvol0 successfully resized
cryptpod1:/# jfs_mkfs /dev/pod/lvol0
jfs_mkfs version 1.1.14, 06-Apr-2009
Warning!  All data on device /dev/pod/lvol0 will be lost!

Continue? (Y/N) Y
   |

Format completed successfully.

42949672960 kilobytes total disk space.
cryptpod1:/# jfs_fsck /dev/pod/lvol0 -f
jfs_fsck version 1.1.14, 06-Apr-2009
processing started: 4/25/2010 16.21.33
The current device is:  /dev/pod/lvol0
Block size in bytes:  4096
Filesystem size in blocks:  10737418240
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod/lvol0.  CANNOT CONTINUE.
There is nothing in syslog that would point to the source of this error. Also, if I mount the 40 TB partition, everything seems fine, but when I try to write to it, the whole system hangs and there is no way to recover.
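(For a hang like this, about the only way I know to get more information is to trigger a blocked-task dump from a second session before the box dies completely. This is a generic diagnostic sketch, assuming magic SysRq is available; it is not output from this thread:)

```shell
# Generic hang-diagnosis sketch; assumes magic SysRq is compiled in
# and you still have a responsive shell (second SSH session or serial console).
echo 1 > /proc/sys/kernel/sysrq    # enable all SysRq functions
echo w > /proc/sysrq-trigger       # log all blocked (D-state) tasks
dmesg | tail -n 60                 # look for the stuck writer's stack trace
```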

Thank you very much for your help...

M

Last edited by murmur101; 04-25-2010 at 02:51 PM.
 
Old 04-25-2010, 01:27 PM   #2
smoker
Senior Member
 
Registered: Oct 2004
Distribution: Fedora Core 4, 12, 13, 14, 15, 17
Posts: 2,279

Rep: Reputation: 248
Going by my lvm setup, which admittedly is not as huge as yours, there seems to be a discrepancy in the reported metadata sequence number.

I have 4 physical volumes; you have 3 (accounting for RAID).

my vgdisplay output is

Code:
[root@kids ~]# vgdisplay my_movies_group
  --- Volume group ---
  VG Name               my_movies_group
  System ID
  Format                lvm2
  Metadata Areas        4
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                4
  Act PV                4
  VG Size               1.55 TB
  PE Size               4.00 MB
  Total PE              405398
  Alloc PE / Size       405398 / 1.55 TB
  Free  PE / Size       0 / 0
  VG UUID               zsNLjr-7HKL-VFip-UWb7-8fLc-zOJF-xZ2zVl
Yours is
Code:
vgdisplay pod
  --- Volume group ---
  VG Name               pod
  System ID
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               53.22 TB
  PE Size               4.00 MB
  Total PE              13950291
  Alloc PE / Size       10485760 / 40.00 TB
  Free  PE / Size       3464531 / 13.22 TB
  VG UUID               3B0kpR-k9Pn-Xxeu-l0Qr-X4TM-Ha2k-LFNumI
This may be a cause of the problems.
 
Old 04-25-2010, 02:39 PM   #3
murmur101
LQ Newbie
 
Registered: Apr 2010
Posts: 15

Original Poster
Rep: Reputation: 1
Quote:
Originally Posted by smoker View Post
Going by my lvm setup, which admittedly is not as huge as yours, there seems to be a discrepancy in the reported metadata sequence number.

I have 4 physical volumes; you have 3 (accounting for RAID).

my vgdisplay output is


This may be a cause of the problems.

Hi,

Thank you for your help. It looked promising, but I think the Metadata Sequence No is just incremented by 1 every time I perform an operation on the VG or LV. This time I added one drive to the VG, then extended twice, and ended up with

Metadata Sequence No 4

- still the same problem.
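(An easy way to watch this behaviour on any box with a spare partition; the VG name and device below are hypothetical, made up for illustration:)

```shell
# Hypothetical demo: the sequence number ticks once per metadata operation.
# /dev/sdz1 and the VG name "scratch" are assumptions, not from this setup.
pvcreate /dev/sdz1
vgcreate scratch /dev/sdz1
vgdisplay scratch | grep 'Metadata Sequence No'   # 1
lvcreate -L 1G -n t scratch
vgdisplay scratch | grep 'Metadata Sequence No'   # 2
lvresize -f -L 2G /dev/scratch/t
vgdisplay scratch | grep 'Metadata Sequence No'   # 3
```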

I added the vgcfgbackup output to this post:

Code:
cryptpod1:/etc/lvm/backup# cat pod
# Generated by LVM2 version 2.02.39 (2008-06-27): Sun Apr 25 20:52:52 2010

contents = "Text Format Volume Group"
version = 1

description = "Created *after* executing 'vgcfgbackup'"

creation_host = "cryptpod1"     # Linux cryptpod1 2.6.26-2-amd64 #1 SMP Tue Mar 9 22:29:32 UTC 2010 x86_64
creation_time = 1272221572      # Sun Apr 25 20:52:52 2010

pod {
        id = "fVbjR8-JvcA-9YtS-gkic-KBmW-31Ue-2wnYxw"
        seqno = 10
        status = ["RESIZEABLE", "READ", "WRITE"]
        extent_size = 8192              # 4 Megabytes
        max_lv = 0
        max_pv = 0

        physical_volumes {

                pv0 {
                        id = "8hWDiA-u7o8-ViLc-7nIs-9SRo-tnMe-OowR1t"
                        device = "/dev/md2"     # Hint only

                        status = ["ALLOCATABLE"]
                        dev_size = 38093600896  # 17.7387 Terabytes
                        pe_start = 384
                        pe_count = 4650097      # 17.7387 Terabytes
                }

                pv1 {
                        id = "PZeRTh-qtxW-u5k1-5CPJ-KSf0-i2fg-jK4oJl"
                        device = "/dev/md1"     # Hint only

                        status = ["ALLOCATABLE"]
                        dev_size = 38093600896  # 17.7387 Terabytes
                        pe_start = 384
                        pe_count = 4650097      # 17.7387 Terabytes
                }

                pv2 {
                        id = "20Cgl7-f6bW-fsgx-wOm1-skzW-4Skj-IRuIoO"
                        device = "/dev/md0"     # Hint only

                        status = ["ALLOCATABLE"]
                        dev_size = 38093600896  # 17.7387 Terabytes
                        pe_start = 384
                        pe_count = 4650097      # 17.7387 Terabytes
                }
        }

        logical_volumes {

                lvol0 {
                        id = "Mpe0zN-emfe-aReP-LQz7-sr9b-Rj2y-KFXUcw"
                        status = ["READ", "WRITE", "VISIBLE"]
                        segment_count = 3

                        segment1 {
                                start_extent = 0
                                extent_count = 4650097  # 17.7387 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv0", 0
                                ]
                        }
                        segment2 {
                                start_extent = 4650097
                                extent_count = 4650097  # 17.7387 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv1", 0
                                ]
                        }
                        segment3 {
                                start_extent = 9300194
                                extent_count = 4648489  # 17.7326 Terabytes

                                type = "striped"
                                stripe_count = 1        # linear

                                stripes = [
                                        "pv2", 0
                                ]
                        }
                }
        }
}
M

Last edited by murmur101; 04-25-2010 at 02:55 PM.
 
Old 04-25-2010, 07:14 PM   #4
smoker
Senior Member
 
Registered: Oct 2004
Distribution: Fedora Core 4, 12, 13, 14, 15, 17
Posts: 2,279

Rep: Reputation: 248
I find it hard to follow your method, as you created filesystems before you created the actual logical volume. That shouldn't cause an issue, but I can't map it onto what I did. All I ever did was assign disks to an LVM volume group, then create a logical volume. After that, I formatted the LV and all was fine. You appear to have done all of that, but the order is not obvious, and you are getting errors.

I would suggest starting again, doing it in the proper order and assigning all the extents to the LV before making a filesystem on it. I think LVM is recording a drive as present, but the rest of the OS doesn't see it as accessible.

I did this from scratch, the OS had no part in the setup of LVM whatsoever.

Oh, BTW, I have added 2 drives to this LVM over time, but the metadata sequence number is still at 4. I started with 2 drives. It appears to increase with the number of drives.

Last edited by smoker; 04-25-2010 at 07:19 PM.
 
Old 04-26-2010, 02:44 AM   #5
murmur101
LQ Newbie
 
Registered: Apr 2010
Posts: 15

Original Poster
Rep: Reputation: 1
Hi Smoker,

Thanks for bearing with me.
I think I did it "by the book", but just to be sure I repeated the procedure this morning:

Code:
cryptpod1:~# pvremove /dev/md0
  Labels on physical volume "/dev/md0" successfully wiped
cryptpod1:~# pvremove /dev/md1
  Labels on physical volume "/dev/md1" successfully wiped
cryptpod1:~# pvremove /dev/md2
  Labels on physical volume "/dev/md2" successfully wiped
cryptpod1:~# dd if=/dev/zero of=/dev/mb0
dd: writing to `/dev/mb0': No space left on device
18505+0 records in
18504+0 records out
9474048 bytes (9.5 MB) copied, 0.0186135 s, 509 MB/s
cryptpod1:~# dd if=/dev/zero of=/dev/mb1
dd: writing to `/dev/mb1': No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000137518 s, 0.0 kB/s
cryptpod1:~# dd if=/dev/zero of=/dev/mb2
dd: writing to `/dev/mb2': No space left on device
1+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000121664 s, 0.0 kB/s
cryptpod1:~# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created
cryptpod1:~# pvcreate /dev/md1
  Physical volume "/dev/md1" successfully created
cryptpod1:~# pvcreate /dev/md2
  Physical volume "/dev/md2" successfully created
cryptpod1:~# vgcreate pod0 /dev/md0 /dev/md1 /dev/md2
  Volume group "pod0" successfully created
cryptpod1:~# vgdisplay pod0
  --- Volume group ---
  VG Name               pod0
  System ID             
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  1
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                0
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               53.22 TB
  PE Size               4.00 MB
  Total PE              13950291
  Alloc PE / Size       0 / 0   
  Free  PE / Size       13950291 / 53.22 TB
  VG UUID               pWE9eu-tQNc-o17V-BkFp-nag5-QTh6-nSIdrf
cryptpod1:~# lvcreate -L 40T pod0
  Logical volume "lvol0" created
cryptpod1:~# lvdisplay 
  --- Logical volume ---
  LV Name                /dev/pod0/lvol0
  VG Name                pod0
  LV UUID                CBvB6d-kTRP-v7OJ-Mvzb-rh5b-dVIN-aMl4cL
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                40.00 TB
  Current LE             10485760
  Segments               3
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
cryptpod1:~# vgdisplay pod0
  --- Volume group ---
  VG Name               pod0
  System ID             
  Format                lvm2
  Metadata Areas        3
  Metadata Sequence No  2
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                3
  Act PV                3
  VG Size               53.22 TB
  PE Size               4.00 MB
  Total PE              13950291
  Alloc PE / Size       10485760 / 40.00 TB
  Free  PE / Size       3464531 / 13.22 TB
  VG UUID               pWE9eu-tQNc-o17V-BkFp-nag5-QTh6-nSIdrf
cryptpod1:~# jfs_mkfs /dev/pod0/lvol0 
jfs_mkfs version 1.1.14, 06-Apr-2009
Warning!  All data on device /dev/pod0/lvol0 will be lost!

Continue? (Y/N) Y
   |

Format completed successfully.

42949672960 kilobytes total disk space.
cryptpod1:~# jfs_fsck /dev/pod0/lvol0 -f
jfs_fsck version 1.1.14, 06-Apr-2009
processing started: 4/26/2010 8.33.56
The current device is:  /dev/pod0/lvol0
Block size in bytes:  4096
Filesystem size in blocks:  10737418240
**Phase 0 - Replay Journal Log
**Phase 1 - Check Blocks, Files/Directories, and  Directory Entries
**Phase 2 - Count links
**Phase 3 - Duplicate Block Rescan and Directory Connectedness
**Phase 4 - Report Problems
**Phase 5 - Check Connectivity
**Phase 6 - Perform Approved Corrections
**Phase 7 - Rebuild File/Directory Allocation Maps
**Phase 8 - Rebuild Disk Allocation Maps
ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod0/lvol0.  CANNOT CONTINUE.
I was also able to identify the point where I get the error: it really seems to be when the volume or the JFS partition stretches onto the third disk. I also tried changing the order of the disks (adding /dev/md2 first and then extending), with no change in results.
As you can see, the sequence number was incremented after I created the LV. I will test this on another machine to see whether this is normal or specific to this server.
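The vgcfgbackup output I posted earlier supports this: lvol0 is laid out linearly across pv0, then pv1, then pv2, and segment3 only begins at extent 9300194. So the LV first touches the third disk at about 35.5 TiB, which matches where things start to break. Quick arithmetic from the extent counts in the backup file:

```shell
# 4 MiB extents; segment3 of lvol0 starts at extent 9300194 = 2 * 4650097
# (numbers taken from the vgcfgbackup output earlier in the thread)
awk 'BEGIN { printf "%.2f TiB\n", 9300194 * 4 / 1024^2 }'   # prints 35.48 TiB
# any LV bigger than this necessarily spills onto the third PV
```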

Thx
M
 
Old 04-26-2010, 04:17 AM   #6
murmur101
LQ Newbie
 
Registered: Apr 2010
Posts: 15

Original Poster
Rep: Reputation: 1
Update:

It seems to be JFS-related somehow. The same LV formatted with XFS came out OK (I can't run a check of the entire partition in XFS, as the XFS checker is an incredible memory hog).
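(The XFS cross-check was essentially just reformatting the same LV; the mount point below, and the LV name carried over from my earlier run, are assumptions:)

```shell
# Hedged sketch of the XFS cross-check; /mnt/pod is an assumption.
mkfs.xfs -f /dev/pod0/lvol0            # -f: overwrite the old JFS signature
mount -t xfs /dev/pod0/lvol0 /mnt/pod
df -h /mnt/pod                         # verify the reported size
umount /mnt/pod
```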

I will focus my search on JFS-related problems now, and if I don't find anything I will fall back to XFS.

M
 
Old 04-26-2010, 06:17 AM   #7
smoker
Senior Member
 
Registered: Oct 2004
Distribution: Fedora Core 4, 12, 13, 14, 15, 17
Posts: 2,279

Rep: Reputation: 248
Quote:
Originally Posted by murmur101 View Post
Update:

It seems to be JFS-related somehow. The same LV formatted with XFS came out OK (I can't run a check of the entire partition in XFS, as the XFS checker is an incredible memory hog).

I will focus my search on JFS-related problems now, and if I don't find anything I will fall back to XFS.

M
Interesting; there should be no problems with JFS. I can't say much about it as a filesystem, except that it allocates files as extents.
Whether that conflicts with LVM-style extents might be an issue. However, XFS does the same thing in a slightly different way, and that appears to be OK.
 
Old 04-27-2010, 02:12 PM   #8
murmur101
LQ Newbie
 
Registered: Apr 2010
Posts: 15

Original Poster
Rep: Reputation: 1
Okay...

I do have another server with 100% the same hardware (yes, I have more than 110 TB in 2 servers, and no, I don't plan to mirror all the porn of the internet).

I did a full run from scratch with only the versions from Debian's stable repos and, lo and behold:

Code:
ujfs_rw_diskblocks: disk_count is 0
Unrecoverable error writing M to /dev/pod/lvol0.  CANNOT CONTINUE.
Same thing. And with this box, no fooling around: it was a start-to-end configuration on a vanilla system. I will try to file a bug about this one, as I think I have enough evidence (reproducible, works with another fs) that something is fishy about JFS.

Will come back to this thread if I find something.
Cheers and thank you
 
Old 04-27-2010, 03:00 PM   #9
H_TeXMeX_H
Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1269
Try a live CD with another distro, just to make sure it isn't Debian-specific (or the version of jfsutils it ships). If it is a bug, tell them about it.
 
Old 04-27-2010, 03:01 PM   #10
murmur101
LQ Newbie
 
Registered: Apr 2010
Posts: 15

Original Poster
Rep: Reputation: 1
Seems to be a known issue with large volumes:
https://sourceforge.net/tracker/?fun...02&atid=712756

and there seems to be a patch for the error.. ah.. so much time wasted...

Last edited by murmur101; 04-27-2010 at 03:04 PM.
 
  



Tags
amd64, debian, jfs, lvm2

