LinuxQuestions.org
Linux - Server This forum is for the discussion of Linux Software used in a server related context.

Old 08-07-2012, 10:58 AM   #1
John King
LQ Newbie
 
Registered: Aug 2012
Posts: 9

Rep: Reputation: Disabled
Configuring / Usage of RAID 1(1+0) & RAID 5 with PV / VG / LVM


Hello Everyone,

We are building a RHEL5 server per requirements, and it has the following disk configuration:

#cat /proc/driver/cciss/cciss0
cciss0: HP Smart Array 6i Controller
Board ID: 0x40910e11
Firmware Version: 2.36
IRQ: 177
Logical drives: 2
Sector size: 8192
Current Q depth: 0
Current # commands on controller: 0
Max Q depth since init: 12
Max # commands on controller since init: 12
Max SG entries since init: 127
Sequential access devices: 0

cciss/c0d0: 299.99GB RAID 1(1+0)
cciss/c0d1: 899.98GB RAID 5

In the partitioning scheme during the build, we combined both of these PVs for VolGroup00's use, like below:

#pvs

PV VG Fmt Attr PSize PFree
/dev/cciss/c0d0p2 VolGroup00 lvm2 a-- 279.19G 0
/dev/cciss/c0d1p1 VolGroup00 lvm2 a-- 838.16G 0

We have seen the server boot up fine and work well for 24-36 hours under heavy load (load averages around 20 from test "dd" write/read operations) before hand-off to others. Then, all of a sudden, we see EXT3 filesystem errors and the server becomes inaccessible, with errors like those attached.



We tried recreating journals and running filesystem checks. Everything works well after a reboot, but fails again after 24-36 hours.
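For completeness, the journal recreation was along these lines. This is only a sketch; the LV path /dev/VolGroup00/LogVol00 and the mount point are assumptions, so substitute the actual devices:

```shell
# Unmount the filesystem first; the journal cannot be rebuilt while mounted
umount /dev/VolGroup00/LogVol00

# Drop the existing ext3 journal, force a full check, then recreate the journal
tune2fs -O ^has_journal /dev/VolGroup00/LogVol00
e2fsck -f /dev/VolGroup00/LogVol00
tune2fs -j /dev/VolGroup00/LogVol00

# Remount and carry on (mount point is a placeholder)
mount /dev/VolGroup00/LogVol00 /mountpoint
```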

My question to the experts is: will having two physical volumes in the same volume group cause a problem, especially a filesystem issue? One is RAID 1 and the other RAID 5. Do we NEED to separate them like below? Is this mandatory? Please let me know.

PV VG Fmt Attr PSize PFree
/dev/cciss/c0d0p2 VolGroup00 lvm2 a- 279.19G 0
/dev/cciss/c0d1p1 VolGroup01 lvm2 a- 838.16G 0
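For reference, that layout can be built with the standard LVM tools. A minimal sketch, assuming the device names above; the LV names and the use of 100%FREE are illustrative:

```shell
# Mark each controller partition as an LVM physical volume
pvcreate /dev/cciss/c0d0p2
pvcreate /dev/cciss/c0d1p1

# One volume group per array: RAID 1 for OS + app code, RAID 5 for data
vgcreate VolGroup00 /dev/cciss/c0d0p2
vgcreate VolGroup01 /dev/cciss/c0d1p1

# Carve a logical volume out of each VG and put ext3 on the data LV
lvcreate -l 100%FREE -n LogVol00 VolGroup00
lvcreate -l 100%FREE -n LogVolData VolGroup01
mkfs.ext3 /dev/VolGroup01/LogVolData
```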

Regards
John
Attached: FS EXT3 Error.jpg (101.6 KB)

Last edited by John King; 08-07-2012 at 11:07 AM.
 
Old 08-07-2012, 06:53 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,237

Rep: Reputation: 2712
We've got a similar OS/HW combo, but we use separate VGs for each array:
vg00 = OS + App Code
vg01 = App Data

That makes hot-swapping the OS+App disk for backup, or for creating a new machine, easy.

Unless you are planning to put both 'disks' in the same LV, I'd keep them totally separate.
Always consider not what complication you can build, but how hard it will be to fix when(!) it breaks... especially on a prod server.

1. It would be interesting to see the preceding bit of the error log.

2. Exactly what version of RHEL5? (see /etc/redhat-release). We had an issue with 5.0
and I wrote it up; here's the opening stanza:
Quote:
Due to a filesystem failure that occurred on <some date>, which looks like this bug https://bugzilla.redhat.com/show_bug.cgi?id=494927 and is fixed (allegedly) by the kernel upgrade associated with RHEL 5.7, http://rhn.redhat.com/errata/RHSA-2011-1065.html (see the 'bugs fixed' section for id 494927), we are planning to upgrade ...
3. Ultimately I'd ask RH or HP. I assume you have contracts with them?

PS: you might want to ask the Mods (via the Report button) to move this to the Server forum?
Doesn't really sound like a Newbie-to-Linux qn to me.

Last edited by chrism01; 08-07-2012 at 07:08 PM.
 
Old 08-07-2012, 07:53 PM   #3
John King
LQ Newbie
 
Registered: Aug 2012
Posts: 9

Original Poster
Rep: Reputation: Disabled
PV / VG / LVM Issue

Thanks for your logical inputs, Chrism01.

Here is the info;

1. It would be interesting to see the preceding bit of the error log.

ANS: Unfortunately, I don't see any errors in dmesg, /var/log/messages, or on /dev/console, even though I have configured syslog to push all kernel and system logs to /dev/console, to a syslog server, and to messages. Once this error snippet occurs, I believe the system simply freezes and any write operation to disk fails. That is my assumption from the behavior! Do you also think this is a possibility, or is there a way I'm not aware of to get the preceding bit of the error log? Please advise.

2. Exactly what version of RHEL5? (see /etc/redhat-release). We had an issue with 5.0
and I wrote it up; here's the opening stanza:

ANS: RHEL 5.7 x86_64 running 2.6.18-308.11.1.el5 #1 SMP kernel

3. Ultimately I'd ask RH or HP. I assume you have contracts with them?

ANS: I too wanted to contact support, to really understand and know for sure whether combining two different RAID PVs under the same VolGroup00 is ALLOWED OR NOT BY DESIGN, with the LVM space allocated across the whole VG for the base FS structure holding both app code and data. Unfortunately, due to time constraints to meet deadlines, I chose to rebuild the server with separate VGs: app code on VolGroup00 and data on VolGroup01.

I'm in the middle of the rebuild and hoped to reach out to this forum to know for sure whether combining RAID 1 and RAID 5 PVs into a single VG "WILL"/"COULD" (do we know for sure, technically, by PV/VG/LV design?) result in filesystem errors or not.

Will having two physical volumes in the same volume group cause a problem for sure? It doesn't look like a great implementation, but do we know, by experience or technically by design, whether it is allowed or not? I'm just keen to know. I know it is allowed, since the partitioning scheme layout during the system build lets you select the SCSI devices so that 2 PVs are combined into 1 VG, which is how I ended up here in the first place.

Anyways, as I stated earlier, I'm implementing the following now;

PV VG Fmt Attr PSize PFree
/dev/cciss/c0d0p2 VolGroup00 lvm2 a- 279.19G 0 <=> vg00 = OS+App Code
/dev/cciss/c0d1p1 VolGroup01 lvm2 a- 838.16G 0 <=> vg01 = App Data

Finally, I can't seem to find the Report button. Please point me there as well.


Thanks again!


Regards
John
 
Old 08-08-2012, 12:43 AM   #4
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,237

Rep: Reputation: 2712
1. Report button is at bottom right of each post

2. In theory you can group multiple disks into one VG; that's the whole point of the LVM system/concept:
http://tldp.org/HOWTO/LVM-HOWTO/
http://sunoano.name/ws/public_xhtml/lvm.html

3. As above, I prefer to separate them as described, for the reasons given.

4. That error jpg starts with what looks like part of a RAM content dump or similar, but I'd expect a few messages before that.

5. Have you installed the appropriate ProLiant Support Pack from HP, e.g. psp-8.73.rhel5.i686.en.tar.gz?
You can get the latest from the HP website (free); it contains a bunch of utility rpms built to let you interrogate HP HW, e.g. the array.
I use e.g. hpacucli to monitor hot disk swaps/rebuilds.
HP also has forums you may want to ask this on, although I'd go with a phone call (as well) for speed.

http://h18013.www1.hp.com/products/s...id=servers/psp
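For anyone following along, hpacucli can interrogate the Smart Array directly. A quick sketch; the slot number is an assumption, so list the controllers first to find the right one:

```shell
# Discover the controllers and their slot numbers
hpacucli ctrl all show

# Dump the full logical/physical drive configuration of one controller
hpacucli ctrl slot=0 show config

# Check drive status; look for anything other than "OK" (Failed, Rebuilding, ...)
hpacucli ctrl slot=0 ld all show status
hpacucli ctrl slot=0 pd all show status
```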

Last edited by chrism01; 08-08-2012 at 12:45 AM.
 
1 member found this post helpful.
Old 08-08-2012, 10:50 AM   #5
John King
LQ Newbie
 
Registered: Aug 2012
Posts: 9

Original Poster
Rep: Reputation: Disabled
Thanks again for your detailed reply and time, Chrism01.


Quote:
Originally Posted by chrism01 View Post
1. Report button is at bottom right of each post

+ Done. Thanks!

2. In theory you can group multiple disks into one VG; that's the whole point of the LVM system/concept:
http://tldp.org/HOWTO/LVM-HOWTO/
http://sunoano.name/ws/public_xhtml/lvm.html

+ Great! This is exactly what I wanted to hear, for my clarification and understanding.

3. As above, I prefer to separate them as described, for the reasons said

+ Agreed, and it makes sense! I have already implemented this.

4. That error jpg starts with what looks like part of a RAM content dump or similar, but I'd expect a few messages before that.

+ Yes, me too! I have enabled the logs to display on the console, and to push to another syslog server as well in case local writes fail.
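A setup like that is a couple of lines in /etc/syslog.conf on RHEL 5 (which ships syslogd rather than rsyslog by default). A sketch, where the remote host name is a placeholder:

```shell
# Append to /etc/syslog.conf: kernel messages to the console,
# everything at info and above to a remote syslog server as well
cat >> /etc/syslog.conf <<'EOF'
kern.*          /dev/console
*.info          @syslogserver.example.com
EOF

# Reload syslogd so the changes take effect
service syslog restart
```

Note that the receiving syslogd has to be started with -r to accept remote messages over UDP port 514.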

5. Have you installed the appropriate ProLiant Support Pack from HP, e.g. psp-8.73.rhel5.i686.en.tar.gz?
You can get the latest from the HP website (free); it contains a bunch of utility rpms built to let you interrogate HP HW, e.g. the array.
I use e.g. hpacucli to monitor hot disk swaps/rebuilds.
HP also has forums you may want to ask this on, although I'd go with a phone call (as well) for speed.

http://h18013.www1.hp.com/products/s...id=servers/psp
+ Yes, it's already there! But there's nothing in the logs when this error occurs.

HP DRIVERS:
cpq_cciss-2.6.18-16.x86_64
cpqacuxe-8.10-1.i386
hp-health-8.1.0-104.rhel4.x86_64
hp-ilo-8.1.0-104.rhel4.x86_64
hp-smh-templates-8.1.0-104.noarch
hp-snmp-agents-8.1.0-110.rhel4.x86_64
hpacucli-8.10-2.i386
hpadu-8.10-3.i386
hpdiags-8.1.0-136.i586
hponcfg-1.8.0-1.noarch
hpsmh-2.1.12-200.x86_64
hpvca-2.1.9-7.i386

PS: Even after separating the RAID 1(1+0) and RAID 5 volumes so they are no longer in a single VolGroup00, it looks like we are still running into an unresponsive server state: unable to get a session after about 12 hours of humming along fine, with the attached error and slowness in read/write operations. The next action item will probably be to log a ticket with Red Hat / HP, if my client agrees. This is old HP ProLiant 385 hardware. Will update with progress as it happens...

Thank you.


Regards
J
Attached: Console Error Snippet.png (16.1 KB)
 
Old 08-08-2012, 04:17 PM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Moved: This thread is more suitable in Linux - Server and has been moved accordingly to help your thread/question get the exposure it deserves (as per your kind request).
 
Old 08-08-2012, 08:41 PM   #7
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 18,237

Rep: Reputation: 2712
Well, those commands are there to help you do diags etc.; in some cases you have to run them or set them up yourself.
You need to read up on each one.
In the meantime, consider setting up collectl http://linux.die.net/man/1/collectl, which should generate comprehensive logs.
There's a collectl guru who hangs out on LQ; normally he turns up soon after someone mentions it, so fingers crossed.
Alternatively, search LQ for collectl posts.

If this is old HW, maybe the disks are dying; consider also smartctl http://linux.die.net/man/8/smartctl and/or any HP-specific tools for disk health checks (see above / Google / ask HP).
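Behind a Smart Array, smartctl needs the cciss device type to address the individual physical disks through the controller. A sketch; the disk indices 0 and 1 are assumptions, so iterate until smartctl reports no further devices:

```shell
# SMART health summary for physical disks 0 and 1 behind the controller
smartctl -H -d cciss,0 /dev/cciss/c0d0
smartctl -H -d cciss,1 /dev/cciss/c0d0

# Full attributes and error logs for a closer look at a suspect disk
smartctl -a -d cciss,0 /dev/cciss/c0d0
```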
 
Old 08-09-2012, 10:29 AM   #8
John King
LQ Newbie
 
Registered: Aug 2012
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by chrism01 View Post
Well, those commands are there to help you do diags etc.; in some cases you have to run them or set them up yourself.
You need to read up on each one.

--> Yes. These packages are already set up and monitoring the hardware layer.

In the meantime, consider setting up http://linux.die.net/man/1/collectl, which should generate comprehensive logs.
There's collectl guru who hangs out on LQ; normally he turns up soon after someone mentions it, so fingers crossed.
Alternately, search LQ for collectl posts.

--> Good to know about collectl. Will try this!

If this is old HW, maybe the disks are dying; consider also http://linux.die.net/man/8/smartctl and/or any HP specific tools for disk health checks eg above/google/ask HP
--> I agree. I will log a Red Hat support ticket to see what they have to say, just to rule out anything driver-related in the kernel that is a known issue, given that this is old hardware. Then, once we identify that it is hardware-related, we will try upgrading the BIOS and hardware firmware.

Will let you know how it goes!

Thanks Chrism01.
 
Old 08-11-2012, 10:48 AM   #9
markseger
Member
 
Registered: Jul 2003
Posts: 244

Rep: Reputation: 26
Sorry it took so long for me to show up, but I'm the collectl guru mentioned earlier. collectl is not a magic bullet and doesn't do anything to report disk errors, but sometimes you can draw pretty good inferences by looking at its numbers for abnormalities. For example, if you know exactly what time you're seeing problems, or at least some approximation, you can play back recorded data around that time period and look at the disk detail data using -sD. Maybe you'll see some high wait or service times, which would immediately indicate a slowdown. Of course, this all assumes you've had collectl running.

Another thing you could do is look at other system activity around the same time. Sometimes it can be related to a specific process, and playing back collectl data with --top will let you look at the top processes every minute, sorted by the column of your choice; see --showtopopts for details. Might one of them be doing excessive I/O? Dunno, but you can sort processes by top disk I/O usage.

The other thing is that if you install collectl-utils, it gives you access to colplot, which lets you plot everything and visually check whether you can spot anything odd.

As I said, there's really no easy recipe for any of this. It can take a lot of digging around, and knowing how to spot something unusual.
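The workflow described above looks roughly like this, assuming collectl has been recording as a daemon to its default log directory (option names per the collectl man page; the time window is illustrative):

```shell
# Record continuously via the init script (writes raw files under /var/log/collectl)
service collectl start

# Later: play back disk-detail stats (-sD) around the incident window
collectl -p /var/log/collectl/*.raw.gz -sD --from 02:00 --thru 03:00

# Play back top processes sorted by disk I/O (see --showtopopts for sort keys)
collectl -p /var/log/collectl/*.raw.gz --top iokb
```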


hope this helps...

-mark
 
Old 08-17-2012, 10:00 PM   #10
John King
LQ Newbie
 
Registered: Aug 2012
Posts: 9

Original Poster
Rep: Reputation: Disabled
Thanks Mark and Chris for your inputs. Chris especially has been very helpful in clarifying my questions. Finally, we were able to resolve the EXT3 journal errors, the read/write latency messages, and the instability issues with the HP ProLiant 385 hardware after upgrading the firmware to the latest version. Thanks again!

Regards
J
 
  

