[SOLVED] Configuring / Usage of RAID 0 & RAID 5 with PV / VG / LVM
We have been noticing that the server boots up fine and runs well for 24-36 hours under a heavy load (load average around 20) generated by test "dd" read/write operations before hand-off to other users. However, all of a sudden we start seeing EXT3 filesystem errors and the server becomes inaccessible, as in the FS errors attached.
We tried recreating the journals and running filesystem checks. Everything works well after a reboot but fails again after 24-36 hours.
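Roughly the steps we used, with a placeholder LV path rather than our actual device:
# unmount the affected filesystem first
umount /dev/VolGroup00/LogVol00
# force a full filesystem check
e2fsck -f /dev/VolGroup00/LogVol00
# drop and recreate the ext3 journal
tune2fs -O ^has_journal /dev/VolGroup00/LogVol00
tune2fs -j /dev/VolGroup00/LogVol00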
My question to the experts is: will having two physical volumes in the same volume group cause a problem, especially a filesystem issue? One is a RAID 1 array and the other a RAID 5. Do we NEED to separate them as described below? Is this mandatory? Please let me know.
We've got a similar OS/HW combo, but we use separate VG's for each array
vg00 = OS+App Code
vg01 = App Data
makes doing a hot swap of the OS+App disk for backup/creating a new machine easy.
Unless you are planning to put both 'disks' in the same LV, I'd keep them totally separate.
Always consider not what complication you can build, but how hard it would be to fix when(!) it breaks... especially on a prod server
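A minimal sketch of that layout, assuming the RAID 1 array shows up as /dev/cciss/c0d0 and the RAID 5 array as /dev/cciss/c0d1 (device names and sizes are placeholders, not a prescription):
# one PV per array, one VG per PV
pvcreate /dev/cciss/c0d0p2
pvcreate /dev/cciss/c0d1p1
vgcreate vg00 /dev/cciss/c0d0p2    # OS + app code
vgcreate vg01 /dev/cciss/c0d1p1    # app data
lvcreate -L 20G -n lv_root vg00
lvcreate -l 100%FREE -n lv_data vg01
mkfs.ext3 /dev/vg00/lv_root
mkfs.ext3 /dev/vg01/lv_data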
1. would be interesting to see the preceding bit of the error log
2. exactly what version of RHEL5? (see /etc/redhat-release) ; we had an issue with 5.0
& I wrote it up. here's the opening stanza
1. would be interesting to see the preceding bit of the error log
ANS: Unfortunately, I don't see any errors in dmesg, /var/log/messages or /dev/console, even though I have configured syslog to push all kernel and system logs to /dev/console, a syslog server and the messages file. After this error snippet occurs, I believe the system simply freezes and any write operation to disk fails. This is my assumption based on the behavior! Do you also think this is a possibility, or is there a way I'm not aware of to get the preceding bit of the error log? Please advise.
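For reference, the relevant lines in our /etc/syslog.conf are roughly as follows (the remote host name is only a placeholder):
# kernel messages to the local console
kern.*                                          /dev/console
# usual local messages file
*.info;mail.none;authpriv.none;cron.none        /var/log/messages
# forward everything to the central syslog server as well
*.*                                             @logserver.example.com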
2. exactly what version of RHEL5? (see /etc/redhat-release) ; we had an issue with 5.0
& I wrote it up. here's the opening stanza
3. ultimately I'd ask RH or HP. I'd assume you have contracts with them?
ANS: I too wanted to contact support in order to really understand and know for sure whether clubbing two different RAID PVs under the same VolGroup00, and allocating the LVM space of the whole VG for the base FS structure holding both app code and data, is allowed by design or not. Unfortunately, due to time constraints to meet deadlines, I chose to rebuild the server with separate PVs: app code on the first VolGroup00 and data on the second VolGroup01.
I'm in the middle of the rebuild and hoped to reach out to this forum to know for sure whether clubbing RAID 1 and RAID 5 PVs and allocating them to a single VG WILL or COULD (do we know for sure, technically, by PV/VG/LV design?) result in filesystem errors or not.
Will having two physical volumes in the same volume group cause a problem for sure? It looks like this is not a great implementation, but do we know from experience, or technically by design, whether it is allowed or not? I'm just keen to know. I know it is allowed, since the partitioning scheme layout at system build time lets you select the SCSI devices so that 2 PVs get clubbed into 1 VG, which is how I ended up here in the first place.
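For what it's worth, a minimal sketch of what the installer effectively let me do, with placeholder device names, would be something like:
# two PVs (one per RAID array) clubbed into a single VG
pvcreate /dev/cciss/c0d0p2 /dev/cciss/c0d1p1
vgcreate VolGroup00 /dev/cciss/c0d0p2 /dev/cciss/c0d1p1
# confirm both PVs landed in the same VG
pvs
vgdisplay VolGroup00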
Anyway, as I stated earlier, I'm implementing the following now:
3. As above, I prefer to separate them as described, for the reasons said
4. that error jpg starts with what looks like part of a RAM content dump or similar, but I'd expect a few msgs before that.
5. have you installed the appropriate Product Support Pack from HP eg psp-8.73.rhel5.i686.en.tar.gz ?
You can get the latest from the HP website (free) and it contains a bunch of utility rpms built to allow you to interrogate HP HW eg the array.
I use eg hpacucli to monitor hot disk swaps/rebuilds.
HP also has forums you may want to ask this on, although I'd go with phone call (as well) for speed.
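As a rough sketch, a few hpacucli invocations I find handy for a quick health check (the slot number below is just an example; check yours with the first command):
# overall controller and array status
hpacucli ctrl all show status
hpacucli ctrl all show config
# per-physical-disk status, useful while watching a hot swap/rebuild
hpacucli ctrl slot=0 pd all show status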
+ Great! This is what I wanted to hear for my clarification and understanding
3. As above, I prefer to separate them as described, for the reasons said
+ Agreed and makes sense! I have already implemented this.
4. that error jpg starts with what looks like part of a RAM content dump or similar, but I'd expect a few msgs before that.
+ Yes. Me too! I have enabled the logs to display on the console and push to another syslog server as well if it fails to write locally.
5. have you installed the appropriate Product Support Pack from HP eg psp-8.73.rhel5.i686.en.tar.gz ?
You can get the latest from the HP website (free) and it contains a bunch of utility rpms built to allow you to interrogate HP HW eg the array.
I use eg hpacucli to monitor hot disk swaps/rebuilds.
HP also has forums you may want to ask this on, although I'd go with phone call (as well) for speed.
PS: Even after separating the RAID 0 and RAID 5 volumes so they are no longer used by a single VolGroup00, it looks like we are still running into an unresponsive server state: unable to get a session after 12 hours of humming along fine, with the error attached, plus slowness in read/write operations. Possibly the next action item will be to log a ticket with RHEL / HP if my client agrees. This is old HP Proliant 385 hardware. Will update on the progress later as it happens...
Well, those cmds are to help you do diags etc; you have to run them or set them up in some cases.
You need to read up on each one.
In the meantime, consider setting up http://linux.die.net/man/1/collectl, which should generate comprehensive logs.
There's collectl guru who hangs out on LQ; normally he turns up soon after someone mentions it, so fingers crossed.
Alternately, search LQ for collectl posts.
If this is old HW, maybe the disks are dying; consider also http://linux.die.net/man/8/smartctl and/or any HP-specific tools for disk health checks (e.g. the PSP utilities above, Google, or ask HP).
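As a rough sketch (the collectl output directory and the cciss device/ID below are assumptions; adjust for your controller):
# record cpu, disk summary and disk detail data to files for later playback
collectl -scdD -f /var/log/collectl
# or, if installed from the rpm, just start the bundled init script
service collectl start
# SMART health check; disks behind an HP Smart Array need the cciss syntax
smartctl -H -d cciss,0 /dev/cciss/c0d0
smartctl -a -d cciss,0 /dev/cciss/c0d0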
Well, those cmds are to help you do diags etc; you have to run them or set them up in some cases.
You need to read up on each one.
--> Yes. These packages are already setup and monitoring the hardware layer.
In the meantime, consider setting up http://linux.die.net/man/1/collectl, which should generate comprehensive logs.
There's collectl guru who hangs out on LQ; normally he turns up soon after someone mentions it, so fingers crossed.
Alternately, search LQ for collectl posts.
--> Good to know about collectl. Will try this!
If this is old HW, maybe the disks are dying; consider also http://linux.die.net/man/8/smartctl and/or any HP-specific tools for disk health checks (e.g. the PSP utilities above, Google, or ask HP).
--> I agree. I will be logging a Red Hat support ticket to see what they have to say, just to rule out anything driver-related with the kernel that might be a known issue given it is old hardware. Then, once we identify that it is hardware related, we will try upgrading the BIOS and h/w firmware.
Sorry it took so long for me to show up but I'm the collectl guru mentioned earlier. Collectl is not a magic bullet and doesn't do anything for reporting disk errors, but sometimes you can draw pretty good inferences by looking at its numbers for abnormalities. For example, if you know exactly what time you're seeing problems, or at least some sort of approximation, you can play back recorded data around that time period and look at the disk detail data using -sD. Maybe you'll see some high wait or service times, which would immediately indicate a slowdown. Of course this all assumes you've had collectl running.
Another thing you could do is look at other system activity around the same time. Sometimes it could be related to a specific process, and playing back collectl data with --top will let you look at the top processes every minute sorted by the column of your choice; see --showtopopts for details. Might one of them be doing excessive I/O? Dunno, but you can sort processes by top disk I/O usage.
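For example, something along these lines (the file name and time window below are just placeholders for whatever you actually recorded):
# play back disk detail stats around the time of the hang
collectl -p /var/log/collectl/myserver-20120101.raw.gz -sD --from 02:00 --thru 03:00
# play back the top processes, sorted by disk I/O
collectl -p /var/log/collectl/myserver-20120101.raw.gz --top iokb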
The other thing is if you install collectl-utils it will give you access to colplot which will let you plot everything and visually see if you can spot anything odd.
As I said there's really no easy recipe for doing any of this. It can take a lot of digging around and knowing how to spot something unusual.
Thanks Mark and Chris for your inputs. Chris in particular has been very helpful in clarifying my questions. Finally, we were able to resolve the EXT3 journal errors, the read/write latency wait-time messages and the instability issues with the HP Proliant 385 hardware after upgrading the firmware to the latest version. Thanks again!