LinuxQuestions.org

ozric 09-16-2006 08:33 AM

LVM problems
 
Edit: ** This might not be the right forum/subforum for this post, many apologies. If the mods know where it belongs, please move it to the appropriate place **

I sincerely doubt that anyone has enough time to even read through this post, but I am going to take my chances as I have nowhere else to turn and just hope for the best.

My problem:

I have an LVM volume spanning 6 disks, holding two directory structures (dir1, dir2). When issuing an 'ls' or 'dir' in dir1, I get a file list without problems. Doing the same in dir2, the system halts after reading from the disks, without any file list being shown at all. The cursor freezes, and all I can do is cut the power to the machine and restart it. dir2 is the larger of the two structures.

When copying data from dir1, the system sometimes halts. Repeating the same procedure (copying the exact same files) sometimes works. The volume is formatted with XFS, and I have tried running xfs_check. This also leads to a halt, but an immediate one (unlike running dir, where the disks read for about 8 seconds before the system freezes).

Steps I have taken in order to find the problem:

Checked RAM for errors, and replaced them
Updated Kernel to latest version
Changed the two controller cards used for the LVM
Installed a fresh system, and mounted the LVM

None of these steps has helped.

So, is it a faulty disk?
I would think so. But then, how come I can access data from dir1, and then the next time I try with the same files, the system halts? It makes no sense to me.
The system has worked flawlessly for over two years, and now, without any actual alterations being made, it doesn't.

I am all out of ideas, and I'm turning to you guys here. If anyone has any ideas.. Please..

System Specs:

Intel P4 3GHz, 2x512 MB RAM
SuSE Linux 9.2
2 x Promise PATA 133 TX2 controllers
I will be delighted to fill in with more details of the system if required.

Samoth 09-16-2006 09:30 AM

what does Smartmon say about your disks?

ozric 09-16-2006 09:57 AM

Quote:

Originally Posted by Samoth
what does Smartmon say about your disks?

Thanks for your reply Samoth.
When issuing smartctl -H /dev/hd[x] it returns

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

on all disks.
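
Note that `smartctl -H` only reports the drive's overall pass/fail verdict, which can still read PASSED while individual attributes are degrading. A minimal sketch of a more detailed look follows. The device names in `DISKS` are placeholders (not taken from this thread) and must be replaced with the actual LVM member disks; the `smart_report` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical device list: replace with the six disks that make up the LVM.
DISKS="${DISKS:-/dev/hda /dev/hdb}"
SMARTCTL="${SMARTCTL:-smartctl}"

# Print the attribute table and the drive's own error log for every disk.
smart_report() {
    for disk in $DISKS; do
        echo "== $disk =="
        # -A       : vendor-specific SMART attributes
        # -l error : the error log kept by the drive itself
        $SMARTCTL -A -l error "$disk"
    done
}
```

Running `smart_report | grep -E '^==|Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count'` as root narrows the output to the attributes that most often point at a failing disk or cable. A long self-test can be started per disk with `smartctl -t long /dev/hdX` and read back later with `smartctl -l selftest /dev/hdX`.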

haertig 09-16-2006 10:27 AM

Since the system is halting on you, you probably can't get a good look at syslog.

Before doing the steps to trigger the problem, can you come in remotely via ssh from some other computer and then run a tail -f on /var/log/syslog? Recreate your halting problem on the local computer, and maybe you'll see something of interest on the remote computer's tail command.

ozric 09-16-2006 11:10 AM

Quote:

Originally Posted by haertig
Since the system is halting on you, you probably can't get a good look at syslog.

Before doing the steps to trigger the problem, can you come in remotely via ssh from some other computer and then run a tail -f on /var/log/syslog? Recreate your halting problem on the local computer, and maybe you'll see something of interest on the remote computer's tail command.

Thanks for trying to help Haertig.

I can access the machine via SSH, but I am not sure how to tail the syslog file. I'm running SuSE 9.2, and there is no such file as syslog. Should I configure syslog.conf to make syslogd output everything to a text file? Again, thanks for helping.

haertig 09-16-2006 11:35 AM

Quote:

Originally Posted by ozric
...and there is no such file as syslog...

I'm not familiar with SuSE, but it's GOT to have a syslog file. That's quite standard. I can't imagine not having one configured by default. Maybe it's configured to be in a different place than mine, which is in /var/log/syslog.

Check out your /etc/syslog.conf. Here are the lines from my syslog.conf that show where standard logfiles are saved. Look for where SuSE stores the files on your system.

Edited from my (Debian) /etc/syslog.conf:
Code:

...snip...

auth,authpriv.*                /var/log/auth.log
*.*;auth,authpriv.none          -/var/log/syslog
#cron.*                        /var/log/cron.log
daemon.*                        -/var/log/daemon.log
kern.*                          -/var/log/kern.log
lpr.*                          -/var/log/lpr.log
mail.*                          -/var/log/mail.log
user.*                          -/var/log/user.log
uucp.*                          /var/log/uucp.log

...snip...

p.s. - If you make changes to your syslog.conf file, I believe you'll have to restart syslogd for them to take effect. I doubt you'll have to really change anything. Just snoop around syslog.conf to find out where they're currently being stored. I just can't fathom SuSE not having a syslog defined by default.
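
The "snoop around syslog.conf" step can be sketched as a small helper that prints every file a classic syslog.conf writes to. This assumes the traditional sysklogd format shown above (selector, whitespace, action); `list_syslog_files` is a hypothetical name, and other syslog daemons use different configuration files.

```shell
#!/bin/sh
# List the log files named in a classic syslog.conf.
# Usage: list_syslog_files [path-to-syslog.conf]
list_syslog_files() {
    conf="${1:-/etc/syslog.conf}"
    # Skip comment lines and lines without an action field.
    # The action is the last field; a leading "-" only means
    # "do not sync after every write", so strip it before printing.
    awk '!/^[ \t]*#/ && NF >= 2 {
             target = $NF
             sub(/^-/, "", target)
             if (target ~ /^\//) print target
         }' "$conf" | sort -u
}
```

From the remote ssh session, `tail -f $(list_syslog_files)` then follows all of them at once. Entries prefixed with "-" are not synced to disk after each message, so on a hard freeze the last lines may never reach the file; watching them live over ssh is one way around that.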

ozric 09-16-2006 01:01 PM

Ok, had to reconfigure syslog.conf in order to get the daemon to output to a text file.

But no, the syslog doesn't reveal anything about what is happening or why the system halts.

haertig 09-16-2006 01:23 PM

Quote:

Originally Posted by ozric
But no, the syslog doesn't reveal anything about what is happening or why the system halts.

Bummer. I was hoping you'd find some info there.

I guess we need to be specific about what you mean by "halt". Are you getting a message in your terminal window something like "Kernel panic, system halted"? Now THAT would be a halt. Or is it just that your local terminal window is freezing up (but you can still use the system via that ssh connection you set up from the other computer)? Or, are you getting NO messages anywhere, and everything - including that remote ssh connection - is just frozen?

A frozen application vs. a frozen Xwindows vs. a halted system are all different things. I can't determine exactly which situation you are dealing with without more info. My hunch is that you're dealing with a hardware problem or some corruption in your filesystem or LVM. You've already taken some good troubleshooting steps.

I'm a bit concerned about your earlier statement:
Quote:

Checked RAM for errors, and replaced them

Does this mean you actually FOUND memory problems and replaced that bad RAM? If so, the bad RAM might have been the initial source of the problem, but it cascaded into filesystem or LVM corruption. i.e., you've now fixed the initial issue, but you still have to deal with the various corruption it might have caused.

Samoth 09-16-2006 02:07 PM

In most of my system hang problems, I was able to ssh in and look around, but yours may be different.

A question: Are those LVM disks mounted as your "root" partition? If so, then you should be able to let it sit around a while after it freezes and then reboot. You should have something getting logged and if you have journaling, it will still show up next boot.

ozric 09-16-2006 02:24 PM

Ok.

To be more precise, the system FREEZES, all activity stops. If I run ls from a prompt the cursor ceases to blink, I can't trigger ctrl-z, nothing - except for turning the power off. If I go into dir2 (see first post) through Xwindows, the mouse cursor freezes and all activity stops.

Quote:

Originally Posted by haertig
Or, are you getting NO messages anywhere, and everything - including that remote ssh connection - is just frozen.

^- Here is exactly where I am at.

Regarding the RAM, I swapped in new modules just to see if they would make any difference, but they didn't - same error - so I put the original ones back. Please bear in mind, everything worked fine up to the point when things started to go wrong.

ozric 09-16-2006 02:37 PM

Quote:

Originally Posted by Samoth
Are those LVM disks mounted as your "root" partition? If so, then you should be able to let it sit around a while after it freezes and then reboot. You should have something getting logged and if you have journaling, it will still show up next boot.

I knew I would reach the point where I couldn't answer the question. If by root partition you mean that the LVM includes the system drive, then the answer is no.

Samoth 09-16-2006 07:26 PM

that is what I wanted to know. I find it quite odd that a single LVM fs can blow up the entire system.....

thinking.....

PTrenholme 09-16-2006 09:20 PM

Being a Fedora user, I'm not sure if this applies to SuSE. But, for what it's worth, Fedora uses LVM2 and implements the device mapper, so the logical volumes are attached as children of /dev/mapper. If you're going to run fsck on a logical volume, you must do it on the appropriate /dev/mapper entry, not on /dev/hdx. For one thing, the logical volume partition type on /dev/hdx is not a valid Linux file system partition type.
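
A minimal sketch of pointing a filesystem check at the /dev/mapper entry rather than at a raw disk. The volume group and logical volume names used here (vg0, data) are placeholders, not names from this thread, and both helper functions are hypothetical. For XFS the checker is xfs_repair rather than fsck, and the filesystem must be unmounted first.

```shell
#!/bin/sh
# Build the device-mapper path for a logical volume.
# Device mapper joins the VG and LV names with "-" and doubles
# any "-" that appears inside either name.
lv_device() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

# Read-only XFS check of an unmounted logical volume.
# -n makes xfs_repair report problems without modifying anything.
check_lv() {
    dev=$(lv_device "$1" "$2")
    echo "checking $dev"
    xfs_repair -n "$dev"
}
```

With those placeholder names, `check_lv vg0 data` would inspect /dev/mapper/vg0-data. The real names can be read from `lvdisplay` or `ls /dev/mapper`.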

Samoth 09-17-2006 07:41 AM

that does apply, but I don't really see how it will help him now, unless you mean that one of his logical volumes has a messed up fs on it.

Maybe you could try running the equivalent of fsck (for your fs) on your logical volume. Maybe the fs is messed up.

ozric 09-17-2006 10:25 AM

I have tried using xfs_check and xfs_repair, which are the tools used for XFS. This freezes the machine immediately.
xfs_repair freezes at "Phase 1 - find and verify superblock..."

Today I changed motherboard, processor and memory just to be on the safe side. Same problem.


All times are GMT -5.