LinuxQuestions.org
Old 07-20-2010, 04:08 PM   #1
rjo98
Senior Member
 
Registered: Jun 2009
Location: US
Distribution: RHEL, CentOS
Posts: 1,695

Rep: Reputation: 48
e2fsck and unmounted a volume


I have two volumes, each with 800GB used. Let's call them /vol1 and /vol2. /vol1 holds a live share for many users, and /vol2 is just a copy of a folder on /vol1, kept up to date by a cron'd rsync.

If I temporarily suspend the cron job doing the rsyncs from /vol1 to /vol2, is it safe to unmount /vol2, run e2fsck on it, and then remount it somehow?

Both /vol1 and /vol2 say the filesystem state is "not clean" when I do a tune2fs -l on them. According to tune2fs, both will check themselves on the next restart, but if I can check /vol2 beforehand, since it isn't the live data, that will cut my downtime in half the next time I restart the server.

But I also wonder: if I do this and then remount /vol2, will the "not clean"-ness of /vol1 just be rsync'd back over to /vol2 the next time the rsync runs?
 
Old 07-20-2010, 04:39 PM   #2
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 306

Rep: Reputation: 63
I do not know rsync, but that "not clean" part sounds like it needs attention.

Did your system ever go down in a power failure?

Does this system ever get rebooted, and does it check clean at that time?

Does issuing a sync command change the message from tune2fs?

Unless this is part of a RAID system, copying from one drive to another should not copy filesystem errors. At worst, if rsync cannot read a source file, it will not copy that file. To carry filesystem errors across, the copy would have to be a sector-by-sector disk copy, and if it only copies one folder it can hardly be a sector copy.

I suspect that rsync just traverses the directories looking at the timestamps and copies any file that is newer than the /vol2 copy.

If that is so, then you can safely stop the cron job, do your e2fsck,
and restart the cron job (type it in by hand if you have to).
The rsync will just have to catch up when it notices the timestamps.
Check the rsync docs for what it does the first time it is started
on such a directory, and that should give you an answer.
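
A minimal sketch of that plan, for illustration only. It assumes the rsync cron job has already been paused (for example by commenting it out with crontab -e), that /vol2 has an entry in /etc/fstab, and that its device is /dev/sdb1, which is a made-up name; substitute the real one. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch only: check the backup volume offline while the rsync cron job is paused.
# VOL2_DEV is a hypothetical device name; find the real one with `mount` or /etc/fstab.
VOL2_DEV=${VOL2_DEV:-/dev/sdb1}
VOL2_MNT=${VOL2_MNT:-/vol2}

# Print each step by default; set DRY_RUN=0 to really execute it (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

check_backup_volume() {
    # umount fails if anything still has files open on the volume
    # (`fuser -vm /vol2` shows what is using it).
    run umount "$VOL2_MNT" || return 1
    status=0
    # -f forces a full check even if the filesystem is marked clean.
    run e2fsck -f "$VOL2_DEV" || status=$?
    # An exit status of 4 or more means errors were left or the check itself failed.
    if [ "$status" -ge 4 ]; then
        echo "e2fsck exit status $status: leaving $VOL2_MNT unmounted" >&2
        return "$status"
    fi
    run mount "$VOL2_MNT"
}

check_backup_volume
```

Run as-is it prints the three steps; running it for real would be DRY_RUN=0 as root, after which the cron job can be re-enabled.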
 
Old 07-20-2010, 04:43 PM   #3
rjo98
Senior Member
 
Registered: Jun 2009
Location: US
Distribution: RHEL, CentOS
Posts: 1,695

Original Poster
Rep: Reputation: 48
It didn't go down because of a power failure. I think it just gets like this every once in a while due to the gigs and gigs of data moved on AND off it every month.
 
Old 07-22-2010, 10:16 AM   #4
rjo98
Senior Member
 
Registered: Jun 2009
Location: US
Distribution: RHEL, CentOS
Posts: 1,695

Original Poster
Rep: Reputation: 48
Anyone else have any ideas if this is an ok plan of attack to cut my downtime in half?
 
Old 07-22-2010, 02:56 PM   #5
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 306

Rep: Reputation: 63
I did a test on my system (which gets shut down at night).
Immediately after boot, "tune2fs -l /dev/sda7" gives me a
state of "clean".
I start one user and copy one file, and then tune2fs gives me a
state of "not clean".
I ran sync, and it did not change.

Apparently what they mean by "not clean" is that something has been written since the filesystem was last checked or cleanly unmounted, and sync is not enough to reset it.
Thus your "not clean" is perfectly normal; it only means that at least one file has been written since then.
Enough digging in the ext2 docs might have revealed this, but they are not great about putting such info where you can find it when needed.
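
To watch this field without scrolling through the whole tune2fs report, something like the following could be used. It is only a sketch: the helper name tune2fs_field is made up, and the sample text is an invented excerpt so that it runs without root. As far as I know, e2fsprogs adds "with errors" to the state when the kernel has recorded a real filesystem error, but treat that as an assumption and verify it on your own system.

```shell
#!/bin/sh
# Print the value of one field from `tune2fs -l` output read on stdin.
# tune2fs_field is a hypothetical helper name, not part of e2fsprogs.
tune2fs_field() {
    awk -v key="$1" '
        # match only lines that start with the field name and a colon
        index($0, key ":") == 1 {
            sub(/^[^:]*:[ \t]*/, "")   # drop the label and its padding
            print
            exit
        }'
}

# Real use (as root):  tune2fs -l /dev/sda7 | tune2fs_field "Filesystem state"
# Demo on an invented excerpt so the sketch runs anywhere:
sample='Filesystem state:         not clean
Mount count:              12
Maximum mount count:      30
Last checked:             Tue Jul 20 16:08:00 2010'

for f in "Filesystem state" "Mount count" "Maximum mount count" "Last checked"; do
    printf '%s = %s\n' "$f" "$(printf '%s\n' "$sample" | tune2fs_field "$f")"
done
```

The mount count fields are worth a look too, since they are what decides whether a check is forced at the next boot.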

Last edited by selfprogrammed; 07-22-2010 at 02:58 PM.
 
Old 07-22-2010, 03:05 PM   #6
rjo98
Senior Member
 
Registered: Jun 2009
Location: US
Distribution: RHEL, CentOS
Posts: 1,695

Original Poster
Rep: Reputation: 48
I guess that's why I'm confused. So how do you distinguish between a drive that has errors on it and one that has just had something written to it?
 
Old 07-22-2010, 03:18 PM   #7
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 306

Rep: Reputation: 63
I googled rsync and found wikipedia.com/wiki/Rsync, which has a tutorial. There are two ways of using rsync: as a periodic cron job, and as a daemon.

From the wiki description, it scans one directory and transmits delta information to another rsync. This would NOT copy ext2 errors, but it would copy file errors (a file whose contents are already damaged on /vol1 is copied as it is).

It looks to be interruptible. To restart, just run one rsync over the directory and then let the cron job resume its schedule.
From the wiki description it is intended to work on existing directories, so it must be able to catch up if restarted.
It is best to shut it down cleanly, however that is done, instead of
killing the jobs and leaving half-written files on /vol2. However, as they are just files, the next rsync will find that they do not match (by default it compares size and modification time; checksums are only compared with -c) and will then send the differences to correct them. See the wiki.
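
One way to shut it down cleanly is to have the cron job check for a pause file before it starts, so a run is never killed halfway. This is only a sketch of that idea: the wrapper name, the pause-file path and the /vol1/share and /vol2/share paths are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper to call from the crontab instead of calling rsync directly.
PAUSE_FILE=${PAUSE_FILE:-/var/run/vol2-sync.paused}   # assumed path; any location works

# Run the given command unless the pause file exists.
sync_unless_paused() {
    if [ -e "$PAUSE_FILE" ]; then
        echo "sync paused: $PAUSE_FILE exists, skipping"
        return 0
    fi
    "$@"
}

# In the cron script one might write, for example:
#   sync_unless_paused rsync -a /vol1/share/ /vol2/share/
# Demo with a harmless command instead of rsync:
sync_unless_paused echo "sync would run now"
```

Creating the pause file with touch before the maintenance window and removing it afterwards lets a run that is already in progress finish, while later runs skip themselves.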

As I do not use it myself, this is the best I can do from general principles. It may be that no one else has documented this; it is hard to find anyone using a package in an odd way.

Last edited by selfprogrammed; 07-22-2010 at 03:27 PM.
 
Old 07-22-2010, 03:25 PM   #8
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 306

Rep: Reputation: 63
To catch drive errors, make sure the S.M.A.R.T. capability is enabled (check the dmesg boot info). SMART is error detection and reporting done by the disk drive itself. Almost any new SATA drive ought to be SMART capable.


>> grep -C4 "SMART" /var/log/dmesg
>> grep -C4 "sda" /var/log/dmesg

or
>> less /var/log/dmesg
and eyeball it, the spelling is sometimes different.


To catch filesystem errors, check the e2fsck report at boot. That is error detection specific to the filesystem. It can also be found by checking dmesg.

>> grep -C4 "e2fsck" /var/log/dmesg

You would have to be getting power failures, a crashing kernel, or a buggy motherboard to get disk errors or filesystem errors these days using Linux. If e2fsck is not finding any at boot, and you are not noticing anything else weird, then trust it. It's the way the information ("not clean") is presented to the public that is causing the trouble. The ext2 specs probably define "clean" and "not clean", but the tool writers just copy that term literally into e2fsck, taking it out of context and confusing the rest of us. They really could do a better job of distinguishing GOOD from TROUBLE in their reports.
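
Along the same lines as the grep commands above, here is a small sketch that pulls out only the lines that look like real trouble. The patterns are my assumption of what kernel messages usually look like ("EXT3-fs error (device ...)" and "I/O error"); the exact wording varies between kernel versions, and the demo lines are invented.

```shell
#!/bin/sh
# Print log lines that look like ext2/3/4 errors or block-layer I/O errors.
# Reads the files named as arguments, or stdin if none are given.
# fs_error_lines is a hypothetical helper name.
fs_error_lines() {
    grep -E 'EXT[234]-fs (error|warning)|I/O error' "$@"
}

# Real use:  fs_error_lines /var/log/messages
#            dmesg | fs_error_lines
# Demo on invented log lines:
printf '%s\n' \
    'Jul 20 03:10:01 host kernel: EXT3-fs error (device sdb1): example message' \
    'Jul 20 03:10:02 host crond[123]: (root) CMD (rsync job)' \
    'Jul 20 03:10:03 host kernel: end_request: I/O error, dev sdb, sector 12345' |
    fs_error_lines
```

If this prints nothing for your logs, that agrees with the point above: no news from the kernel is good news.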

Last edited by selfprogrammed; 07-22-2010 at 03:37 PM.
 
Old 07-22-2010, 03:33 PM   #9
rjo98
Senior Member
 
Registered: Jun 2009
Location: US
Distribution: RHEL, CentOS
Posts: 1,695

Original Poster
Rep: Reputation: 48
My only problem is that this machine is used by a ton of people for about 20 hours a day, so I don't have room for downtime. I thought that when tune2fs reports "not clean", that also means there are filesystem problems? I tried looking through the messages files, but they don't go back very far, so who knows whether earlier errors caused it to flip to not clean.
 
Old 07-23-2010, 03:51 PM   #10
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 306

Rep: Reputation: 63
From what I know, even if you had a file error, it would NOT flip to "not clean"; file errors and this flag are unrelated.

They needed a flag to indicate that something was done to the filesystem since the last e2fsck, and that is the name they chose. They were probably thinking of caches, where clean and not clean refer to having dirty pages and needing to write them back to main memory.

Much of the code has horrible choices for some of the flag names, which barely have meaning in their context and can be completely misleading out of context. Usually these flags come from the original spec sheet, where they used whatever term came to mind, without any consideration of what meaning might be construed out of context. I have even seen obscenities used for a flag (not in Linux); apparently the programmer could not think of anything better.

My messages file goes back 9 months and will grow endlessly, until it uses all of the disk space or someone deletes it. Because of this there are periodic message file cleaners available that copy it to a backup file "messages.bak" and start a new one. You must have one running.

To check for disk errors:
1. Run "badblocks". It is not recommended on a mounted partition (it may not even allow it); see the badblocks man page. Be very careful with any partition that has data on it: in write mode it overwrites everything with a test pattern, although it can be told to only read blocks. But your hard drive is already invisibly coping with bad blocks and has spare blocks to use, so this is of little use on modern drives. See the SMART drive report for bad block information.

2. Read all the files with grep or another file checking tool.
>> rgrep "junkjunkjunk" /the-directory

>> diff -r /vol1/thedirectory /vol2/thedirectory 2> error_messages

Let grep or rgrep search the entire directory looking for something.
It will read every file.
If you do not get any error messages, then all is OK.
You can even check the tail of messages afterwards.
>> tail /var/log/messages

3. The problem is that you do not know what a disk or filesystem error message actually looks like. They are not silent: they beep, and you get many horrible long messages from the kernel, the C library, and your program. Even a truncated file will cause nasty messages from the C library.
The most silent thing that can happen is a file cross-linked into another file, and I have only seen that on Win-DOS filesystems.
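
The "read every file" check in step 2 can also be written so that it lists exactly which files failed to read. A minimal sketch, assuming a POSIX find; the function name is made up, and the demo uses a throwaway directory rather than the real volumes.

```shell
#!/bin/sh
# List every regular file under a directory that cannot be read end to end.
# Prints nothing when all files read cleanly.
# unreadable_files is a hypothetical helper name.
unreadable_files() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            cat -- "$f" > /dev/null 2>&1 || printf "%s\n" "$f"
        done' sh {} +
}

# Real use (the path is a placeholder, as in the commands above):
#   unreadable_files /vol2/thedirectory > unreadable.txt
# Demo on a throwaway directory:
demo=$(mktemp -d)
echo "hello" > "$demo/a.txt"
mkdir "$demo/sub" && echo "world" > "$demo/sub/b.txt"
bad=$(unreadable_files "$demo")
[ -z "$bad" ] && echo "all files readable"
rm -rf "$demo"
```

On 800GB this reads everything once, so it is best run when the extra disk load will not bother the users.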

Last edited by selfprogrammed; 07-23-2010 at 04:08 PM.
 
Old 07-26-2010, 10:34 AM   #11
rjo98
Senior Member
 
Registered: Jun 2009
Location: US
Distribution: RHEL, CentOS
Posts: 1,695

Original Poster
Rep: Reputation: 48
I've used the bad-block option in e2fsck before, but that was while the system was in maintenance mode after a restart.
So if I had filesystem problems, would they only show up in the messages files? And would they be consistent enough that I'd see messages every day?
 
Old 07-27-2010, 02:18 PM   #12
selfprogrammed
Member
 
Registered: Jan 2010
Location: Minnesota, USA
Distribution: Slackware 13.37
Posts: 306

Rep: Reputation: 63
The messages file keeps the messages for a reasonable time: forever, or until your message file cleaner erases the old ones.
How long you keep them depends entirely on whether you think old errors are still relevant after some tool has caught them and dealt with them.

Filesystem problems are caught by e2fsck. What e2fsck does about an error it finds depends upon the switches that are used. Find the e2fsck call in the rc.d files, and look up the meaning of the switches and what the alternatives could be. This is a reasonable way to learn what e2fsck can do and the options you have.

Unless you really know what you are trying to do, do not change the e2fsck settings in the rc.d files.

Persistent errors would not go away and would be reported by e2fsck on every boot, so they would show up in the messages repeatedly and would be easy to find.
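
Besides its messages, e2fsck also reports what it found through its exit status, which the man page documents as a sum of bit values. A small sketch that turns the number into words; the function name is made up, and /dev/sdb1 in the comment is a placeholder device.

```shell
#!/bin/sh
# Translate an e2fsck exit status (a sum of the bit values listed in its
# man page) into words. explain_e2fsck_status is a hypothetical helper name.
explain_e2fsck_status() {
    status=$1
    [ "$status" -eq 0 ] && { echo "no errors"; return 0; }
    [ $((status & 1)) -ne 0 ] && echo "errors corrected"
    [ $((status & 2)) -ne 0 ] && echo "errors corrected, reboot recommended"
    [ $((status & 4)) -ne 0 ] && echo "errors left uncorrected"
    [ $((status & 8)) -ne 0 ] && echo "operational error"
    [ $((status & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ] && echo "cancelled by user"
    [ $((status & 128)) -ne 0 ] && echo "shared library error"
    return 0
}

# Real use on an unmounted filesystem (-n checks read-only and changes nothing):
#   e2fsck -n /dev/sdb1; explain_e2fsck_status $?
# Demo with two example status values:
explain_e2fsck_status 0
explain_e2fsck_status 5
```

A status of 0 or 1 after a check is the "GOOD" case; 4 or higher is the "TROUBLE" case that deserves a closer look.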

If you want to check on drive health, look at SMART. SMART also has a background disk verify built into the disk controller on the drive.
>> man smartctl
This command gives you the SMART error log kept by the hard drive itself.
The switch is from memory, so check it against
man smartctl.
>> smartctl -l error /dev/sda

Using SMART you can also make the drive test itself. I did this once and was so nervous about it that I never did it again, although no information was lost from the drive.
The long test takes a long time to run, so you may have to come back the next day.
I think a few of the tests can run during normal drive operation too.
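
For reference, a short sketch of the smartctl calls I would expect to be useful here, plus a helper that picks the verdict out of the health report. The helper name is made up, the verdict line is the wording smartmontools prints for ATA drives as far as I know (SCSI drives word it differently), and the demo input is invented so the sketch runs without a drive.

```shell
#!/bin/sh
# Commands to run by hand, as root (see man smartctl for details):
#   smartctl -H /dev/sda            overall health verdict
#   smartctl -l error /dev/sda      the drive's own error log
#   smartctl -t long /dev/sda       start an extended self-test in the background
#   smartctl -l selftest /dev/sda   results of earlier self-tests

# Extract the verdict from `smartctl -H` output read on stdin.
# smart_health is a hypothetical helper name.
smart_health() {
    sed -n 's/^SMART overall-health self-assessment test result:[[:space:]]*//p'
}

# Real use:  smartctl -H /dev/sda | smart_health
# Demo on an invented report line:
printf '%s\n' \
    '=== START OF READ SMART DATA SECTION ===' \
    'SMART overall-health self-assessment test result: PASSED' |
    smart_health
```

Anything other than PASSED from the real command is a reason to plan a drive replacement rather than a filesystem check.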
 
  


All times are GMT -5. The time now is 01:32 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration