Linux - Newbie: This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I have two volumes, each with 800GB total used on them; let's call them /vol1 and /vol2. /vol2 is just a cron-driven rsync copy of a folder on /vol1, which is a live share for many users.
If I temporarily suspend the cron job doing the rsyncs from /vol1 to /vol2, is it safe to unmount /vol2, run e2fsck on it, and then remount it somehow?
Both /vol1 and /vol2 report a filesystem state that is not clean when I do a tune2fs -l on them. According to tune2fs, both will check themselves on restart, but if I can check /vol2 beforehand, since it isn't the live data, that will cut my downtime in half the next time I restart the server.
But I also wonder: if I can do this and then remount /vol2, will the "not clean"-ness of /vol1 just be rsync'd back over to /vol2 the next time the rsync runs?
I do not know rsync, but that "not clean" part sounds like it needs attention.
Did your system ever go down from a power failure?
Does this system ever get rebooted, and does it check clean at that time?
Does issuing a sync command change the message from tune2fs?
Unless this is part of a RAID system, copying from one drive to another should not copy errors; at worst, if a source file cannot be read, then that file will not be copied. Errors could only propagate through a sector-level disk copy, and a job that only copies one folder can hardly be a sector copy.
I suspect that rsync just traverses the directories, looking at the timestamps and copying any file that's newer than the /vol2 copy.
If that is so, then you can safely stop the cron job, do your e2fsck,
and restart the cron job (type it in by hand if you have to).
The rsync will just have to catch up when it notices the timestamps.
Check the rsync docs for what it does the first time it is started
on such a directory, and that should give you an answer.
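The stop-the-cron-job, fsck, restart sequence above could be sketched as a script. Everything here is an assumption about the setup: /dev/vol2, the /vol2 mount point, and a single rsync line in root's crontab are hypothetical placeholders, and the sketch deliberately does nothing when the device is absent.

```shell
#!/bin/sh
# Hypothetical names: point DEV/MNT at the real copy volume.
DEV=/dev/vol2
MNT=/vol2

suspend_and_fsck() {
    # Bail out harmlessly if the device is not present (e.g. on a test box).
    if [ ! -b "$DEV" ]; then
        echo "device $DEV not present; dry run only"
        return 0
    fi
    # 1. Comment out the rsync line in root's crontab (assumes one such line).
    crontab -l | sed 's/^\(.*rsync.*\)$/#\1/' | crontab -
    # 2. Unmount the copy volume and force a full check even if marked clean.
    umount "$MNT"
    e2fsck -f "$DEV"
    # 3. Remount and un-comment the cron line again.
    mount "$MNT"
    crontab -l | sed 's/^#\(.*rsync.*\)$/\1/' | crontab -
}

suspend_and_fsck
```

The `-f` on e2fsck matters here: without it, a filesystem already marked clean is skipped rather than checked.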
I did a test on my system (which gets shut down at night).
Immediately after boot, doing "tune2fs /dev/sda7 -l" gives me a state that is "clean".
I start one user and copy one file; then tune2fs gives me a state of "not clean".
I ran sync, and it did not change.
Apparently what they mean by "not clean" is that something has been written since the last e2fsck, and sync is not good enough to clear it.
Thus your "not clean" is perfectly normal: it only means that at least one file has been written since the last e2fsck.
Enough digging in the ext2 docs might have revealed this, but they are not great about putting such info where you can find it when needed.
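The "written since the last check" reading is easy to poke at without touching a live disk: the e2fsprogs tools will operate on an ordinary file as a filesystem image, no root or mounting required. The image path below is made up for the demo.

```shell
#!/bin/sh
# The e2fsprogs tools often live in sbin, which may not be on a user PATH.
PATH="$PATH:/sbin:/usr/sbin"

# Build a tiny ext2 filesystem inside an ordinary file.
IMG=/tmp/state-demo.img
dd if=/dev/zero of="$IMG" bs=1024 count=1024 2>/dev/null
mke2fs -q -F "$IMG"

# A freshly made, never-written filesystem reports state "clean".
tune2fs -l "$IMG" | grep 'Filesystem state'

# Force a full check; -p fixes anything trivial without prompting.
e2fsck -f -p "$IMG"
```

Flipping the flag to "not clean" would require actually mounting and writing to the image (which needs root), but inspecting the state field and forcing a check work fine on the plain file.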
Last edited by selfprogrammed; 07-22-2010 at 03:58 PM.
I did a Google search for rsync and found wikipedia.com/wiki/Rsync, which has a tutorial. There are two ways of using rsync: as a periodic cron job, and as a daemon.
From the wiki description, it scans one directory and transmits delta information to another rsync. This would NOT copy ext2 errors, but it would copy file errors.
It looks to be interruptible. To restart, just do one rsync operation over the directory and let the cron job go periodic again.
From the wiki description it is intended to work on existing directories, and thus must be able to catch up if restarted.
It is best to shut it down cleanly, however that is done, instead of killing the jobs and leaving half-written files on /vol2. However, as they are just files, the next rsync will find that they do not match (by checksum) and will send a diff file to correct them. See the wiki.
As I do not use it myself, this is the best I can do from general principles. It may be that no one else has documented this; it is hard to find documentation for anyone using a package in an odd way.
Last edited by selfprogrammed; 07-22-2010 at 04:27 PM.
To catch drive errors, make sure S.M.A.R.T. support is enabled in your kernel (check the dmesg boot output). That is drive-specific error detection and correction. Almost any new SATA drive ought to be SMART-capable.
>> less /var/log/dmesg
and eyeball it, the spelling is sometimes different.
To catch filesystem errors check the e2fsck report at boot. That is specific filesystem error detection. Also found by checking dmesg.
>> grep -C4 "e2fsck" /var/log/dmesg
You would have to be getting power failures, a crashing kernel, or a buggy motherboard to get disk errors or filesystem errors these days using Linux. If e2fsck is not finding any at boot, and you are not noticing anything else weird, then trust it. It's the way of presenting information ("not clean") to the public that is causing the trouble. The ext2 specs probably define "clean" and "not clean", and the tool writers just copy that term literally into e2fsck, taking it out of context and confusing the rest of us. They really could do a better job of distinguishing GOOD from TROUBLE in their reports.
Last edited by selfprogrammed; 07-22-2010 at 04:37 PM.
My only problem is that this machine is used by a ton of people for about 20 hours a day, so I don't have room for downtime. I thought that when tune2fs reported "not clean" it also meant there were filesystem problems? I tried looking through the messages files, but they don't go back very far, so who knows if there were errors previously that caused it to flip to not clean.
From what I know, even if you had a file error it would NOT flip to "not clean"; file errors and this flag are unrelated.
They needed a flag to indicate that something was done to the filesystem since the last e2fsck, and that is the name they chose. They were probably thinking of caches, where "clean" and "not clean" refer to dirty pages and the need to write them back to main memory.
Much of the code has horrible choices for some of the flag names, which barely have meaning in their context, and which can be completely misleading out of context. Usually these flags come from the original spec sheet, where they used whatever term came to mind, without any consideration of what meaning might be construed out of context. I have even seen obscenities used for a flag (not in Linux); apparently the programmer could not think of anything better.
My messages file goes back 9 months, and it will grow endlessly until it uses all of the disk space or someone deletes it. Because of this there are periodic message-file cleaners available that copy it to a backup file ("messages.bak") and start a new one. You must have one running.
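On most distributions that "message-file cleaner" is logrotate. A minimal sketch of the kind of stanza involved (the file name, schedule, and retention here are illustrative, not a recommendation) might look like:

```
# /etc/logrotate.d/messages -- illustrative only; check your distro's default
/var/log/messages {
    weekly        # start a new file once a week
    rotate 4      # keep four old copies (messages.1 ... messages.4)
    compress      # gzip the old copies to save space
    missingok     # no error if the log is absent
}
```

Your distribution almost certainly ships its own stanza for /var/log/messages already; look in /etc/logrotate.d before adding one.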
To check for disk errors:
1. Run "badblocks". It is not recommended on a mounted partition (it may not even allow it); see the badblocks docs and man page. Be very careful with any partition holding data: it can overwrite everything with a test pattern, or it can be told to just read blocks. But your hard drive is already coping with bad blocks invisibly and has spare blocks to use, so this has little use on modern drives. See the SMART drive report for bad-block information.
2. Read all the files with grep or another file checking tool.
>> rgrep "junkjunkjunk" /the-directory
Let grep or rgrep search the entire directory looking for something.
It will read every file.
If you do not get any error messages, then all is OK.
You can even check the tail of messages afterwards.
>> tail /var/log/messages
3. The problem is that you do not know what a disk or filesystem error message actually looks like. They are not silent: they beep, and you get many horrible long messages from the kernel, the C library, and your program. Even a truncated file will cause nasty messages from the C library.
The most silent thing that can happen is a file cross linked into another file, and I have only seen that on Win-Dos filesystems.
Last edited by selfprogrammed; 07-23-2010 at 05:08 PM.
I've used the badblocks option in e2fsck before, but that was while the system was in maintenance mode after a restart.
So if I had filesystem problems, would they only show in the messages files? And would it be consistent enough that I'd see messages every day?
The messages file saves the messages for a reasonable time: forever, or until your message-file cleaner erases the old ones.
This entirely depends on whether you think old errors are relevant after some tool has caught them and dealt with them.
Filesystem problems are caught by e2fsck. What e2fsck does about an error it finds depends upon the switches that are used. Find the e2fsck invocation in the rc.d files, and look up the meaning of the switches and what the alternatives could be. This is a reasonable way to learn what e2fsck can do and the options you have.
Unless you really know what you are trying to do, do not change the rc.d file e2fsck settings.
Persistent errors would not go away: they would be reported by e2fsck on every boot, so they would show up in the messages repeatedly, and therefore they would be easy to find.
If you want to check on drive health then look at SMART. SMART also has background disk verify built into the disk controller on the drive.
>> man smartctl
One command will give you the SMART error-logging registers from the hard drive itself. This is from memory, so verify the switch in the man page, but it should be close to:
>> smartctl -l error /dev/sda
Using smart you can make the drive check itself, either destructively or non-destructively. I did this once and was so nervous about it that I never did it again, but no information was lost from the drive either.
It takes a long time to run, so you will have to come back the next day.
I think a few of the tests can run during normal drive operation too.
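A self-test run along those lines could be sketched as follows. The /dev/sda path is a placeholder, and because kicking off even a non-destructive test on someone's real drive is not something a demo should do silently, the sketch requires an explicit RUN_SMART_DEMO=yes opt-in and otherwise just prints a dry-run notice.

```shell
#!/bin/sh
DEV=/dev/sda   # placeholder: point at the real drive

smart_selftest() {
    # Require an explicit opt-in before touching any hardware.
    if [ "${RUN_SMART_DEMO:-no}" != yes ]; then
        echo "dry run: set RUN_SMART_DEMO=yes to test $DEV"
        return 0
    fi
    smartctl -t short "$DEV"      # non-destructive test; runs on the drive itself
    sleep 120                     # the short test typically takes about 2 minutes
    smartctl -l selftest "$DEV"   # read back the self-test log
    smartctl -l error "$DEV"      # and the drive's error-log registers
}

smart_selftest
```

The short test runs inside the drive while it stays online, which matches the observation above that some tests can run during normal operation; the long test is the one where you come back the next day.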