Bash Shell scripting : File and its metadata - verification
What this question is about :
My goal is to build something that takes two directory roots A and B, then checks that everything that is in A exists in B. If it exists, check that it is identical. To be identical :
If it is identical, delete the one in A (or at least mark it for deletion using, say, a log file). It must do this file by file, because the system will most likely be reset before it reaches the end of the filesystems. In addition, I would like to offer switches to delete from B instead of A, to not delete at all, to choose a log file, and possibly to automatically try to correct the situation by copying everything in A to B first. The goal is to verify that a mirroring application has done its job correctly, so that I can conclude that I have archived the source copy and it can now be removed.
The actual question : I'm currently stuck trying to test whether two file dates are identical... the bash if test has the operators -ot and -nt for older-than and newer-than comparisons, but there is no operator to check for identical dates. This is annoying because a test like FILENAME -ot OTHERFILENAME is very easy to understand, whereas if I extract the dates and timestamps from a file using another command, I will probably be converting them to strings... I'd like to know the cleanest way to compare the dates and avoid any regional/timezone issues.
For identical content I'm using the cmp command. For the name I'm just using an if-exists test. And then there's also an if test for the same inode in the filesystem, so you can easily see if it's the same file (at least it seems to work; if you think I'm doing it wrong, please tell me!). Besides identical dates, I don't know how to compare size and other metadata yet either... |
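One possible way to get an "identical dates" test is sketched below. It is only a sketch under stated assumptions: the function names same_mtime and same_mtime_epoch are made up for illustration, and the second variant assumes GNU stat (the -c %Y format). The first variant relies on the fact that if a file is neither newer nor older than another, the two modification times must be equal. The second compares seconds since the epoch, which are plain integers and therefore not affected by locale or timezone settings.

```bash
#!/usr/bin/env bash
# Sketch: succeed when two files have the same modification time.
# Function names are hypothetical; they are not part of bash.

# Variant 1: pure bash. "Neither newer nor older" means "same mtime".
# Both files must exist, otherwise -nt/-ot give misleading answers.
same_mtime() {
    [[ -e $1 && -e $2 ]] || return 2
    [[ ! $1 -nt $2 && ! $1 -ot $2 ]]
}

# Variant 2: compare whole seconds since the epoch (GNU stat assumed).
same_mtime_epoch() {
    local a b
    a=$(stat -c %Y -- "$1") || return 2
    b=$(stat -c %Y -- "$2") || return 2
    (( a == b ))
}
```

Usage would look like: if same_mtime "$A/file" "$B/file"; then ... fi. Note that the two variants can disagree when the filesystems store sub-second timestamps with different precision, since variant 2 only looks at whole seconds.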
Why do you need to check the modification date?
If two files are otherwise identical, does it really matter? Date/time comparisons are very dodgy unless you know that all times are UTC. A simple ls -l command gives much of the metadata, such as permissions, user, group, and modification date. You could cover much of what you want to do by comparing md5 (or similar) file checksums. |
Because a file date is not stored inside the file and it is useful information. By default, copying sets the date to the date of the copy, which is useless. I want to be able to sort files by date, search for files from a certain period, etc... it's valuable information. About as valuable as the filename.
I don't know if they are all UTC, but since the source is copied to the destination, I can be reasonably certain both are supposed to be in the same timezone. Unfortunately, ls only displays the information, and I'd prefer to avoid complicating things by splitting that string by hand... there must be a better way. Checksums are useless in this case, since I'm only checking any particular file exactly once. I would like to generate a checksum for later use, though, since deleting the copy deprives me of the possibility to verify afterwards. In any case, since generating a checksum requires reading the file once, and a direct comparison also requires reading it once, checksums will only complicate matters. |
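As an alternative to splitting ls output by hand, the stat command can print selected fields directly. The sketch below assumes GNU stat (the -c format option); the helper names meta_of and same_meta are hypothetical. It prints size, modification time in epoch seconds, permission bits, owner, and group as a single line and compares the two lines as strings.

```bash
#!/usr/bin/env bash
# Sketch: compare a fixed set of metadata fields without parsing ls.
# GNU stat is assumed; helper names are made up for illustration.

# Print "size mtime-epoch octal-mode owner group" for one file.
meta_of() {
    stat -c '%s %Y %a %U %G' -- "$1"
}

# Succeed when both files report the same values for all fields above.
same_meta() {
    local a b
    a=$(meta_of "$1") || return 2
    b=$(meta_of "$2") || return 2
    [[ $a == "$b" ]]
}
```

Which fields belong in the format string is a design choice; owner and group, for example, may legitimately differ between a source machine and a NAS.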
1. (Very new ext4-based systems do have a new metadata field, birthtime, but AFAIK no tools actually maintain it yet...)
2. I agree that we'd like to know what you (OP) are trying to accomplish here; checking every single possible piece of info about a file is possible but generally pointless. The usual definition of 'the same' is identical content, for which a checksum, e.g. md5sum, is sufficient. For example, what are you going to do if the content is the same but the name is different... or vice versa... and similarly for any pair of factoids? Think of all the pair combinations involved.
3. If you insist on checking everything(!), write a program in e.g. Perl, which will enable you to do so. |
I would tar the two directories and compare the tarballs using cmp.
|
To clarify : The purpose is to do an incremental migration. I'm moving all the data from one system to another and I'd like to keep the file system "as is". I wanted to use dd, but due to system limitations this process is always interrupted before completion. In addition, I would have to extract the files from the volume image afterwards and go through the validation step anyway, only then I would be using desktop tools, which I don't have available on the source machine. So I was hoping to do it all in one go and just fetch the files one by one, with all the metadata synchronized and validated immediately. |
So your actual requirement has nothing to do with all the testing you are referring to, but rather with confirming that the data was migrated correctly. Why not use a tool like rsync, which allows you to track what has been successfully moved and where you're up to, and which I believe can even resume after an interruption? And seeing as you mentioned it... are we to assume this work will be done on Windows machines? |
You may as well know that not all the metadata CAN be the same.
In particular, the inode number will almost certainly be different. Depending on the filesystem load, even the "size" of the file will be different (storage requires various metadata to point to the data), and some of it depends on the filesystems used for the source and destination: the filesystems themselves use different metadata. In the case of the "creation date" (actually the inode change date), which date do you want? The creation of the original file (which is NOT the creation of the copy)? And if you store two dates for that, guess what, you still won't get the same. In the first case, the creation date and inode modification date might be the same... but the copy's won't be. If they were, then the second date would be wrong... For nearly everything, "creation date" doesn't mean anything, which is why when you copy a file you get the date the file started being copied, as that is the "creation date" of the copy. In the second case (a creation date and a copy date), what you get depends on how you copy the file: use an editor to copy a file (it happens), and you don't get the original date, you get the date the file was copied. cp isn't the only way to copy a file; you can also use cat, dd, tr, tar, rsync, cpio, ... and then there are the dozens of other programs out there that copy data as well. |
That said, I was interested in rsync, but I haven't been able to get it to work. Supposedly my NAS supports it, but I doubt my source machine does. I forgot what went wrong exactly, but maybe I was just doing it wrong...
|
Hence, I've been using cp with the switch to preserve dates (-r? -p? I forgot, but it seems to work under the right circumstances). |
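For reference, the cp switch that preserves dates is -p (it keeps modification time, access time, and mode, and ownership where permitted), while -R (or -r) only makes the copy recursive. A minimal sketch follows; the wrapper names copy_preserving and copy_tree_preserving are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: copy while keeping timestamps and permissions.
# Wrapper names are hypothetical.

# Copy a single file; -p preserves mtime, atime and mode
# (and ownership when the user is allowed to set it).
copy_preserving() {
    cp -p -- "$1" "$2"
}

# Copy a whole directory tree; -R recurses, -p preserves as above.
copy_tree_preserving() {
    cp -Rp -- "$1" "$2"
}
```

Whether the preserved timestamps survive depends on the destination filesystem; a NAS share mounted over a network protocol may round or refuse them.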
It was already mentioned, but probably missed: rsync will do that job for you
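A rough sketch of what that could look like, assuming rsync is installed on the machine running the commands and both trees are reachable as local paths (for example with the NAS mounted). The function names mirror_tree and list_differences are hypothetical. The options used are -a (archive mode: recurse and preserve times, permissions, and ownership where possible), --partial (keep partially transferred files so a re-run wastes less work), -n (dry run), -c (compare by checksum instead of size and time), and -i (itemize what differs).

```bash
#!/usr/bin/env bash
# Sketch: mirror A into B with rsync, then list what still differs.
# Function names are made up; rsync must be available.

# Copy everything under $1 into $2. Re-running after an interruption
# only transfers what is still missing or different.
mirror_tree() {
    rsync -a --partial "$1"/ "$2"/
}

# Dry run with full checksums: prints one itemized line per entry
# that rsync would still change, and transfers nothing.
list_differences() {
    rsync -a -n -c -i "$1"/ "$2"/
}
```

The checksum pass reads every file on both sides, so it is slow, but it gives a content check that does not rely on sizes and dates alone.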
|
In addition to extents, xfs uses inode numbers based on the address where the inode is located. Whether other metadata exists depends on how the disk is mounted: if ACLs are enabled, then that list can be carried over, but if the filesystem either doesn't support such lists or is mounted with ACLs disabled, they may get dropped. Even then, not all filesystems support the same set of ACLs... those not supported get dropped.
Users don't set the "creation date"... the system records the inode modification date so that a user cannot hide the fact that the file/inode has been modified. Without that restriction, anyone would be able to alter a log file... and hide the fact that the log file was altered. Root can, in some cases, do that (using a file system debugger is one, as that allows direct access to the filesystem without going through the system). It is also one reason using dd to make a copy of a filesystem doesn't always work. dd makes a copy all right, but since the filesystem metadata is not modified, some things that NEED to be modified (filesystem labels, UUIDs) don't get changed, thus causing failures on boot, as the correct filesystem for a given mount can't necessarily be identified.
A tar file (or cpio) is the most reliable way to preserve the file - it does get the inode modification date, even if it can't restore it. It also has ways of storing other metadata (the extended attributes). It will not preserve the metadata required by a filesystem to maintain the storage of the data (that is irrelevant anyway). |
The good news is that these details are not the kind of details I was aiming to copy. Well, maybe the ACLs, but for my current application they don't matter.
Or perhaps more ambiguously : while moving a car and driving to your location are roughly the same, only driving to a location is considered to really be "making use of your car"; moving it to a different parking spot because someone else couldn't get out of their garage is just overhead. So let me specify that when I mention metadata, I'm referring to information that belongs to the abstract concept of a file and has nothing to do with the file system it is stored on. I guess this is still debatable, but let's say I just want the file (not inode) modification date, a creation date if it can be found, and any alternate data streams if present; and the filename and the size of the file should be tested to make sure the copy was a success.
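Putting those criteria together, a file-by-file check could be sketched roughly as follows. This is an illustration under assumptions, not a finished tool: verify_file and verify_tree are hypothetical names, GNU stat and a find that supports -print0 are assumed, and the sketch only writes a log instead of deleting anything. Creation dates and alternate data streams are not covered, since plain bash tests have no portable way to read them.

```bash
#!/usr/bin/env bash
# Sketch: check each file under A against its counterpart under B.
# Names are hypothetical; GNU stat and find -print0 are assumed.

# Succeed only if DST looks like a faithful, separate copy of SRC.
verify_file() {
    local src=$1 dst=$2
    [[ -f $src && -f $dst ]] || return 1      # same name exists on both sides
    [[ $src -ef $dst ]] && return 1           # same inode: not a copy at all
    [[ $(stat -c %s -- "$src") == "$(stat -c %s -- "$dst")" ]] || return 1  # size
    [[ $(stat -c %Y -- "$src") == "$(stat -c %Y -- "$dst")" ]] || return 1  # mtime
    cmp -s -- "$src" "$dst"                   # content, byte by byte
}

# Walk A and append "OK path" or "FAIL path" to LOG, one file at a time,
# so an interrupted run still leaves a usable partial log.
verify_tree() {
    local a=${1%/} b=${2%/} log=$3 f rel
    while IFS= read -r -d '' f; do
        rel=${f#"$a"/}
        if verify_file "$f" "$b/$rel"; then
            printf 'OK %s\n' "$rel" >> "$log"
        else
            printf 'FAIL %s\n' "$rel" >> "$log"
        fi
    done < <(find "$a" -type f -print0)
}
```

A call such as verify_tree /path/to/A /path/to/B verify.log would then produce a log that can be reviewed before anything is deleted; the switches for deleting from A or B would act on the OK lines only.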
|