On average the directory is about 2.5 GB in size. The above is done because many users make changes to the files (different copies of the same directory) by trial and error, so in the end we only need to check the files that changed.
Can we do this CRC or hash-table thing in Perl, and can I invoke it from the bash script?
Will it be easy in Perl?
The option 2 you suggested is good, but won't it take more time?
Can you suggest any other command like cksum that will do the job, or do I have to write a script for it?
And what about hash tables - how can I create hash tables on Linux for the question above?
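If it helps, here is a rough sketch of the "hash table" idea in bash itself (assuming bash 4+ for associative arrays and file names without embedded newlines; the directory name, manifest file and md5sum are only placeholders - cksum would work the same way):

#!/bin/bash
# Sketch only: compare files in DIR against a checksum manifest from the last run.
DIR=data                                   # placeholder directory name
declare -A old                             # "hash table": file name -> previous checksum

if [ -f manifest.md5 ]; then
    while read -r sum file; do
        old["$file"]=$sum
    done < manifest.md5
fi

find "$DIR" -type f | while read -r f; do
    new=$(md5sum "$f" | awk '{print $1}')
    [ "${old[$f]}" != "$new" ] && echo "changed: $f"
done

find "$DIR" -type f -exec md5sum {} + > manifest.md5    # refresh the manifest for next run

Perl's Digest::MD5 module would give you the same thing with a real Perl hash, but the cost is in reading the files, not in the language you choose.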
Item 2 is designed to run in the shortest time, assuming few files have changed. The loop on file names would be required for all solutions. Getting a file's mtime is very cheap/quick compared to computing its checksum.
SHA1 sums are considered more robust than MD5 sums but AFAIK take longer.
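To make the mtime point concrete, a minimal sketch (the stamp file name and the data directory are placeholders): only files modified since the last run get checksummed at all.

# Sketch: checksum only files modified since the previous run.
STAMP=.lastrun                                        # placeholder timestamp file
[ -f "$STAMP" ] || touch -t 197001010000 "$STAMP"     # first run: treat everything as changed
find data -type f -newer "$STAMP" -exec sha1sum {} + > changed.sha1
touch "$STAMP"                                        # reset the reference point for next time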
Out of interest, can I ask why you want to cksum 25K+ files in a single directory? Seems like you have given yourself a mountain to climb. I would split them into sub-directories, then monitor changes in said sub-directories with something like this:
for DIR in */; do echo "$DIR"; tar cf - "$DIR" | cksum; done
You could even run something similar in parallel to get results more quickly.
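For example, with GNU xargs the per-directory checksums can be run a few at a time (the -P 4 job count is arbitrary):

# Sketch: checksum each subdirectory in parallel, four jobs at a time.
printf '%s\0' */ | xargs -0 -n 1 -P 4 sh -c 'echo "$1 $(tar cf - "$1" | cksum)"' sh

The output lines can interleave, so you may prefer to redirect each directory's result to its own file instead.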
IDK (see Internet Slang Dictionary!) about Perl's checksumming facilities but, if they exist, which is likely, they will take around the same resources as native GNU/Linux commands like cksum.
Wikipedia is good for the likes of MD5 and SHA1. They are the checksums most commonly in use.
As already stated, checksumming is inherently resource-intensive. The people who wrote the various utilities available will have tried to do a good job so all of them will have much the same performance.
As suggested by Ginola, if I do the same using fork, so that processes simultaneously run cksum on the subfolders inside the main folder, will that decrease the processing time? I have not tried this yet.
rsync uses a lightweight rolling checksum, so I would say it will be faster than SHA-1 or MD5 (both are "high precision" checksums). The only additional requirement is an original set of files to compare with. rsync will compare the two directory trees and find all the differences (and it can also sync them).
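For instance, with a reasonably recent rsync, a dry run along these lines (directory names are placeholders) just lists the files whose contents differ between the working copy and the original, without copying anything:

# Sketch: list files in working/ whose contents differ from original/.
# -r recurse, -c compare by checksum rather than size/mtime, -n dry run.
rsync -rcn --out-format='%n' working/ original/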
Running cksum jobs in parallel will speed up the whole process, but it will also increase the load on the system (you probably cannot start 25k processes at the same time, so you need to organize it somehow).
You can definitely multi-process in e.g. Perl, but have you considered using a version control system, so that you get a record each time a file is changed, instead of having to post-process all 25k files, e.g., once a day?
Also, as above, checking mtimes will be (much) faster than checksumming file contents.
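As one concrete example of the version-control idea (git here is just an example, assuming it is acceptable to keep the 2.5 GB directory in a repository), every change is recorded at commit time and the changed files can be listed directly:

# Sketch: track the directory with git and list changed files on demand.
cd /path/to/data                          # placeholder path
git init                                  # one-time setup
git add -A && git commit -m "baseline"
# ... users edit files ...
git status --porcelain                    # names of files changed since the last commit
git add -A && git commit -m "snapshot"    # record the new versions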
pan64, rsync would check the differences between the two directory trees, but my application needs some value or mark, like a checksum, to be stored for the original as well as the updated version, so that it can generate a report that a file's data has changed. Also, the number of versions is >10.
chris, checking mtime alone does not meet the requirement.
Can anyone suggest how to organize the subfolders so that they can be processed with fork? Are there any likely errors to watch for when using fork?
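A minimal sketch of the fork-per-subfolder idea from the shell, using background subshells and wait (sums/ is just a placeholder output directory):

# Sketch: fork one background job per subdirectory and wait for all of them.
mkdir -p sums                                               # placeholder output directory
for DIR in */; do
    ( tar cf - "$DIR" | cksum > "sums/${DIR%/}.cksum" ) &   # each job runs in a forked subshell
done
wait                                                        # do not exit before the children finish

The usual problems are starting too many jobs at once (cap the count, e.g. with xargs -P as sketched above) and losing failures, since a plain wait does not report each child's exit status - check afterwards that every output file in sums/ is non-empty.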