Linux - Software: This forum is for Software issues.
What is the best way to compare the small files in 2 dirs that each hold 900 GB of data? On Windows I use "Beyond Compare", but it cannot handle such a large number of files. Does anyone know of any good tools, or how to compare them? Please help.
What is it about them you want to compare?
What is the aim of the comparison?
E.g. do you want to make sure the most up-to-date version of each file is in one (or both) directories? Do you want to quick-check whether there are files in one directory that are not in the other? Do you want to make a log of differences in content between pairs of text files with the same names? Do you want to see if files with different names are, in fact, identical in content? See what I mean?
Usually, large batch jobs in Linux are handled with scripts.
The diff utility is often used to compare text between two files.
Perhaps you could produce 2 md5sum lists, one for each directory, and then compare the sums. Using the "find" command with an "-exec md5sum '{}' \;" argument to checksum the files, and redirecting the output to a file (">md5sumlist"), you can obtain a table of files & sums that you can use to locate duplicate files. This can be used to find duplicates when you don't know where on the filesystem a duplicate might be found.
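As a rough sketch of the above, assuming GNU coreutils and some made-up demo files in a temporary directory (the file names and the "duplicates" output file are just for illustration):

```shell
# Demo data in a throwaway directory (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/data/sub"
printf 'alpha\n' > "$demo/data/a.txt"
printf 'alpha\n' > "$demo/data/sub/a_copy.txt"   # same content as a.txt
printf 'beta\n'  > "$demo/data/b.txt"

# One "checksum  path" line per regular file, paths relative to the directory.
(cd "$demo/data" && find . -type f -exec md5sum '{}' \; > "$demo/md5sumlist")

# An MD5 hex digest is 32 characters long. Sorting groups equal digests
# together, and uniq -w32 -D (GNU uniq) prints every line whose first
# 32 characters are repeated, i.e. files with identical content.
sort "$demo/md5sumlist" | uniq -w32 -D > "$demo/duplicates"
cat "$demo/duplicates"
```

The list is written outside the directory being scanned so that find does not checksum the list itself.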
You might consider organizing your data better so that you don't have so many files in each directory. You won't be able to use file globbing in these directories because the list would be too large to pass as arguments to a command. You will often need to resort to "find" and "xargs" so you can limit the number of arguments handled at once.
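A small sketch of the find + xargs approach, assuming a find and xargs that support -print0 and -0 (GNU and BSD both do). The directory name, the file count and the batch size of 100 are arbitrary choices for the demo:

```shell
# Create a directory with a few hundred small demo files.
demo=$(mktemp -d)
mkdir "$demo/many"
i=1
while [ "$i" -le 250 ]; do
    printf 'file %s\n' "$i" > "$demo/many/f$i.txt"
    i=$((i + 1))
done

# -print0 and -0 keep file names with spaces or newlines intact.
# -n 100 caps each md5sum invocation at 100 file arguments, so the
# argument list never grows too long no matter how many files exist.
(cd "$demo/many" && find . -type f -print0 | xargs -0 -n 100 md5sum > "$demo/md5sumlist")

# Number of checksum lines written (one per file).
grep -c . "$demo/md5sumlist"
```

Without -n, xargs still splits the arguments into batches that fit the system limit; the explicit cap just makes the batching visible.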
Hmm, md5sum sounds good. I will use find to list all the files and md5sum each file. That would work. Thanks.
I was transferring data from one drive to the other, and want to make sure that everything was copied correctly. I used rsync to verify, but it crashed for some reason.
Go look at md5deep. You can produce a file of md5sums, from one directory. You can feed that into md5deep, pointed at the other directory, and it will spit out only the files that are different.
I've used it to verify that a copy of a complete disk was identical to the original.
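Something along these lines should do it, as a sketch only. It assumes md5deep is installed (the demo is skipped otherwise), and the directory and file names are invented for the example:

```shell
if command -v md5deep >/dev/null 2>&1; then
    demo=$(mktemp -d)
    mkdir "$demo/original" "$demo/copy"
    printf 'same\n' > "$demo/original/ok.txt"
    printf 'same\n' > "$demo/copy/ok.txt"
    printf 'good\n' > "$demo/original/data.txt"
    printf 'bad\n'  > "$demo/copy/data.txt"     # simulated corrupted copy

    # -r recurses into subdirectories and writes one hash line per file.
    md5deep -r "$demo/original" > "$demo/known.md5"

    # -x is negative matching: list only the files under copy whose
    # hash does NOT appear in known.md5.
    md5deep -r -x "$demo/known.md5" "$demo/copy" > "$demo/different"
    cat "$demo/different"
else
    echo "md5deep is not installed; skipping demo"
fi
```

Note that the matching is done on hashes, not on paths, so this flags changed or extra files in the copy but will not report files that are missing from it.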
If the directory structures are the same on the two drives, running the find command in the corresponding directory of each drive will produce two lists that should be identical. You could simply use "diff" or sort both lists and use "comm -13 md5sumlist1 md5sumlist2" to find altered files from the 2nd list.
E.g. comm -13 <(sort md5sumlist1) <(sort md5sumlist2)
or
sort md5sumlist1 >md5sumlist1.sorted
sort md5sumlist2 >md5sumlist2.sorted
comm -13 md5sumlist1.sorted md5sumlist2.sorted >altered_files_in_list2
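Here is the whole sequence as a self-contained sketch. The two "drive" directories and their contents are made up for the demo; on real data you would cd into the corresponding directory on each drive instead:

```shell
demo=$(mktemp -d)
mkdir "$demo/drive1" "$demo/drive2"
printf 'one\n' > "$demo/drive1/a.txt"
printf 'one\n' > "$demo/drive2/a.txt"
printf 'two\n' > "$demo/drive1/b.txt"
printf 'tw0\n' > "$demo/drive2/b.txt"   # simulated bad copy

# Run find from inside each directory so both lists use the same
# relative paths, then sort (comm requires sorted input).
(cd "$demo/drive1" && find . -type f -exec md5sum '{}' \; | sort > "$demo/md5sumlist1.sorted")
(cd "$demo/drive2" && find . -type f -exec md5sum '{}' \; | sort > "$demo/md5sumlist2.sorted")

# -1 drops lines found only in list 1, -3 drops lines common to both,
# leaving the lines that appear only in list 2.
comm -13 "$demo/md5sumlist1.sorted" "$demo/md5sumlist2.sorted" > "$demo/altered_files_in_list2"
cat "$demo/altered_files_in_list2"
```

Files that exist only on the second drive show up in the same output, since their lines are also unique to list 2.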
Go look at md5deep. You can produce a file of md5sums, from one directory. You can feed that into md5deep, pointed at the other directory, and it will spit out only the files that are different.
How exactly does one create an md5sum list? I haven't had much luck figuring that out.
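One possible way, sketched with plain md5sum rather than md5deep: build the list with find, then let "md5sum -c" check a copy against it. This assumes GNU md5sum (for --quiet), and the directory names and files are invented for the demo:

```shell
demo=$(mktemp -d)
mkdir "$demo/source" "$demo/copy"
printf 'hello\n' > "$demo/source/a.txt"
printf 'hello\n' > "$demo/copy/a.txt"
printf 'world\n' > "$demo/source/b.txt"
printf 'w0rld\n' > "$demo/copy/b.txt"   # simulated bad copy

# Create the md5sum list: one "checksum  path" line per file, with
# paths relative to the source directory.
(cd "$demo/source" && find . -type f -exec md5sum '{}' + > "$demo/md5sumlist")

# Check the copy against the list. The paths are relative, so run the
# check from inside the copy. --quiet hides the "OK" lines, and the
# exit status is non-zero if any file fails.
status=0
(cd "$demo/copy" && md5sum -c --quiet "$demo/md5sumlist" > "$demo/check.out" 2>/dev/null) || status=$?
cat "$demo/check.out"
echo "md5sum -c exit status: $status"
```

The list itself is just a text file, so it can be kept around and reused to re-check the copy later.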