Shell script to find duplicate files and old files
Hi All,
I need to clean up our storage. For that, I want a shell script to find duplicate files and/or files with a modification date more than 2 years old. Please help me with this. Regards, Mahesh.S |
There are THOUSANDS of easily-found bash scripting tutorials you can find with a brief Google search...many that do things similar to what you're asking. Start there. |
Hi.
Try FSlint: http://www.pixelbeat.org/fslint/ The end of my signature has a link for more specifics at the CLI &c... have fun! :hattip: |
Code:
-mtime n |
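A minimal sketch of how `-mtime` could be applied to the original question: the function below lists regular files last modified more than a given number of days ago. The function name `list_old_files` and the 730-day default (roughly two years) are illustrative assumptions, not something from the thread.

```shell
#!/bin/sh
# list_old_files DIR [DAYS]
# Print regular files under DIR that were last modified more than DAYS days
# ago. DAYS defaults to 730 (about two years); the function name and the
# default are illustrative choices only.
list_old_files() {
    dir=$1
    days=${2:-730}
    # -mtime +N matches files whose age in whole 24-hour periods is greater
    # than N (find discards the fractional part of the age before comparing).
    find "$dir" -type f -mtime +"$days" -print
}
```

For example, `list_old_files /srv/data 365` would print files under a hypothetical `/srv/data` that are more than a year old. It only lists; nothing is removed.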
Welcome to LQ!
Make friends, show us your code. ;) |
As per Sefyir, 'find' for old files.
For 'same', consider whether you mean the same name (you'd need a script) or the same content (try a hash sum, e.g. md5sum). |
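For the "same name" case, a rough sketch might look like the following. It prints every base name that appears more than once under a directory tree. The function name `list_duplicate_names` is hypothetical, and the sketch assumes file names do not contain newlines.

```shell
#!/bin/sh
# list_duplicate_names DIR
# Print each file base name that occurs more than once under DIR.
# Assumes no file name contains a newline character.
list_duplicate_names() {
    find "$1" -type f -print |
        sed 's|.*/||' |   # strip everything up to the last slash
        sort |
        uniq -d           # keep only names that are repeated
}
```

Files with the same name can still differ in content, so this is only a first pass; comparing checksums is the safer test before deleting anything.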
1.) The "find" command is going to be central to what you're doing. Try:
Code:
find <dir-tree-root> -type f -exec md5sum {} \; > checksum.lis
2.) By itself, this won't tell you anything about duplicates. To find which checksums show up in checksum.lis multiple times, try:
Code:
cat checksum.lis | awk '{print $1}' | sort | uniq -c
To list only the checksums that occur more than once, filter out the lines with a count of 1:
Code:
<cat checksum.lis | blah blah | uniq -c> | grep -v " 1 "
3.) Then use
Code:
grep -f duplicate_checksums.lis checksum.lis
If you plan on having to do this frequently or on multiple systems, it shouldn't be too hard to combine all of the above into a shell script that automates the process.
Note 1: Don't forget that you probably want to keep one copy of the duplicate files found above.
Note 2: What I tend to do when I embark on a cleanup like this is to wrap the cleanup code with something like:
Code:
TESTING="Y"
4.) Extending this to include only files that are 2 years old and older should be fairly simple after you read the find(1) manpage (hint: `-mtime'). Also, be careful not to remove just any file that is older than some arbitrary number of days. You might accidentally clobber application files -- or operating system files -- some of which might be rarely changed and have old datestamps on them.
5.) You have done a backup of the filesystems you're planning on cleaning and know that these backups are correct/usable, right? |
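Steps 1 to 3 above could be combined along these lines. This is only a sketch under a few assumptions: it groups on the checksum with awk in a single pipeline rather than going through the intermediate .lis files, it relies on `md5sum` from GNU coreutils being installed, and the function names `list_duplicate_files` and `remove_file` are made up for illustration. The `TESTING` guard is one guess at what a dry-run wrapper might look like, since the rest of that snippet is not shown in the post.

```shell
#!/bin/sh
# list_duplicate_files DIR
# Print the md5sum output line ("checksum  path") of every file under DIR
# whose checksum is shared with at least one other file. Lines with the
# same checksum end up next to each other.
list_duplicate_files() {
    find "$1" -type f -exec md5sum {} + |
        sort |
        awk '
            # Input is sorted, so equal checksums (field 1) are adjacent.
            $1 == prev { if (!shown) print prevline; print; shown = 1 }
            $1 != prev { shown = 0 }
            { prev = $1; prevline = $0 }
        '
}

# Dry-run switch: defaults to "Y" so that nothing is deleted unless the
# caller explicitly sets TESTING to something else.
TESTING=${TESTING:-Y}

# remove_file PATH
# Report PATH when TESTING is "Y"; otherwise actually delete it.
remove_file() {
    if [ "$TESTING" = "Y" ]; then
        printf 'would remove: %s\n' "$1"
    else
        rm -- "$1"
    fi
}
```

Running `list_duplicate_files` on a directory and reviewing the output by hand comes first; deciding which copy in each group to keep is deliberately left to the operator, in line with Note 1. With `TESTING` left at "Y", `remove_file` only prints what it would delete.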