LinuxQuestions.org
Linux - Newbie: This Linux forum is for members who are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-tos, this is the place!

11-06-2010, 01:44 PM #1
dudulica
LQ Newbie
Registered: Nov 2010
Posts: 4
Reputation: 0
script: rename files by using cksum


Hello,

I have to do a very simple task, but being a newbie, I'm finding it a challenge.

Here is what I have to do:
1. I have a bunch of *.html files, and many of them are duplicates: for example, file1.html, file6.html and file12.html could be identical.
2. I have to eliminate the duplicates using cksum and rename the remaining files using the 'sum' value from cksum (sum.html).

Something like:

$ cksum file1.html
339238769 1918 file1.html
$ mv file1.html 339238769.html
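For reference, a minimal sketch of that single-file step, assuming bash; the function name rename_by_cksum is hypothetical. It feeds the file to cksum on standard input so that only "CHECKSUM SIZE" is printed and the file name never has to be parsed.

```shell
#!/bin/bash
# Hypothetical helper: rename_by_cksum FILE
# Renames FILE to <checksum>.html in the same directory, as in
# "mv file1.html 339238769.html".
rename_by_cksum() {
    local file=$1 sum
    # With no file operand, cksum prints only "CHECKSUM SIZE", so the
    # first field is the checksum even if the name contains spaces.
    sum=$(cksum < "$file" | cut -d ' ' -f 1)
    mv -- "$file" "$(dirname -- "$file")/${sum}.html"
}
```

Usage: `rename_by_cksum ../folder1/file1.html`. Note that mv silently replaces an existing <checksum>.html in that directory.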

My script should be able to do this for several folders:

../folder1/*.html
../folder2/*.html
.......................
../folder100/*.html

I know it's a five-minute job for an expert, but learning shell scripting is taking me hours, and it has to be done by Monday morning.

Any help is appreciated.

Last edited by dudulica; 11-06-2010 at 04:32 PM.
 
11-06-2010, 01:52 PM #2
GrapefruiTgirl
LQ Guru
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594
Reputation: 551
Let's say that there are 5 identical files... Won't those 5 files have the same cksum value? Therefore, trying to repeatedly move file after file (of those 5) to the same filename will either result in errors (file exists) or overwriting previously renamed files. How do you plan to deal with this event?
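One possible way to handle that case, sketched under the assumption that the duplicates sit in the same directory and that deleting the extra copies is acceptable: before moving a file, check whether DIR/<checksum>.html already exists. The function name dedupe_dir is hypothetical. Because cksum is a CRC rather than a cryptographic hash, the sketch confirms a match with cmp -s before deleting anything.

```shell
#!/bin/bash
# Hypothetical helper: dedupe_dir DIR
# For each *.html in DIR: rename it to <checksum>.html, or, if that name
# is already taken by a byte-identical file, delete it as a duplicate.
dedupe_dir() {
    local dir=$1 f sum target
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue              # glob matched nothing
        sum=$(cksum < "$f" | cut -d ' ' -f 1)
        target="$dir/$sum.html"
        if [ "$f" = "$target" ]; then
            continue                         # already named after its checksum
        elif [ ! -e "$target" ]; then
            mv -- "$f" "$target"             # first copy: keep and rename
        elif cmp -s "$f" "$target"; then
            rm -- "$f"                       # identical to the kept copy
        else
            echo "checksum collision, left alone: $f" >&2
        fi
    done
}
```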
 
11-06-2010, 02:23 PM #3
GrapefruiTgirl
LQ Guru
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594
Reputation: 551
Anyhow, despite the potential problem I mentioned in my last post, here's something to try. I haven't tested it, so if it doesn't work, please explain exactly how it fails. This code does not address the possibility of moving multiple files to the same filename, and you need to put in the correct path to the directory tree you are searching:
Code:
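#!/bin/bash

# cksum each .html file, sort so equal checksums are adjacent, then keep one
# line per checksum that occurs more than once (uniq -d). rev moves the
# checksum to the end so uniq -f2 can skip the name and size fields.
find /path/to/parent/directory -type f -name "*.html" -exec cksum '{}' \; | sort -k1,1 | rev | uniq -f2 -d | rev |
while read -r SUM SIZE NAME; do
        # rename the kept file to <checksum>.html in its own directory
        mv "$NAME" "$(dirname "$NAME")/${SUM}.html"
done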
If you have already begun to write the code yourself, it is a good idea to show us what you've got so far, and explain how/why it doesn't work, and allow us to help you diagnose the problem.

Since this appears to be schoolwork (and I commend you for mentioning that), we don't typically do the work for you: you don't learn anything by having others do it. So handing in my code without understanding how it works is asking for trouble.

I encourage you to take the code apart and run the pieces individually, beginning with the `find` command (everything up to the pipe into `sort`). See what results come out. Then add the pipe and the sort command, and test again. Then add the rev command, test again, and so on, so that you understand what each piece does. Refer to the man pages for find, rev, sort and uniq to understand the options I've used.
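Following that advice, the later stages can be tried on made-up cksum output without touching any files. The checksums, sizes and paths below are invented for illustration, and the sort key is limited to the checksum field (-k1,1). The rev trick assumes file names without spaces: a name containing a blank turns into more than one field after rev, so uniq -f2 would skip the wrong fields.

```shell
#!/bin/bash
# Invented sample lines in cksum's "CHECKSUM SIZE NAME" format.
sample() {
    printf '%s\n' \
        '339238769 1918 dir1/f3.html' \
        '111111111 500 dir1/f2.html' \
        '339238769 1918 dir1/f1.html' \
        '111111111 500 dir2/f7.html' \
        '222222222 42 dir2/f5.html'
}

# rev puts the checksum last, so "uniq -f2" skips the (reversed) name and
# size and compares only the checksum; the second rev restores the line.
echo '--- one line per checksum (uniq -f2):'
sample | sort -k1,1 | rev | uniq -f2 | rev
echo '--- only checksums that occur more than once (uniq -f2 -d):'
sample | sort -k1,1 | rev | uniq -f2 -d | rev
```

With this sample, the first command prints the f2, f5 and f1 lines, and the second prints only the f2 and f1 lines: uniq keeps the first line of each group, which is the path that sorts first.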

If there remains something you do not understand, or this code doesn't work, ask questions or explain the problem. Of course, there are variations on this method of doing what you want, so other members may have different ideas for you to consider.

Good luck!
 
11-06-2010, 02:56 PM #4
dudulica
LQ Newbie
Registered: Nov 2010
Posts: 4
Original Poster
Reputation: 0
Thanks, Celine, for the quick reply. I will try it piece by piece.

The script has to eliminate all the duplicated *.html files across several folders and keep only one copy.
So if I have, let's say, 5 identical files, it has to keep only one of them and rename it using the SUM from cksum.

Something like:

Initial:
../dir1/f1.html
../dir1/f2.html
../dir1/f3.html (same as f1.html)
../dir1/f4.html (same as f1.html)

../dir2/f5.html
../dir2/f6.html
../dir2/f7.html (same as ../dir1/f2.html)
../dir2/f8.html

Final result:
keep only f1.html, f2.html, f5.html, f6.html and f8.html, all renamed to the corresponding SUM.html (from cksum).
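Before moving or deleting anything, it can help to see which files a script would keep. A hypothetical dry-run helper, plan_dedupe, is sketched below with awk, which is a swapped-in technique rather than the rev/uniq pipeline used in the replies: after sorting by checksum, the first line of each checksum is marked KEEP and the rest DROP. Nothing on disk is changed.

```shell
#!/bin/bash
# Hypothetical helper: plan_dedupe DIR
# Prints "KEEP ..." or "DROP ..." in front of each cksum line for every
# *.html file under DIR. The first file per checksum is kept.
plan_dedupe() {
    find "$1" -type f -name '*.html' -exec cksum {} + |
        sort -k1,1 |
        awk '{ tag = seen[$1]++ ? "DROP" : "KEEP"; print tag, $0 }'
}
```

Usage: `plan_dedupe ..` from inside one of the folders, then inspect the DROP lines.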
 
11-06-2010, 03:09 PM #5
GrapefruiTgirl
LQ Guru
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594
Reputation: 551
OK, as written, the code I provided will not do precisely what you need (see EDIT below!). But the further information you added raises another question.

Let's say that these two files are identical:

folder1/f1.html
folder3/f3.html

Which file gets "kept" and renamed to the SUM.html? Will the final result be:

folder1/SUM.html
folder3/f3.html

or will it be:

folder1/f1.html
folder3/SUM.html

? Or does it matter, as long as *one* of the duplicates gets renamed?

EDIT: I correct myself: the code I gave actually *will* work. It renames exactly one of the dupes to $SUM.html. However, the question remains: does it matter exactly which of the duplicated files gets renamed in its folder? After sorting, uniq keeps the first line of each group of identical checksums, so the duplicate whose path sorts first is the one that gets renamed. Is that OK?

Last edited by GrapefruiTgirl; 11-06-2010 at 03:15 PM.
 
11-06-2010, 03:32 PM #6
dudulica
LQ Newbie
Registered: Nov 2010
Posts: 4
Original Poster
Reputation: 0
Thanks again Celine.

In the initial scenario from my previous post:

I need to mv (or even cp) all the deduplicated files (no duplicates at all) into a folder called 'resultFolder'. If I run ls -l in resultFolder, I should see only:

SUM1.html (former f1.html)
SUM2.html (former f2.html)
SUM5.html (former f5.html)
SUM6.html (former f6.html)
SUM8.html (former f8.html)

If I run, say, cksum * in the new 'resultFolder', I should see only unique SUM values, so no duplicates at all in this result folder.
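That check can be automated. The sketch below, using a hypothetical function name find_dup_sums, prints every checksum that occurs more than once among the .html files in a folder, so empty output means the folder contains no duplicates.

```shell
#!/bin/bash
# Hypothetical helper: find_dup_sums DIR
# Prints each checksum that appears more than once among DIR/*.html.
# No output means every file in DIR is unique.
find_dup_sums() {
    cksum "$1"/*.html | cut -d ' ' -f 1 | sort | uniq -d
}
```

Usage: `find_dup_sums resultFolder`.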

Thanks again for your time.

Last edited by dudulica; 11-06-2010 at 03:58 PM.
 
11-06-2010, 03:32 PM #7
GrapefruiTgirl
LQ Guru
Registered: Dec 2006
Location: underground
Distribution: Slackware64
Posts: 7,594
Reputation: 551
Heh, edit again!

Remove the -d option from `uniq` and it should do what you like. (Evidently I am finding this a confusing issue.)

Oh, I see you need a new target folder now. No problem, let's see:
Code:
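#!/bin/bash

# Same pipeline as before, but without -d: uniq now keeps one line for
# every checksum, unique or duplicated, and that file is moved.
mkdir -p "TARGET-FOLDER"   # make sure the destination exists
find /path/to/parent/directory -type f -name "*.html" -exec cksum '{}' \; | sort -k1,1 | rev | uniq -f2 | rev |
while read -r SUM SIZE NAME; do
        mv "$NAME" "TARGET-FOLDER/${SUM}.html"
done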
Try that, substituting your real target folder for TARGET-FOLDER. (You can swap the `mv` for a `cp` if you like; that way the source files stay where they are and we can run this again on the same source tree.)
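A variation on the same idea, sketched as a hypothetical function collect_unique SRC DEST: it copies rather than moves, and it replaces the rev/uniq step with awk '!seen[$1]++', which keeps the first line for each checksum. Because awk never has to skip the name field, file names containing spaces are handled. It assumes DEST lies outside SRC, so a second run does not pick up earlier copies.

```shell
#!/bin/bash
# Hypothetical helper: collect_unique SRC DEST
# Copies one file per distinct checksum from SRC's tree into DEST
# as <checksum>.html. DEST should not be inside SRC.
collect_unique() {
    local src=$1 dest=$2 sum size name
    mkdir -p -- "$dest"
    find "$src" -type f -name '*.html' -exec cksum {} + |
        sort -k1,1 |
        awk '!seen[$1]++' |              # first line per checksum only
        while read -r sum size name; do  # name keeps embedded spaces
            cp -- "$name" "$dest/${sum}.html"
        done
}
```

Usage: `collect_unique /path/to/parent/directory resultFolder`.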

Last edited by GrapefruiTgirl; 11-06-2010 at 03:35 PM.
 
11-06-2010, 03:33 PM #8
acid_kewpie
Moderator
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, CentOS
Posts: 43,417
Reputation: 1974
Can I mention the hardlink tool? It's not what you're asking for, but it will automatically find identical files and link them to the same data on disk, which is half of the issue you've got.
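The same idea can be imitated in plain shell for the .html files. This is not the hardlink tool and makes no claim about its options; it is a hypothetical sketch, link_duplicates, that keeps the first file of each checksum group and replaces later byte-identical files (checked with cmp -s) with hard links to it via ln -f. Hard links only work within one filesystem, and every linked name still appears in directory listings, so this saves disk space rather than removing duplicate names.

```shell
#!/bin/bash
# Hypothetical helper: link_duplicates DIR
# Hard-links byte-identical *.html files under DIR to the first file
# found for each checksum (after sorting by checksum).
link_duplicates() {
    find "$1" -type f -name '*.html' -exec cksum {} + |
        sort -k1,1 |
        while read -r sum size name; do
            if [ "$sum" != "$prev_sum" ]; then
                prev_sum=$sum master=$name      # new group: remember first file
            elif ! [ "$master" -ef "$name" ] && cmp -s "$master" "$name"; then
                ln -f -- "$master" "$name"      # replace copy with a hard link
            fi
        done
}
```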
 
11-06-2010, 04:37 PM #9
dudulica
LQ Newbie
Registered: Nov 2010
Posts: 4
Original Poster
Reputation: 0
Thanks Celine (and Chris as well)!

It works perfectly.
I will do a little bit more testing and ... learning and will mark my newbie challenge as solved in a day or two.

Cheers!
 
  


All times are GMT -5.