LinuxQuestions.org
Linux - General This Linux forum is for general Linux questions and discussion.
If it is Linux Related and doesn't seem to fit in any other forum then this is the place.

Old 06-28-2004, 07:36 PM   #1
khermans
Member
 
Registered: Sep 2001
Distribution: Ubuntu, Debian, Gentoo
Posts: 162

Rep: Reputation: 30
Massively renaming numerous similar files


I have been reading the man pages for gawk and sed to rename a whole slew of files, but I am still having trouble because the syntax confuses me. Basically, I have a whole bunch of *.r## and *.p## files, where ## is a two-digit number. I want to change just a few of the characters in the original filenames. See the example below:

This is what they look like:
--snip--
$ ls -a *.p* && ls -a *.r*
smr_DB-ts_1of2_.p01 smr_DB-ts_1of2_.p05 smr_DB-ts_2of2_.p03
smr_DB-ts_1of2_.p02 smr_DB-ts_1of2_.par smr_DB-ts_2of2_.p04
smr_DB-ts_1of2_.p03 smr_DB-ts_2of2_.p01 smr_DB-ts_2of2_.p05
smr_DB-ts_1of2_.p04 smr_DB-ts_2of2_.p02 smr_DB-ts_2of2_.par
smr_DB-ts_1of2_.r00 smr_DB-ts_1of2_.r16 smr_DB-ts_2of2_.r08
smr_DB-ts_1of2_.r01 smr_DB-ts_1of2_.r17 smr_DB-ts_2of2_.r09
smr_DB-ts_1of2_.r02 smr_DB-ts_1of2_.r18 smr_DB-ts_2of2_.r10
smr_DB-ts_1of2_.r03 smr_DB-ts_1of2_.r19 smr_DB-ts_2of2_.r11
smr_DB-ts_1of2_.r04 smr_DB-ts_1of2_.r20 smr_DB-ts_2of2_.r12
smr_DB-ts_1of2_.r05 smr_DB-ts_1of2_.r21 smr_DB-ts_2of2_.r13
smr_DB-ts_1of2_.r06 smr_DB-ts_1of2_.r22 smr_DB-ts_2of2_.r14
smr_DB-ts_1of2_.r07 smr_DB-ts_1of2_.rar smr_DB-ts_2of2_.r15
smr_DB-ts_1of2_.r08 smr_DB-ts_2of2_.r00 smr_DB-ts_2of2_.r16
smr_DB-ts_1of2_.r09 smr_DB-ts_2of2_.r01 smr_DB-ts_2of2_.r17
smr_DB-ts_1of2_.r10 smr_DB-ts_2of2_.r02 smr_DB-ts_2of2_.r18
smr_DB-ts_1of2_.r11 smr_DB-ts_2of2_.r03 smr_DB-ts_2of2_.r19
smr_DB-ts_1of2_.r12 smr_DB-ts_2of2_.r04 smr_DB-ts_2of2_.r20
smr_DB-ts_1of2_.r13 smr_DB-ts_2of2_.r05 smr_DB-ts_2of2_.r21
smr_DB-ts_1of2_.r14 smr_DB-ts_2of2_.r06 smr_DB-ts_2of2_.r22
smr_DB-ts_1of2_.r15 smr_DB-ts_2of2_.r07 smr_DB-ts_2of2_.rar
--snip--

I want the above to be transformed into the list below (shortened, but you get the idea: I want to put in the parentheses where necessary on every file and take out the unnecessary characters):
--snip--
(smr)DB-ts(1of2).p01
(smr)DB-ts(1of2).p02
...
(smr)DB-ts(1of2).par
(smr)DB-ts(2of2).p01
(smr)DB-ts(2of2).p02
...
(smr)DB-ts(2of2).par
(smr)DB-ts(1of2).r00
(smr)DB-ts(1of2).r01
...
(smr)DB-ts(1of2).rar
(smr)DB-ts(2of2).r00
(smr)DB-ts(2of2).r01
...
(smr)DB-ts(2of2).rar
--snip--

Any ideas? Would gawk or sed be better for this purpose? I would like to do this directly from the command line. Thanks!

Kristian Hermansen
 
Old 06-28-2004, 08:36 PM   #2
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
You might want to check out the rename command: man rename

You would probably have to do multiple runs to adjust each piece, but it can do what you want.
 
Old 06-28-2004, 10:56 PM   #3
mikshaw
LQ Addict
 
Registered: Dec 2003
Location: Maine, USA
Distribution: Slackware/SuSE/DSL
Posts: 1,320

Rep: Reputation: 45
You do realize that renaming multi-part RAR and PAR files will make the contents of the archive inaccessible?
 
Old 06-28-2004, 11:40 PM   #4
khermans
Member
 
Registered: Sep 2001
Distribution: Ubuntu, Debian, Gentoo
Posts: 162

Original Poster
Rep: Reputation: 30
Quote:
Originally posted by mikshaw
You do realize that renaming multi-part RAR and PAR files will make the contents of the archive inaccessible?
The problem is that they seem to have been renamed by the program that downloaded them, and thus the PAR files could not recover the RAR files since the names had changed! It looks as if somewhere along the line the parentheses "(" and ")" were not escaped correctly when writing the file, and must have defaulted to "_" because of this. I just wanted to know how to rename a whole bunch of files, and this is actually more of a general question than a specific one. I'm going to check out the rename command right now :-)

Kristian Hermansen
 
Old 06-29-2004, 12:22 AM   #5
khermans
Member
 
Registered: Sep 2001
Distribution: Ubuntu, Debian, Gentoo
Posts: 162

Original Poster
Rep: Reputation: 30
I don't think that rename command is what I am looking for. It needs to be a bit more complex than this, and keep the same basic filename structure with other characters interspersed, which is why I thought maybe sed or gawk might be useful here. Any ideas?

Kristian Hermansen
 
Old 06-29-2004, 12:56 AM   #6
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
Yes, rename will work:

Code:
rename smr \(smr smr*
rename smr_ smr\) *smr*
rename ts_ ts\( *ts_*
rename _.p \).p *_.p*
rename _.r \).r *_.r*
Those five commands will change every file you listed in the example to match the output you were looking to get.

Last edited by Dark_Helmet; 06-29-2004 at 12:57 AM.
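To see how the five passes above build up the final name without touching any files, the following sketch replays them on a single name using bash parameter expansion. The function name `preview_passes` is made up for illustration, and the sketch assumes (as the commands in the post do) that each pass replaces only the first occurrence of the search string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the rename command): replay the five
# passes on one file name and print the name after each pass.
# No files are renamed; this only manipulates strings.
preview_passes() {
  local name=$1
  local open='(' close=')'

  name=${name/smr/${open}smr};   printf '%s\n' "$name"   # smr  -> (smr
  name=${name/smr_/smr${close}}; printf '%s\n' "$name"   # smr_ -> smr)
  name=${name/ts_/ts${open}};    printf '%s\n' "$name"   # ts_  -> ts(
  name=${name/_.p/${close}.p};   printf '%s\n' "$name"   # _.p  -> ).p
  name=${name/_.r/${close}.r};   printf '%s\n' "$name"   # _.r  -> ).r
}

# Example: prints the five intermediate names for one of the listed files.
preview_passes 'smr_DB-ts_1of2_.p01'
```

The last line printed is the name the file would end up with; a pass whose search string does not occur simply leaves the name unchanged.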
 
Old 06-29-2004, 01:01 AM   #7
khermans
Member
 
Registered: Sep 2001
Distribution: Ubuntu, Debian, Gentoo
Posts: 162

Original Poster
Rep: Reputation: 30
Quote:
Originally posted by Dark_Helmet
Yes, rename will work:

Code:
rename smr \(smr smr*
rename smr_ smr\) *smr*
rename ts_ ts\( *ts_*
rename _.p \).p *_.p*
rename _.r \).r *_.r*
Those five commands will change every file you listed in the example to match the output you were looking to get.
That's cool, thanks for the tip! Do you also know how to do it with a one line regex command?

Kristian Hermansen
 
Old 06-29-2004, 01:09 AM   #8
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
Not meaning to be overly nosy, but why does it need to be a one-line regex? You could just as easily put those rename commands into a script if you wanted a single command.
 
Old 06-29-2004, 01:27 AM   #9
khermans
Member
 
Registered: Sep 2001
Distribution: Ubuntu, Debian, Gentoo
Posts: 162

Original Poster
Rep: Reputation: 30
Quote:
Originally posted by Dark_Helmet
Not meaning to be overly nosy, but why does it need to be a one-line regex? You could just as easily put those rename commands into a script if you wanted a single command.
The problem is that I would be invoking the rename command once for EVERY instance of something I wanted to replace. This can be costly when you're dealing with millions of files in a database that you want to update immediately given some dynamically changing criteria. I want to be able to do this more efficiently, since calling the same program numerous times on the same file would be unnecessary. What if I needed to make one hundred changes: should I call rename 100 times? You can see that a quick regex on a vast array of files might be less expensive (although in some cases probably not, given the regex complexity). Also, the scripting will be much easier if I can say something like "sed s/foo_/foo\)+ blah blah blah" rather than "rename foo_ foo\) foo* && rename blah && rename blah && rename blah && ..." I would also like to know how the equivalent is handled with a regexp, since I'm not much of a regex whiz...lol

Kristian Hermansen
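One way to get the kind of command asked about above is to hand sed several `-e` expressions in a single invocation, so each name passes through one sed process instead of several chained ones. This is only a sketch: the function name `fix_name` is hypothetical, and the three expressions assume names shaped exactly like the ones listed in the first post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the corrected form of one file name.
# Expression 1: leading "smr_"          -> "(smr)"
# Expression 2: first "ts_"             -> "ts("
# Expression 3: "_." before a p/r ext.  -> ")."
fix_name() {
  printf '%s\n' "$1" | sed \
    -e 's/^smr_/(smr)/' \
    -e 's/ts_/ts(/' \
    -e 's/_\.\([pr]\)/).\1/'
}

# Dry run: print the mv commands instead of executing them.
# Remove the leading "echo" only after checking the output.
for file in smr_*; do
  [ -e "$file" ] || continue          # skip if the glob matched nothing
  echo mv -v -- "$file" "$(fix_name "$file")"
done
```

Adding another substitution then means adding another `-e` line rather than another pass over every file.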
 
Old 06-29-2004, 02:39 AM   #10
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
I would avoid a single regex expression at all costs (for maintainability reasons) and break it up to something like this:
Code:
for file in *
do
  mv -v "${file}" "$(echo "${file}" | sed 's/^smr/(smr/' | sed 's/smr_/smr)/' | ... )"
done
If you absolutely must have a one liner, which I'm still not completely convinced is necessary, you could try this:
Code:
for file in *
do
  mv -v "${file}" "$(echo "${file}" | sed 's/^smr_DB-ts_\([0-9]\)of\([0-9]\)_\.\([rp]\)/(smr)DB-ts(\1of\2).\3/')"
done
Like I said, trying to shove it all into one regex might create a maintenance nightmare.
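If the cost of starting sed once per file is the concern, the same single-pattern idea can be sketched with bash's built-in `[[ =~ ]]` matching, which needs no external sed process. This is an illustration rather than a drop-in replacement: `new_name` is a hypothetical function name, the pattern assumes single-digit "NofM" parts as in the listing, and the loop only prints the `mv` commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the corrected name, or return 1 if the
# name does not look like smr_DB-ts_<N>of<M>_.<p|r>...
new_name() {
  local re='^smr_DB-ts_([0-9])of([0-9])_\.([rp].*)$'
  if [[ $1 =~ $re ]]; then
    # BASH_REMATCH[1..3] hold N, M and the extension (e.g. "r07", "par").
    printf '(smr)DB-ts(%sof%s).%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    return 1
  fi
}

# Dry run over the current directory; names that do not match are skipped.
# Remove the leading "echo" only after checking the output.
for file in smr_*; do
  target=$(new_name "$file") || continue
  echo mv -v -- "$file" "$target"
done
```

Restricting the glob to `smr_*` and skipping non-matching names also avoids asking `mv` to move a file onto itself, which the `for file in *` loops above would attempt for any file the pattern does not change.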
 
Old 06-29-2004, 08:08 AM   #11
khermans
Member
 
Registered: Sep 2001
Distribution: Ubuntu, Debian, Gentoo
Posts: 162

Original Poster
Rep: Reputation: 30
Quote:
Originally posted by Dark_Helmet
I would avoid a single regex expression at all costs (for maintainability reasons) and break it up to something like this:

Like I said, trying to shove it all into one regex might create a maintenance nightmare.
Maybe you are correct :-) When would you choose to use regex over the multiple rename commands?

Kristian Hermansen
 
Old 06-29-2004, 02:22 PM   #12
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374
For this particular situation, I don't think the speed is necessary for a couple reasons:
1. the archives are inaccessible
2. the number of files comprising an archive
Assuming mikshaw is correct in saying you trash a multi-part rar archive by changing the names of each part, then your archives are toast right now. It doesn't matter if they're in a database or not. A user might be able to query for a list of files that comprise one archive, but they can't do anything to them until they're renamed, right? So there's no change in the situation by performing multiple renames. I might have to do five commands, but the archives can't be any more inaccessible if they're incorrectly named. What I'm getting at is "smr_DB-ts_2of2_.rar" is still just as inaccessible as "(smr_DB-ts_2of2_.rar".

Given the constraints of the filenames, you've said that an archive cannot have more than 100 constituent parts: .rar, .p01, .p02, ... , .p99. Unless there's something about the data files that I'm not aware of, then it should be possible to rename these individual archives one-at-a-time. That is, rename smr_DB-ts_1of2_.*, then rename smr_DB-ts_2of2_.*, etc. I have to assume that one entire archive is distinct and independent of the other archives, meaning that this collection of archives can sustain one archive changing its name at a time. In that case, you're no longer talking about speed performance for "millions" of files, but 100. The performance difference between issuing 5 renames and one big sed is severely diminished on a data set of 100 files versus 1,000,000.

Now, if you had a single archive of some unbelievably large number of pieces (1,000+ is a nice arbitrary number), the data is currently "live" (accessible by users), and it must stay live, then I would look into making some sort of single command to handle the name changes. Actually, I'd probably try to arrange a shutdown time first (30 minutes would probably be more than enough) before going to a complicated regex.
 
Old 06-29-2004, 02:34 PM   #13
khermans
Member
 
Registered: Sep 2001
Distribution: Ubuntu, Debian, Gentoo
Posts: 162

Original Poster
Rep: Reputation: 30
Good discussion. You are correct that having many more files would show the performance increase, and that for this example it is not pertinent. I did also mention at the beginning of the topic that this was a general case and that the specific example was just for clarity.

Also, the multi-part RARs still extract fine whether or not the name is changed, as long as ALL the files are changed to reflect the new format. The problem was actually that the PAR (parity) files look for specific file names and that they were not finding them! I did use multiple rename commands to test this and everything worked great! Thanks for your help :-)

Kristian Hermansen
 
  



All times are GMT -5. The time now is 10:56 PM.
