LinuxQuestions.org


varney 02-17-2008 07:53 PM

Rename files by stripping text from either desired string?
 
Hi

I just can't get my head round this regexp thing.

Basically, I have a folder full of files, and I want to rename them. Each filename contains a particular string (of a particular format) that I want to keep. I want to strip the text on either side of it, but also keep a bit of the text near the end, plus the extension.

Consider this:

Code:

filenamestartstringidontwant.x1y1.deletethisbitalso.keepthispartandthe.ext
I hope you can follow that. I want to strip the start, keep the x1y1 (but this changes depending on the file!), remove the bit of text after it, and then keep the next part plus the extension.

There is probably an easier way of explaining what I need, but I just wanted to be clear (or try to be).

Basically, after the rename every file should end up as:

Code:

x1y1.keepthispartandthe.ext
Btw, the dots you see in the filename won't necessarily be there, except in this bit, which is fixed: .keepthispartandthe.ext always has its dots exactly where shown.

hope that clears it up for anyone who can help!

thanks ;)

sundialsvcs 02-17-2008 09:27 PM

Okay, so what you are saying seems to be this:
  1. "The part that you are interested in" consists of: one-or-more 'word characters' (alpha, numeric, or underscore), followed by a literal period, followed by one-or-more word characters, all anchored at the end-of-line.
  2. To put it another way, the "right way" to find "the part that you are interested in" would be to examine the string from right to left. The pattern you're looking for always ends at the end of the string, extending leftwards past the extension and the obligatory period, thence only to the extent of the rightmost group of word-characters thereafter. Anything (to the left) beyond that is uninteresting.
In that case, look at this regular-expression:
Code:

/(\w+\.\w+)$/
  • The pattern describes what you do want, not the cruft which may surround it.
  • The entire regular-expression is, by convention, enclosed in slashes, which are not part of the pattern.
  • The trailing '$' at the end of the pattern denotes the end-of-string: this pattern always consists of the rightmost characters in the string.
  • Parentheses "()" are used to enclose a group, which will allow you to extract that group from the string: once you've found that a string matches the pattern, you can extract the substring that matched each group.
  • "\w" is a metacharacter for "word character." It represents a predefined group of characters as previously described, corresponding to "[A-Za-z0-9_]"
  • "\." refers to a literal period.
  • "+" is a repetition factor, in this case "one or more of."
  • So the pattern within the group can be read as, "one or more of word-characters, followed by one period, followed by one or more of word-characters, followed by the end of the string."
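
To see what the group actually captures, here is a minimal sketch in Python (the language is my choice, not something from the thread) that applies the pattern to the sample filename from the first post. The re.ASCII flag is an assumption made so that "\w" means exactly "[A-Za-z0-9_]" as described above. Note that with this sample the group holds only the last two pieces; the x1y1 part is not captured by this pattern.

```python
import re

# The pattern from the post, without the enclosing slashes.
# re.ASCII restricts \w to [A-Za-z0-9_], matching the description above.
pattern = re.compile(r"(\w+\.\w+)$", re.ASCII)

# Sample filename taken from the original question.
name = "filenamestartstringidontwant.x1y1.deletethisbitalso.keepthispartandthe.ext"

match = pattern.search(name)
kept = match.group(1) if match else None  # group 1 is the parenthesised part

print(kept)  # keepthispartandthe.ext
```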

varney 02-21-2008 10:04 PM

A very in-depth reply, I must say, and thank you also for explaining each part of the expression. Problem is, I have never had to deal with regular expressions in my lifetime of using Windows; I only recently got a dedicated Linux box - so would you mind giving me an example of how I would rename the files using the regexp you suggest?

thanks :)
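
For anyone landing here with the same question, one way the rename could be sketched is with a short Python script. This is only a sketch under assumptions, not a tested recipe for the real files: the thread never says what format the x1y1 part has, so the TOKEN pattern below (letter, digits, letter, digits) is a guess, and new_name and rename_all are hypothetical helper names. The tail is found with the pattern from the reply above. The function defaults to a dry run that only reports what it would do.

```python
import re
from pathlib import Path

# ASSUMPTION: the part to keep looks like "x1y1" (letter, digits, letter, digits).
# Adjust this to the real format of your files.
TOKEN = re.compile(r"[A-Za-z]\d+[A-Za-z]\d+")
# Pattern from the reply above: last group of word characters, a period, the extension.
TAIL = re.compile(r"(\w+\.\w+)$", re.ASCII)

def new_name(old):
    """Build 'token.tail' from an old filename, or return None if either part is missing."""
    token = TOKEN.search(old)
    tail = TAIL.search(old)
    if token is None or tail is None:
        return None
    return f"{token.group(0)}.{tail.group(1)}"

def rename_all(folder, dry_run=True):
    """Return the planned (old, new) pairs; rename on disk only when dry_run is False."""
    plan = []
    taken = set()  # targets already planned, so two files never map to one name
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        target_name = new_name(path.name)
        if target_name is None or target_name == path.name:
            continue  # nothing to do for this file
        target = path.with_name(target_name)
        if target.exists() or target_name in taken:
            continue  # never overwrite an existing or already planned file
        taken.add(target_name)
        plan.append((path.name, target_name))
        if not dry_run:
            path.rename(target)
    return plan
```

Calling rename_all("some_folder") first and reading the returned pairs is the safe way to check the guesses before running it again with dry_run=False.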

