LinuxQuestions.org


Vryali 10-20-2010 10:37 AM

SED assistance: add character in the middle of a string.
 
I resolved the problem after trying in vain to make sed work as I wanted it to, but my problem was this:

I have a folder of MP3s, where some have <artist>- <song> and others have <artist> - <song>. I wanted to make all of them say <artist> - <song>

The command that worked for me was:
for i in *[a-Z]-*; do mv "$i" "`echo $i | sed 's/-/ -/g'`"; done

but if I -hadn't- used ls with my parameters to filter out the valid results, how would I have done this with sed? I got close with:

ls | sed -e 's/\(.*\)[a-Z]-\(.*\)/\1 - \2/'

but with the above I lose the [a-Z] character that I used to match with (and I need to keep that). My basic question is: what did I need to change in the above command to make it work as expected within sed? (Again, I've since solved it; I'm just trying to understand sed better.)

Thanks in advance for your time and responses =)

druuna 10-20-2010 11:34 AM

Hi,

[a-Z] is not a portable range, although it is sometimes accepted depending on the locale. In the C locale 'a' sorts after 'Z', so sed rejects it as an invalid range.

You do mention that you need/want to keep the [a-Z] part, which would limit your options considerably!!

Why not something like this: ls | sed 's/[[:blank:]]*-[[:blank:]]*/ - /g'

This looks for a dash (-) with zero or more blanks before and after it and replaces the whole match with a space, a dash, and a space (globally). I intentionally used [:blank:] so that tabs are matched as well as spaces.
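
As a quick sketch of what that substitution does, here it is applied to a few made-up names (the helper function and the sample names are only for illustration, they are not from the original question):

```shell
# Hypothetical helper: normalize the spacing around every dash in one name.
normalize_dashes() {
    printf '%s\n' "$1" | sed 's/[[:blank:]]*-[[:blank:]]*/ - /g'
}

normalize_dashes 'artist- song'              # artist - song
normalize_dashes 'artist - song'             # artist - song (unchanged)
normalize_dashes 'artist- hyphenated-song'   # artist - hyphenated - song
```

Note the last case: because of the g flag, hyphens inside a word get spaces around them too.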

Hope this helps.

David the H. 10-20-2010 11:36 AM

You simply need to make sure that every character you want to keep is inside a parenthesized group that you back-reference in the replacement. In this case, just expand the first group to include the a-Z reference.

It's also easier when dealing with regex to use the -r option (extended regular expressions). That way you don't have to backslash-escape the grouping parentheses.
Code:

ls | sed -r -e 's/(.*[[:alnum:]])- (.*)/\1 - \2/'
Notice that I added a space after the hyphen too. Without it, a filename like artist- hyphenated-song would break at the wrong place, because .* is greedy and runs all the way to the last hyphen.
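
To see the greedy behavior, here is a small comparison with and without the trailing space in the pattern. The sample name is hypothetical, and -E is used as the portable spelling of -r:

```shell
# Hypothetical sample name with a hyphen inside the song title.
name='artist- hyphenated-song'

# No space required after the hyphen: greedy .* runs to the LAST hyphen.
without_space=$(printf '%s\n' "$name" | sed -E 's/(.*[[:alnum:]])-(.*)/\1 - \2/')

# Space required after the hyphen: only the "- " after "artist" can match.
with_space=$(printf '%s\n' "$name" | sed -E 's/(.*[[:alnum:]])- (.*)/\1 - \2/')

printf '%s\n' "$without_space"   # artist- hyphenated - song
printf '%s\n' "$with_space"      # artist - hyphenated-song
```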

Something like this would be even better. It can be run on both correct and incorrect names, as long as there's a space after the hyphen in the middle.
Code:

ls |sed -r -e 's/([^- ]+)- (.*)/\1 - \2/'
Speaking of which, in your first expression, using the 'g' flag in sed would also affect hyphenated names.
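
A rough check of both points, using made-up names (fix_name is just an illustrative wrapper, and -E is the portable spelling of -r):

```shell
# Hypothetical wrapper around the second expression.
fix_name() {
    printf '%s\n' "$1" | sed -E 's/([^- ]+)- (.*)/\1 - \2/'
}

fix_name 'artist- song'              # artist - song
fix_name 'artist - song'             # artist - song (already correct, no match)
fix_name 'artist- hyphenated-song'   # artist - hyphenated-song

# The original s/-/ -/g, by contrast, also touches the hyphen in the title.
printf '%s\n' 'artist- hyphenated-song' | sed 's/-/ -/g'   # artist - hyphenated -song
```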

Finally, instead of using sed, ls, and/or loops, I recommend perl rename, a convenient renaming script included with some distros' Perl packages. It uses the same general sed/perl substitution syntax as above and works with standard shell file globbing. A stand-alone version is available here:

http://tips.webdesign10.com/files/rename.pl.txt

Vryali 10-21-2010 07:48 AM

Thanks to both of you for your responses. I didn't realize I could use blank like that and wasn't aware of the -r flag, which will certainly make things easier going forward :)

I'll check out the perl mention, but I think the closest answer to the method I was using is:

Code:

ls |sed -r -e 's/([^- ]+)- (.*)/\1 - \2/'
I've never seen/used the plus operator before, but a google said:
Quote:

The plus operator will match the preceding pattern 1 or more times.
Is the + actually needed? I ran it with and without and the results seemed to be the same (Also, the + seems to be functionally the same as using /g, is that a correct correlation)? It looks like, at a glance, the ^ means the expression will only match once regardless of the + (making it unnecessary)?

Thanks again, I'll add some reputation as soon as I figure out how to do it XD

David the H. 10-21-2010 10:51 AM

Good catch. No, you don't really need the plus sign there. I put it in out of habit because that pattern is often used to stop a regex pattern from being greedy.

When ^ is at the first position inside brackets it negates the character range, so [^- ] means to match anything that's not a hyphen or a space.
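
A tiny illustration with a made-up input string: inside brackets ^ negates the set, while outside brackets it anchors the match to the start of the line.

```shell
# Replace every character that is neither a hyphen nor a space.
negated=$(printf '%s\n' 'a-b c' | sed 's/[^- ]/X/g')

# Outside brackets, ^ anchors to the start of the line instead.
anchored=$(printf '%s\n' 'a-b c' | sed 's/^a/X/')

printf '%s\n' "$negated"    # X-X X
printf '%s\n' "$anchored"   # X-b c
```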

For that matter, you'd probably really only need to negate the space here ([^ ]). It really depends on how careful you need to be in weeding out false matches. You could even use a really simplified version like this if there's no chance of there being multiple not-space+hyphen+space combinations.
Code:

ls | sed -r 's/([^ ])- /\1 - /'
Don't confuse * and + with the sed "g" flag. The first two are quantifiers inside the regex: * matches zero or more of the preceding character/pattern, and + matches one or more. The "g" is sed's "global" flag for the s command, which applies the substitution at every place the regex matches in the input line, instead of only at the first match.
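
A short sketch of the difference, on made-up input (-E is the portable spelling of -r):

```shell
line='a- b- c'

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -E 's/([^ ])- /\1 - /')
# With g, every match on the line is replaced.
every_match=$(printf '%s\n' "$line" | sed -E 's/([^ ])- /\1 - /g')

printf '%s\n' "$first_only"    # a - b- c
printf '%s\n' "$every_match"   # a - b - c

# + and * only control how much a single match consumes.
printf '%s\n' 'aaa-b' | sed -E 's/a+/X/'   # X-b   (one match, three characters)
printf '%s\n' 'aaa-b' | sed -E 's/a/X/g'   # XXX-b (three separate matches)
printf '%s\n' 'b' | sed -E 's/a+/X/'       # b     (+ needs at least one a)
printf '%s\n' 'b' | sed -E 's/a*/X/'       # Xb    (* also matches zero a's)
```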

