[SOLVED] separate information with \1 \2 in sed

vincix · 05-02-2017, 03:37 AM

I've got this string:
1d20170324115727p0755600111.m4a

And I'm trying to do this: separate the date from the time of day (115727) and from the last number with spaces, and drop the extension:

Code:

sed -E 's/^[0-9]d([0-9]\{8\})([0-9]+)p([0-9]+)\.m4a$/\1 \2 \3/g'

It returns the initial string. Nothing changes.
Any ideas what's going on?

Turbocapitalist · 05-02-2017, 03:46 AM

There's no -E option in sed so maybe you mean -e instead? While we're looking at options, you might want to use -r to get extended regular expressions. With extended regular expressions the parentheses and braces do not need to be escaped:

Code:

sed -r -e 's/^[0-9]d([0-9]{8})([0-9]+)p([0-9]+)\.m4a$/\1 \2 \3/;'

If you are preparing to rename some files, you might look at rename instead. It uses Perl, so you get the full flexibility and power of Perl regular expressions.

vincix · 05-02-2017, 03:46 AM

After a little bit of testing, I've realised that the problem is related to the regex repetition, i.e. \{8\}. It doesn't match and I don't really understand why.
(cross posted)

Turbocapitalist · 05-02-2017, 03:49 AM

In addition to the above, see man 7 regex for the syntax for bounds { } and atoms ()
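A small sketch of why the escaped bound fails, assuming a sed that accepts -E (the original command ran without an error, so GNU sed 4.2.2 evidently does). In extended syntax \{ and \} are literal braces, so [0-9]\{8\} asks for one digit followed by the text {8}. The variable names are only for illustration.

```shell
name='1d20170324115727p0755600111.m4a'

# Extended syntax (-E): bare braces form the bound, so this matches.
ere=$(printf '%s\n' "$name" | sed -E 's/^[0-9]d([0-9]{8}).*$/\1/')

# Extended syntax with escaped braces: \{8\} is the literal text "{8}",
# so nothing matches and the line passes through unchanged.
ere_escaped=$(printf '%s\n' "$name" | sed -E 's/^[0-9]d([0-9]\{8\}).*$/\1/')

# Basic syntax (no flag): groups and bounds are the escaped forms.
bre=$(printf '%s\n' "$name" | sed 's/^[0-9]d\([0-9]\{8\}\).*$/\1/')

printf '%s\n' "$ere" "$ere_escaped" "$bre"
```

The second line printed is the untouched input, which is the same symptom as in the first post.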

vincix · 05-02-2017, 03:55 AM

Right, I kept using the macOS style (which needs -E, not -r). I keep getting them confused.

So the problem, as you stated, was not only that I didn't use -r for extended regex, but also that I was escaping the curly braces. Before trying 'rename', I'd like to understand what is going on with sed, though. The problem is that \1 still doesn't work, but if I simply delete the matched string, it works. That's how I know that the string is matched.

Code:

sed -r 's/^[0-9]d([0-9]{8})//g'

pan64 · 05-02-2017, 03:57 AM

I don't know what sed -E is. On the other hand, sed and sed -r have different syntax; you probably mixed them up.

vincix · 05-02-2017, 03:59 AM

(sed -E is the same thing as sed -r, but unix-style or macOS style - I'm not sure if it works on all unix-based operating systems. I've already figured that out.)

Turbocapitalist · 05-02-2017, 04:00 AM

The formula in #2 above works for me in sed (GNU sed) 4.2.2

The formula below will save the part within the parentheses and delete everything else:

Code:

sed -r 's/^[0-9]d([0-9]{8}).*$/\1/;'

The s/// command does not need the g modifier since the substitution only needs to take place a single time. It's not necessary to tell it to go back and look again after the first replacement.
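As a rough illustration of that point, assuming a sed that accepts -E: a pattern anchored with ^ and $ can match at most once per line, so g changes nothing, while an unanchored pattern shows what g actually does. The a1b2 input is made up for the contrast.

```shell
name='1d20170324115727p0755600111.m4a'

# Anchored pattern: one possible match per line, with or without g.
once=$(printf '%s\n' "$name" |
  sed -E 's/^[0-9]d([0-9]{8})([0-9]+)p([0-9]+)\.m4a$/\1 \2 \3/')
global=$(printf '%s\n' "$name" |
  sed -E 's/^[0-9]d([0-9]{8})([0-9]+)p([0-9]+)\.m4a$/\1 \2 \3/g')

# Unanchored pattern: without g only the first digit is replaced.
first_only=$(printf '%s\n' 'a1b2' | sed 's/[0-9]/#/')
every=$(printf '%s\n' 'a1b2' | sed 's/[0-9]/#/g')

printf '%s\n' "$once" "$global" "$first_only" "$every"
```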

Turbocapitalist · 05-02-2017, 04:02 AM

Quote:

Originally Posted by vincix

(sed -E is the same thing as sed -r, but unix-style or macOS style - I'm not sure if it works on all unix-based operating systems. I've already figured that out.)

Ah. I wish GNU sed's manual page mentioned that. Non-GNU sed's pages do though. Thanks for spotting that.
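One way to sidestep the -E versus -r question altogether, offered as a sketch rather than something posted in this thread, is to stay with basic regular expressions, which need no flag. The price is more backslashes, and + has to be spelled out because it is not part of the basic syntax.

```shell
name='1d20170324115727p0755600111.m4a'

# Basic regular expressions: groups are \( \), the bound is \{ \},
# and "one or more digits" is written [0-9][0-9]*.
portable=$(printf '%s\n' "$name" |
  sed 's/^[0-9]d\([0-9]\{8\}\)\([0-9][0-9]*\)p\([0-9][0-9]*\)\.m4a$/\1 \2 \3/')

printf '%s\n' "$portable"
```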

vincix · 05-02-2017, 04:18 AM

But the problem still persists. \1, \2 still won't work in sed, although the pattern does match - if I simply delete the matched string, it works. So what am I doing wrong?

Turbocapitalist · 05-02-2017, 04:25 AM

Which version of sed are you using? The patterns above work for me in GNU sed 4.2.2 and in OpenBSD's sed for 6.1-current.

pan64 · 05-02-2017, 04:27 AM

this worked for me:

Code:

> echo '1d20170324115727p0755600111.m4a' | sed -r 's/^[0-9]d([0-9]{8})([0-9]+)p([0-9]+)\.m4a$/\1 \2 \3/g'
20170324 115727 0755600111

vincix · 05-02-2017, 04:41 AM

Yes, it does work.
I'm using sed 4.2.2 on CentOS 7.3.1611, so I couldn't go more mainstream than that, I guess.

I might just have got a little bit confused and that's why it probably didn't work. I need to practise more to understand where I have to pay attention more, I guess. I know the basics of regex but it's harder when I actually place them into a context and make them work together, etc.

Thanks both for your help.