LinuxQuestions.org


vincix 08-17-2018 05:33 PM

Substitute pattern that crosses new line
 
This is a snippet of the file I'm trying to change:
Code:

1:      00:02:43:24 00:02:45:22 01:23
Why haven't you ever asked me
Warum hast du mich nie gefragt,

2:      00:02:46:03 00:02:49:04 03:01
what this film is about?
worum es in diesem Film geht?

3:      00:02:53:16 00:02:58:18 05:02
And why haven't I ever told you, anyway?
Und warum hab ich es dir eigentlich nie erzählt?

4:      00:03:02:13 00:03:07:00 04:12
Was it just you not being curious?
Warst du einfach nicht neugierig?

5:      00:03:09:09 00:03:12:00 02:16
Or also me being relieved
Oder war ich einfach erleichtert,

What I'm trying to do is delete the colon, add a new line, and delete the whitespace before the time interval, so that the second line starts with the time interval, like this:
Code:

1
00:02:43:24 00:02:45:22 01:23

I'd like to do it with sed. This is what I've tried so far:
Code:

sed -r -e '/^[0-9]+:/{N;s/(^[0-9]+):\n\s+/\1\n/}' file.txt
I've also tried using \t+ instead of \s+, but to no avail. There's no match; the text doesn't change at all.
Also, I'm not sure that using \n on both sides of the substitution is going to work.

I did manage to insert a new line after the number with which each of those lines begins, but I wasn't able to delete the whitespace that precedes the time interval, so that isn't enough:
Code:

sed -r -e '/^[0-9]+:/{N;s/(^[0-9]+):/\1\n/g}' file.txt
1
      00:02:43:24 00:02:45:22 01:23
Why haven't you ever asked me
Warum hast du mich nie gefragt,

Any ideas?

[later edit]
Now I've realised that the initial idea doesn't make sense, as there is no \n in the initial line, so there can't possibly be a match there.
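
To make the failure concrete, here is a small sketch that runs the first attempt (written with the closing delimiter of the s command) on entry 2 of the sample. It assumes GNU sed, since -r and \s are GNU extensions, and it assumes the gap after the colon consists of spaces. The variable names are only for illustration.

```shell
# Assumes GNU sed; the sample is entry 2 from the file, with spaces after the colon.
input=$(printf '2:      00:02:46:03 00:02:49:04 03:01\nwhat this film is about?\n')

# N joins the two lines, but the newline sits at the end of the timing line,
# not right after the colon, so ':\n\s+' never matches and nothing changes.
first_try=$(printf '%s\n' "$input" | sed -r '/^[0-9]+:/{N;s/(^[0-9]+):\n\s+/\1\n/}')

printf '%s\n' "$first_try"
```

The output is identical to the input, which matches what I was seeing.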

So this seems to be doing what I'm looking for :)
Code:

sed -r -e '/^[0-9]+:/{N;s/(^[0-9]+):\s+/\1\n/g}' file.txt

astrogeek 08-17-2018 06:39 PM

Good work!

You can simplify that a bit:

Code:

sed -r 's/^([0-9]+:)\s*/\1\n/' file.txt

vincix 08-17-2018 06:41 PM

But I'm trying to get rid of the colon :)

astrogeek 08-17-2018 06:45 PM

Quote:

Originally Posted by vincix (Post 5892818)
But I'm trying to get rid of the colon :)

Sorry, I missed that.

Easy to fix, simply move the capture parenthesis:

Code:

sed -r 's/^([0-9]+):\s*/\1\n/' file.txt
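
As a quick check, here is that command run on entry 2 from your sample. This is only a sketch: it assumes GNU sed (for -r, \s, and \n in the replacement) and that the gap after the colon is made of spaces or tabs. The variable name is just for illustration.

```shell
# Assumes GNU sed; feeds one entry from the sample through the substitution.
result=$(printf '2:      00:02:46:03 00:02:49:04 03:01\nwhat this film is about?\n' |
  sed -r 's/^([0-9]+):\s*/\1\n/')

# The entry number is kept, the colon and the following whitespace are
# replaced by a newline, and lines that do not start with "digits:" are untouched.
printf '%s\n' "$result"
```

The entry number ends up alone on the first line, the time interval starts the second line, and the subtitle text is unchanged.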

vincix 08-17-2018 06:51 PM

I thought \n didn't work without N. But now I realise that what N does is simply append the next line to the pattern space, separated by \n, so that the newline can be matched, whereas in your example (and mine too, actually), \n is part of the replacement string, not of the pattern being matched.

And yes, it doesn't make much sense to match an address and open a { } block if the pattern in the substitution is the same as the address before the opening brace. I see what you're getting at. Indeed, much simpler :)
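
For my own notes, a minimal sketch of the difference, using made-up two-line input rather than the subtitle file. The variable names are only for illustration.

```shell
# With N, the next line is appended to the pattern space, joined by a newline,
# so the regex can match that newline and replace it.
joined=$(printf 'first\nsecond\n' | sed 'N;s/\n/ + /')
printf '%s\n' "$joined"

# Without N, sed sees one line at a time with its trailing newline removed,
# so there is no newline for the pattern to match and nothing changes.
unjoined=$(printf 'first\nsecond\n' | sed 's/\n/ + /')
printf '%s\n' "$unjoined"
```

The first command prints the two lines merged into one; the second prints them exactly as they came in.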

astrogeek 08-17-2018 06:58 PM

Glad that worked!

Equally glad to see that you are trying to understand it all! Very good exercise and time well spent for us both!

vincix 08-17-2018 07:01 PM

Thanks for the help :)


All times are GMT -5.