LinuxQuestions.org


blueray 05-12-2020 09:41 AM

Regex: Put + Between Paragraphs
 
I need a regex one-liner for the following problem.

If a line that contains :: is followed by one or more paragraphs that do not contain ::, the blank line before each of those paragraphs has to become a +.

Current Text

Code:

One dollar:: and eighty-seven cents. That was all. And sixty cents of it was in pennies.

Thee:: It was easy to spot her. All you needed to do was look at her socks.

One would reach her knee while the other barely touched her ankle.

While the argument:: seems to be different the truth is it's always the same.

They both knew it, but neither has the courage or strength.

The words:: hadn't flowed from his fingers for the past few weeks.

He didn't understand why he couldn't even type a single word.

Was being satisfied enough?

She reached her goal:: exhausted. Even more chilling to her was that the euphoria that she thought she'd feel.

Spending time at national parks can be an exciting adventure.

It seemed like it should have been so simple.

Was it enough:: That was the question he kept asking himself.

He knew that he was satisfied and he also knew it wasn't going to be enough.

It was just a burger. Why couldn't she understand that?

Yes, he had promised her and yes, he had broken that promise.

Expected Output

Code:

One dollar:: and eighty-seven cents. That was all. And sixty cents of it was in pennies.

Thee:: It was easy to spot her. All you needed to do was look at her socks.
+
One would reach her knee while the other barely touched her ankle.

While the argument:: seems to be different the truth is it's always the same.
+
They both knew it, but neither has the courage or strength.

The words:: hadn't flowed from his fingers for the past few weeks.
+
He didn't understand why he couldn't even type a single word.
+
Was being satisfied enough?

She reached her goal:: exhausted. Even more chilling to her was that the euphoria that she thought she'd feel.
+
Spending time at national parks can be an exciting adventure.
+
It seemed like it should have been so simple.

Was it enough:: That was the question he kept asking himself.
+
He knew that he was satisfied and he also knew it wasn't going to be enough.
+
It was just a burger. Why couldn't she understand that?
+
Yes, he had promised her and yes, he had broken that promise.

The solution I have tried is

Code:

$ perl -pe 's/(^.*::.*\n\n)/$1\n+\n/g' regex.txt
However, it only puts a + after the first paragraph.

TB0ne 05-12-2020 09:51 AM

Try this:
Code:

sed '/\:\:/{N;s/\n$/\n+/}'

blueray 05-12-2020 10:00 AM

Please let me get back to you. It might take an hour.

Turbocapitalist 05-12-2020 10:07 AM

Or try reversing the pattern and changing the input record separator to something other than a newline.

Code:

perl -0x1ff -pe 's/\n\n(?!.*::)/\n+\n/g'
See "man perlrun" and "man perlre"

That leaves a trailing plus on the last line, however.

Reading the whole file as a single record might not be practical with very large files. You might need a more complex one-liner or even something more than just a one-liner.

Edit:

Code:

perl -0x1ff -pe 's/\n\n(?!.*::)(?=.)/\n+\n/g;'

blueray 05-12-2020 10:29 AM

Thank you very much.

Turbocapitalist 05-12-2020 10:33 AM

No problem. There is one negative lookahead assertion (?!…) and one positive lookahead assertion (?=…). They are useful on occasion. They are zero-width: they check what follows without consuming or capturing it. Again, see "man perlre" about that.

pan64 05-12-2020 10:33 AM

Without any other options, -p processes the input line by line. No line can contain anything after the \n. You have to change the record separator:
Code:

perl -0pe 's/(::\N*\n)\n/$1+\n/g'

shruggy 05-12-2020 11:11 AM

@Turbocapitalist. Hats off!

@others. No, it's not so easy, see Turbocapitalist's solution. The OP didn't make it clear, but lines that don't include :: are considered part of the previous paragraph.

pan64 05-12-2020 12:15 PM

Yes, now I understand.
Code:

perl -pe 'BEGIN{$/="::"} {s/(\n\N+\n)/+$1/g}'

Turbocapitalist 05-12-2020 12:31 PM

That looks more efficient. But that use for \N sure is buried deeply in the manual page.

The special variables also have more mnemonic names, but those are only available once the English module is loaded:

Code:

perl -MEnglish -pe 'BEGIN{$RS="::"} {s/(\n\N+\n)/+$1/g}'
The same module also provides full names for the variables:

Code:

perl -MEnglish -pe 'BEGIN{$INPUT_RECORD_SEPARATOR="::"} {s/(\n\N+\n)/+$1/g}'
A one-liner might lose its simplicity that way but a full script would be more readable with that module.


All times are GMT -5.