LinuxQuestions.org - [SOLVED] newline after each string match


bishop2001

02-28-2024 10:56 AM

newline after each string match

Hi, I have a file that looks like:

something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"

Each time I encounter the string "something", I want everything up to the next "something" to be on its own line, like:

something,weoefr0o[0980123-0] bbnu1 "219E"
something,weo23ef0o[0980123-0] bbnu2 "21e9E"
something,weoeeef0o[0980123-0] bbnu2 "219E"

Suggestions, please? Thanks again.

TB0ne

02-28-2024 11:01 AM

Quote:

Originally Posted by bishop2001 (Post 6486431)

The suggestion I'd have would be to look at several of your other similar threads dealing with grep/sed/awk, and think about how they apply here. And I'd also suggest (as has been asked before) that you post what YOU have already done/tried, rather than just saying what you want without showing any effort.

pan64

02-28-2024 11:22 AM

Yes, it would be nice to try to solve it yourself; you will feel a sense of success.
Otherwise you can solve it using bash, sed, awk, perl, python or whatever language you prefer (including grep).

bishop2001

02-28-2024 12:20 PM

This looks like it works, unless someone has a better way?

grep -o -P '(?<=something).*(?=bbnu2)'

I'm trying to add a newline after each match, as the file is huge. It looks like:

something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E" etc...

Thanks.

TB0ne

02-28-2024 01:23 PM

Quote:

Originally Posted by bishop2001 (Post 6486448)

This looks like it works, unless someone has a better way?

Code:

grep -o -P '(?<=something).*(?=bbnu2)'

Does it WORK?? If so, then you have solved your problem.

Quote:

I'm trying to add a newline after each match, as the file is huge. It looks like:

Code:

something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E" etc...

Yes, you posted this earlier, but you still haven't given us any context: how often you have to do this task, what else you've scripted around that grep, or where the data comes from. Personally, I'd look at the data source to see if there's a way to export it in a different format.

astrogeek

02-28-2024 01:52 PM

Quote:

Originally Posted by bishop2001 (Post 6486448)

But this differs significantly from your original description of the problem:

Quote:

Originally Posted by bishop2001 (Post 6486431)

something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"

Each time I encounter the string "something", I want everything up to the next "something" to be on its own line

Given the following input file, your original rules and the attempt shown above would produce very different results:

Code:

$ cat infile

something,weoefr0o[0980123-0] bbnu1 "1xxxx"something,weo23ef0o[0980123-0] bbnu2 "2xxxx"something,weoeeef0o[0980123-0] bbnu2 "3xxxx"

There is nothing that matches in this line, should I print it or not?

something borrowedsomething blue something old andsomething new

What would you expect the output of this file to look like?

It is always important to clearly define the problem before you begin looking for solutions, otherwise you can never know when you have found something that (always) works.
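To make the difference concrete, here is a rough sketch that runs both interpretations against the first line of the infile above. It assumes GNU grep with -P support and GNU sed; the variable names are only for illustration, and the sed substitution is just one possible reading of the original rule.

```shell
# First line of the infile shown above.
line='something,weoefr0o[0980123-0] bbnu1 "1xxxx"something,weo23ef0o[0980123-0] bbnu2 "2xxxx"something,weoeeef0o[0980123-0] bbnu2 "3xxxx"'

# The posted attempt: greedy .* runs from the first "something" to the LAST "bbnu2".
attempt=$(printf '%s\n' "$line" | grep -o -P '(?<=something).*(?=bbnu2)')

# One reading of the original rule: break the line before every later "something".
rule=$(printf '%s\n' "$line" | sed -E 's/(.)(something)/\1\n\2/g')

# Count the output lines of each approach.
printf 'attempt: %s line(s)\n' "$(printf '%s\n' "$attempt" | grep -c '')"
printf 'rule:    %s line(s)\n' "$(printf '%s\n' "$rule" | grep -c '')"
```

On this line the attempt yields a single long match with the leading "something" and the last record's tail cut off, while the rule yields three complete records.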

teckk

02-28-2024 04:19 PM

Many ways to do that.

Code:

# Single quotes keep the embedded double quotes literal.
a='something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"'

grep -oP 'something.+?(?=something|$)' <<< "$a"

something,weoefr0o[0980123-0] bbnu1 "219E"
something,weo23ef0o[0980123-0] bbnu2 "21e9E"
something,weoeeef0o[0980123-0] bbnu2 "219E"

syg00

02-28-2024 04:41 PM

grep wasn't the first thing to pop to mind - sed to just plug in the newline seems simplest; no perlre needed.

TB0ne

02-28-2024 04:46 PM

Quote:

Originally Posted by syg00 (Post 6486497)

grep wasn't the first thing to pop to mind - sed to just plug in the newline seems simplest; no perlre needed.

Indeed; mentioned sed to the OP initially, and the OP has been directed to it in the past, for quite some time. Like this, from seven years ago:
https://www.linuxquestions.org/quest...224/page2.html

...but the OP doesn't seem to want to apply anything they've been told previously, and seems to want others to do things for them.

sundialsvcs

02-28-2024 05:47 PM

Incidentally, when solving problems like this I usually turn to the Perl programming language, which was originally designed by a geek named Larry Wall when he decided that "awk" wasn't good enough. It has excellent support for "regular expressions" and is very well-suited to "text file processing" tasks such as the present example.

Perl is a "very full-featured – if a bit quirky – programming language" with an exceptional "contributed software library" called CPAN. I am now convinced that "this library goes on forever." :) No matter what it is that you are now doing, you'll probably discover that it has already been done – and, done very well indeed. Perl has been referred to as "the Swiss Army Knife® of pragmatic computer programming," and I think that this assessment is quite fair. "This is a serious power tool."

And – if the first line of your script is something like #!/usr/bin/perl, it can now be executed as "a command-line command." Thanks to the magic of what is called "#! – shebang", the shell will silently invoke the Perl interpreter to carry out your programming, and the user will be none the wiser.

The definitive – and definitely "quirky" – website for the Perl community is undoubtedly perlmonks.org. ("Be prepared to encounter 'personalities' ... but they know their stuff.")

pan64

02-29-2024 01:38 AM

grep is not really the ideal tool for this; it is meant for searching, not for producing formatted output. But anyway, if you wish:

Code:

grep -Po 'something([^"]*"){2}'
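A short sketch of how that pattern behaves on the sample data from the first post. It assumes GNU grep with -P support and that every record contains exactly one double-quoted field; the variable names are only for illustration.

```shell
# One line of the sample data from the first post.
sample='something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"'

# something    literal start of a record
# ([^"]*"){2}  two runs of non-quote characters, each ended by a double quote,
#              i.e. everything up to and including the field's closing quote
grep_out=$(printf '%s\n' "$sample" | grep -Po 'something([^"]*"){2}')
printf '%s\n' "$grep_out"
```

Because the match ends at the closing quote, anything between that quote and the next "something" would be dropped, which is worth checking against the real data.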

MadeInGermany

03-02-2024 04:44 AM

The following inserts a newline before every "something" that is not at the beginning of the line.

Code:

sed -r 's/(.)(something)/\1\n\2/g' filename

A character (.) must precede the something; otherwise the something is at the beginning of the line.
Both the character and the something must be captured in ( ) and put back as \1 and \2, because everything that matches is replaced.
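As a worked example, here is the same substitution applied to the tricky line from astrogeek's test file. This is only a sketch assuming GNU sed (which understands \n in the replacement; -E is equivalent to -r there), and the variable names are made up.

```shell
# The tricky line from the test file posted earlier in the thread.
tricky='something borrowedsomething blue something old andsomething new'

# (.) keeps the character before "something" on the old line;
# \n then starts a new line for the "something" captured as \2.
sed_out=$(printf '%s\n' "$tricky" | sed -E 's/(.)(something)/\1\n\2/g')
printf '%s\n' "$sed_out"
```

This prints four lines. The second one, "something blue ", keeps its trailing space, because that space is the character captured as \1.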

syg00

03-02-2024 05:02 AM

As usual, this is not the only (sed) solution that achieves the desired result.
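One such alternative, sketched here under the assumption of GNU sed: combining a number with the g flag replaces that match and every later one, so the first "something" on the line is left alone. The variable names are only for illustration.

```shell
# One line of the sample data from the first post.
sample='something,weoefr0o[0980123-0] bbnu1 "219E"something,weo23ef0o[0980123-0] bbnu2 "21e9E"something,weoeeef0o[0980123-0] bbnu2 "219E"'

# GNU sed: "2g" = replace the 2nd match and all following ones; & is the matched text.
alt_out=$(printf '%s\n' "$sample" | sed 's/something/\n&/2g')

# The capture-group version from the earlier post, for comparison.
ref_out=$(printf '%s\n' "$sample" | sed -r 's/(.)(something)/\1\n\2/g')

[ "$alt_out" = "$ref_out" ] && echo "same output on this sample"
```

The two are not equivalent in general: on a line that does not begin with "something", the 2g form still skips the first occurrence, whereas the capture-group form breaks before it.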

teckk

03-02-2024 09:09 AM

Quote:

Perl is a "very full-featured – if a bit quirky – programming language" with an exceptional "contributed software library" called CPAN.

Perl regex is nifty. And nearly every Linux distro ships with Perl. I'm starting to use it more.

Example: colorize just the Greek:

Code:

txt="1|1|1| First Line of text ταύτην και ότι here

1|1|2| Second Line of text ταύτην και ότι here

1|1|3| Third Line of text ταύτην και ότι here

1|1|4| Fourth Line of text ταύτην και ότι here"



perl -pe 's/([Α-Ωα-ωάΆϐέΈήΉίϊΐΪΊόΌϋύΰΫΎώΏοη].([^a-zA-Z]+))/<font color=#4DD2FF>$1<\/font>/g' <<< "$txt"



1|1|1| First Line of text <font color=#4DD2FF>ταύτην και ότι </font>here

1|1|2| Second Line of text <font color=#4DD2FF>ταύτην και ότι </font>here

1|1|3| Third Line of text <font color=#4DD2FF>ταύτην και ότι </font>here

1|1|4| Fourth Line of text <font color=#4DD2FF>ταύτην και ότι </font>here
