LinuxQuestions.org


rhuhawk 09-20-2011 07:46 PM

BASH: string manipulation
 
Hi, I am trying to figure out how to isolate certain parts of a string.

In a given file (test.txt) I have some lines:

blah [Need this phrase] bblah
blah ya [Need
this phrase as well]

Is there a way for me to extract only what is in the square brackets? Even if the text goes to the next line?

The extracted phrase should be saved as a variable. Is there any way to do this with the sed command?
Thank you!

corp769 09-20-2011 08:02 PM

Hello,

You would have to use sed for this. Look at the following:
Code:

sed -e 's/.*\[\([^]]*\)\].*/\1/g'
This would be the command-line equivalent of the operation you are looking for. As an example (I'm not on Linux right now, so this might be off...):
Code:

echo "blah [Need this phrase] bblah" | sed -e 's/.*\[\([^]]*\)\].*/\1/g'
This should return "Need this phrase".

Cheers,

Josh
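
The original question also asked for the phrase to be saved as a variable. A minimal sketch of one way to do that, assuming bash and wrapping the same sed expression in command substitution; the variable names `line` and `phrase` are illustrative and not from the thread:

```shell
#!/bin/bash
# Capture the bracketed text from a single line into a shell variable.
# 'line' and 'phrase' are illustrative names, not part of the thread.
line='blah [Need this phrase] bblah'

# $(...) stores whatever sed prints, minus the trailing newline.
phrase=$(printf '%s\n' "$line" | sed -e 's/.*\[\([^]]*\)\].*/\1/')

printf '%s\n' "$phrase"
```

This only covers a phrase that sits on one line; the multi-line case is what the rest of the thread deals with.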

rhuhawk 09-20-2011 08:12 PM

Great!

Thanks for the quick reply!!!

When I run this:

sed -e 's/.*\[\([^]]*\)\].*/\1/g' test.txt

and test.txt is:
blah [Need this phrase] bblah
blah ya [Need
this phrase as well]


It outputs:
Need this phrase
blah ya [Need
this phrase as well]

Is there any way I can get it to keep reading lines? When the desired string begins on one line and continues on the next, how can I get it to keep reading until it finds the closing ]?

Thanks again in advance

grail 09-20-2011 10:57 PM

How about:
Code:

sed -r ':a /]/! N;ta;s/.*\[(.*)\].*/\1/' file

kurumi 09-21-2011 05:37 AM

Code:

$ ruby -0777 -ne '$_.split("]").each{|x| puts "#{x.split("[")[-1]}" if x[/\[/]  }' file
Need this phrase
Need
this phrase as well


grail 09-21-2011 06:30 AM

So still learning from the master <bow> to kurumi :)

Now that I have seen what 0777 can do:
Code:

ruby -0777 -ne 'puts $_.scan(/\[([^\]]+)/)' file

Kenhelm 09-21-2011 08:55 AM

Using GNU awk
Code:

echo '
blah
blah [Need this phrase] bblah
[
Need
this
phrase
]
[Need this phrase] blah ya [Need
this phrase as well] blah [Need this phrase] blah
blah' | awk '/./' RS='[^]]*[[]\n?|\n?][^[]*'

Need this phrase
Need
this
phrase
Need this phrase
Need
this phrase as well
Need this phrase

Or, to have each phrase on a single line
Code:

awk '/./{gsub(/\n/," ");print}' RS='[^]]*[[]\n?|\n?][^[]*'

Need this phrase
Need this phrase
Need this phrase
Need this phrase as well
Need this phrase


crts 09-21-2011 09:17 AM

small correction
 
Quote:

Originally Posted by grail (Post 4477733)
How about:
Code:

sed -r ':a /]/! N;ta;s/.*\[(.*)\].*/\1/' file

The 't' command will only jump if an 's' command has made a substitution since the last line was read. So a conditional 't' jump directly after reading a new line has no effect.

This works as long as there are no multiple patterns on the same line to keep:
Code:

sed -r ':a /]/! N;s/.*\[(.*)\].*/\1/;Ta' file
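
The difference between the two scripts only shows up once a phrase spans more than two lines, because the 't' version never loops back to read a third line. A small sketch to check this, assuming GNU sed (both `-r` and the `T` command are GNU extensions); the three-line sample and the variable names are made up for illustration:

```shell
#!/bin/bash
# Assumes GNU sed. The sample input and variable names are illustrative.
input='a [one
two
three] b'

# Original version: 't' never jumps, so at most one extra line is appended.
with_t=$(printf '%s\n' "$input" | sed -r ':a /]/! N;ta;s/.*\[(.*)\].*/\1/')

# Corrected version: 'T' jumps back to :a until the substitution succeeds.
with_T=$(printf '%s\n' "$input" | sed -r ':a /]/! N;s/.*\[(.*)\].*/\1/;Ta')

printf 't version:\n%s\n\n' "$with_t"
printf 'T version:\n%s\n' "$with_T"
```

With the 't' version the input comes back unchanged, while the 'T' version keeps appending lines until the closing bracket is in the pattern space.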

grail 09-21-2011 09:37 AM

Cheers crts ... still getting my sedfu together :) although I noticed with Kenhelm's example this doesn't get all the necessary ones :(

crts 09-21-2011 11:05 AM

Quote:

Originally Posted by grail (Post 4478185)
Cheers crts ... still getting my sedfu together :) although I noticed with Kenhelm's example this doesn't get all the necessary ones :(

Yes, as I stated above
Quote:

This works as long as there are no multiple patterns on the same line to keep:
the solution has some restrictions. To also accommodate Kenhelm's sample data we could use:
Code:

sed -nr ':a /\[[^]]*$/ {N;ba}; s/[^[]*\[([^]]*)\][^[]*/\1/pg; ' file
As you can see, with the above solution we have to use an unconditional jump.
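
To tie this back to the original request, here is a hedged sketch that runs the unconditional-jump script on the contents of test.txt from the first post and stores the result in a variable. It assumes bash and GNU sed (`-r`); the names `sample` and `phrases` are illustrative:

```shell
#!/bin/bash
# Assumes GNU sed. 'sample' mirrors test.txt from the thread;
# 'sample' and 'phrases' are illustrative names.
sample='blah [Need this phrase] bblah
blah ya [Need
this phrase as well]'

# While a "[" is still unclosed, append the next line and jump back to :a;
# then strip everything outside the brackets and print the result.
phrases=$(printf '%s\n' "$sample" |
  sed -nr ':a /\[[^]]*$/ {N;ba}; s/[^[]*\[([^]]*)\][^[]*/\1/pg')

printf '%s\n' "$phrases"
```

A phrase that spans lines keeps its embedded newline inside the variable. Also note that when several phrases end up in the same pattern space, the `g` flag appears to join them without a separator, so they may need to be split again afterwards.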


All times are GMT -5.