LinuxQuestions.org

justin99 12-02-2007 04:49 PM

bash script- text file manip- delete everything before/after string
 
I need a bash script to eliminate junk from the beginning and end of a text file.

can i search a file for 'good stuff begins', deleting each line until that string, then again, deleting each line after 'good stuff ends'

or- perhaps it is possible to copy everything from 'good stuff begins' to 'good stuff ends' to a new file?

tia

gilead 12-02-2007 05:00 PM

You can use sed to do that. There's a tutorial at http://www.grymoire.com/Unix/Sed.html which you might find useful. The following example strips comments (everything from a # to the end of the line) on the lines from the one containing 'start' through the one containing 'stop':
Code:

sed '/start/,/stop/ s/#.*//'
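
To see what the range does, here is a minimal sketch on made-up input (the sample lines are assumptions, not from any real file). Note that s/#.*// strips from the first # to the end of the line, so emptied comment lines remain as blank lines, and lines outside the start/stop range are left alone.

```shell
# Hypothetical sample input: comments both outside and inside the start/stop region.
input='# header comment
start
x=1 # set x
# full-line comment
stop
# footer comment'

# Same sed expression as above, reading from stdin instead of a file.
result=$(printf '%s\n' "$input" | sed '/start/,/stop/ s/#.*//')

# Header and footer comments survive; comments inside the range are stripped.
printf '%s\n' "$result"
```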

justin99 12-02-2007 07:48 PM

so close
 
thanks for the help- i'm very close now, but still no cigar.

sed '1,/<START>/ d' temp.html > temp1.html

this works great, removing everything from line 1 until <START>

but i'm stuck on removing everything after <END>
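
One way to handle the tail, sketched on a hypothetical stand-in for temp.html: add a second expression whose range runs from the <END> line to the last line, which sed addresses as $. With this form both marker lines are deleted as well.

```shell
# Hypothetical stand-in for the contents of temp.html.
input='junk
<START>
good line 1
good line 2
<END>
more junk'

# First expression drops line 1 through the <START> line;
# second drops the <END> line through the last line ($).
result=$(printf '%s\n' "$input" | sed -e '1,/<START>/d' -e '/<END>/,$d')

printf '%s\n' "$result"
```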

justin99 12-02-2007 07:57 PM

got it
 
turns out the ! is what i needed

this deletes everything but the selected region

sed '/<START>/,/<END>/ !d' temp.html > temp1.html
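
A small sketch of why this works, using made-up input: the ! negates the address, so d applies to every line outside the range and the marker lines themselves are kept. Printing only the range with -n and p should give the same result.

```shell
# Hypothetical stand-in for the contents of temp.html.
input='junk
<START>
good line
<END>
more junk'

# !d deletes every line NOT in the range, so the marker lines are kept.
kept=$(printf '%s\n' "$input" | sed '/<START>/,/<END>/!d')

# Equivalent: suppress default output with -n and print only the range.
printed=$(printf '%s\n' "$input" | sed -n '/<START>/,/<END>/p')

printf '%s\n' "$kept"
```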



why did i waste my time googling- shoulda come here first (- ;


thanks again!

gilead 12-02-2007 09:21 PM

I'm glad you got it working. Thanks for posting the code you ended up using too... :)

skkuizu 05-15-2009 02:41 PM

similar issue with sed and bash
 
My problem, however, is that I want to remove/delete all characters before the START string and all characters after the END string.

However, START and END are not at the beginning of a line; they are in the middle and, furthermore, on the same line, as in this example:

t6gd68g d9d8j5%9j30j 0jf087*(&&^*2hd920STARTid8 =e72920 2d9nf9END93nf300j90

How do I get the text between START and END, which would be id8 =e72920 2d9nf9 for this example? I haven't found anything on the net so far.

Grateful for any ideas.

Cheers.

forrestt 05-15-2009 02:51 PM

If you just have one line (or want the same extraction applied to every line), then:

Code:

sed -e 's/.*START//' -e 's/END.*//'

HTH

Forrest
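
Applied to the sample line from the question, as a quick check. Because .* is greedy, the first expression cuts up to the last START on the line and the second cuts from the first END, so this sketch assumes one START/END pair per line.

```shell
# The sample line from the question, fed to sed on stdin.
line='t6gd68g d9d8j5%9j30j 0jf087*(&&^*2hd920STARTid8 =e72920 2d9nf9END93nf300j90'

# Strip everything up to and including START, then END and everything after it.
result=$(printf '%s\n' "$line" | sed -e 's/.*START//' -e 's/END.*//')

printf '%s\n' "$result"   # prints: id8 =e72920 2d9nf9
```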

skkuizu 05-15-2009 03:09 PM

Thanks
 
Forrest, that's great, it works! It's so easy that I can't believe it. I should have come straight to this forum, instead I was wasting my time with google for 24h :( Many thanks again!!!

ghostdog74 05-15-2009 07:40 PM

Quote:

Originally Posted by skkuizu (Post 3542167)
instead I was wasting my time with google for 24h :( Many thanks again!!!

Instead of searching for the exact solution, what you really need to do is search for a tutorial on shell scripting and read up on the basics.

Sergiof4 11-20-2014 03:16 AM

Hello,

hope it is OK to re-open this (very useful, imho) thread.

I need to clean some html pages, and the "region" I have to keep is between the <article> and </article> tags.

I tried using

Code:

sed '/<article>/,/</article>/ !d' a.htm > b.htm

but it didn't work.

Thank you.
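
A likely cause, offered as a sketch rather than a confirmed diagnosis: the / inside </article> ends the address regex early, because / is also the delimiter. Either escape it as \/ or pick a different delimiter for that address with the \#...# form. The sample page below is hypothetical, and this assumes the opening tag is exactly <article> (no attributes) and that both tags sit on their own lines.

```shell
# Hypothetical stand-in for the contents of a.htm.
input='<html><body>
<article>
<p>text to keep</p>
</article>
</body></html>'

# Option 1: escape the slash inside the closing tag.
escaped=$(printf '%s\n' "$input" | sed '/<article>/,/<\/article>/!d')

# Option 2: use # as the delimiter for the second address (\#regex#),
# so the slash needs no escaping.
alt=$(printf '%s\n' "$input" | sed '/<article>/,\#</article>#!d')

printf '%s\n' "$escaped"
```

If the opening tag carries attributes, matching on /<article/ without the closing > may be the safer first address.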


All times are GMT -5.