LinuxQuestions.org
09-12-2005, 03:23 PM   #1
activeco
LQ Newbie
 
Registered: Sep 2005
Location: Milky Way
Posts: 14

Rep: Reputation: 0
Deleting text between two different patterns


I want to put the command in a script, but although the task seems very simple, I couldn't find a way to do it.
So, if I have some text in a file, on one line or spread across several lines, say "asadgas<jk mjk bb><gjgksdlsl", and I want to delete everything between "<jk" and ">" (in this case " mjk bb", which is usually of varying length), what would be the best way to do it from bash?
I prefer sed, as the files being processed are pretty large and I would like to remove only the first matching instance and then exit immediately, but of course any working solution is welcome.
I could easily do it in Rexx or PHP, but I would like to stay in bash.
Thanks in advance for all replies.

Last edited by activeco; 09-12-2005 at 03:26 PM.
 
09-12-2005, 05:36 PM   #2
bigrigdriver
LQ Addict
 
Registered: Jul 2002
Location: East Central Illinois, USA
Distribution: Debian Squeeze
Posts: 5,772

Rep: Reputation: 309
According to the Advanced Bash-Scripting Guide, chapter 12, section 4, bash has limited text editing capability of its own. To expand that capability, you need to invoke sed, awk, or some other scripting language from the bash script.
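That said, for a single string already held in a shell variable, bash's own parameter expansion is enough. A minimal sketch: the function name strip_first_block is a hypothetical helper, not something from this thread, and reading a large file into a variable this way is likely to be much slower than sed or awk.

```bash
#!/usr/bin/env bash
# Empty the first START...END block in a string using only bash parameter expansion.
# strip_first_block is a hypothetical helper name used for illustration.
strip_first_block() {
    local s=$1 start=$2 end=$3
    # leave the string unchanged unless START is followed somewhere by END
    if [[ $s != *"$start"*"$end"* ]]; then
        printf '%s\n' "$s"
        return
    fi
    local before="${s%%"$start"*}"   # text before the first START
    local rest="${s#*"$start"}"      # text after the first START
    local after="${rest#*"$end"}"    # text after the first END that follows it
    printf '%s\n' "$before$start$end$after"
}

strip_first_block 'asadgas<jk mjk bb><gjgksdlsl' '<jk' '>'
```

Because `*` in a bash pattern also matches newlines, the same function works on a string that spans several lines.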
 
09-12-2005, 05:59 PM   #3
activeco
LQ Newbie
 
Registered: Sep 2005
Location: Milky Way
Posts: 14

Original Poster
Rep: Reputation: 0
Thanks, bigrigdriver.
That is actually what I meant: how do I do it with, e.g., sed, awk or another standard tool?
 
09-12-2005, 06:54 PM   #4
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,978
Blog Entries: 11

Rep: Reputation: 879
Quote:
Originally posted by activeco
Thanks, bigrigdriver.
That is actually what I meant: how do I do it with, e.g., sed, awk or another standard tool?
Something like this? :)

Code:
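#!/bin/awk -f
# strip.awk: remove every BeginTag...EndTag block, tags included,
# even when a block spans several lines (tags are literal strings, not regexps)
# usage: awk -v BeginTag="xxx" -v EndTag="yyy" -f strip.awk input > output


BEGIN{
        if (BeginTag == "" || EndTag == "") {
           print "usage: awk -v BeginTag=\"xxx\" -v EndTag=\"yyy\" -f strip.awk input"
           exit 1
        }
}

{
        out = ""
        rest = $0
        while (1) {
                if (Split) {                            # inside a block: look for its end
                        e = index(rest, EndTag)
                        if (!e) { rest = ""; break }    # rest of the line is inside the block
                        rest = substr(rest, e + length(EndTag))
                        Split = 0
                }
                b = index(rest, BeginTag)
                if (!b) { out = out rest; break }       # no further block on this line
                out = out substr(rest, 1, b - 1)        # keep the text before the block
                rest = substr(rest, b + length(BeginTag))
                Split = 1
        }
        # a block still open at the end of the line swallows the newline too
        if (Split) printf "%s", out
        else print out
}

END{ if (Split) printf "\n" }    # terminate the output if the last block never closed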

Code:
$ echo "asadgas<jk mjk bb><gjgksdlsl"|awk -v BeginTag="<jk" -v EndTag=">" -f strip.awk
asadgas<gjgksdlsl

Is that what you want?


Cheers,
Tink
 
09-13-2005, 10:32 AM   #5
activeco
LQ Newbie
 
Registered: Sep 2005
Location: Milky Way
Posts: 14

Original Poster
Rep: Reputation: 0

Yes Tinkster, I'll use it, although I was hoping for a one-liner.
Thank you very much for your time.
 
09-13-2005, 01:41 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,978
Blog Entries: 11

Rep: Reputation: 879
Sorry, not all problems can be solved with a one-liner ;}

This one is highly re-usable, though!
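For what it's worth, GNU sed 4.2.2 and later has a -z (--null-data) option that separates records with NUL bytes instead of newlines. In an ordinary text file with no NUL bytes the whole file becomes one record, so a single s command can match across lines, and leaving off the g flag empties only the first block in the file. A sketch, assuming GNU sed and a file small enough to hold in memory:

```bash
#!/usr/bin/env bash
# Requires GNU sed >= 4.2.2 for -z (not POSIX).
# The whole NUL-free input is one record, so [^>]* can span newlines,
# and without /g only the first <jk...> block in the input is emptied.
printf 'a<jk x\ny>b <jk z>c\n' | sed -z 's/<jk[^>]*>/<jk>/'
```

On a real file that would be `sed -z 's/<jk[^>]*>/<jk>/' originalfile >newfile`.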


Cheers,
Tink
 
09-13-2005, 04:51 PM   #7
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
It is the pattern spanning multiple lines that makes things more complicated.

If the block (or several blocks) were contained on a single line, this one-liner would do it:
Code:
sed 's/<jk[^>]*>/<jk>/g' originalfile >newfile
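The * after [^>] matters: a bracket expression on its own matches exactly one character, so without the * the pattern only matches a block with a single character between "<jk" and ">". A quick check, using the example line from the first post:

```bash
#!/usr/bin/env bash
line='asadgas<jk mjk bb><gjgksdlsl'
# [^>] matches one character only, so " mjk bb" is not matched and the line is unchanged
echo "$line" | sed 's/<jk[^>]>/<jk>/'
# [^>]* matches any run of non-">" characters, so the block is emptied
echo "$line" | sed 's/<jk[^>]*>/<jk>/'
```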
When the block crosses lines, sed has to keep appending lines to the pattern space until the end pattern is reached:

Code:
# remblock.sed
# empty every <jk ...> block, including blocks that span several lines
:t
# the pattern space holds a "<jk" with no closing ">" after it yet
/<jk[^>]*$/{
    # unless this is the last line, append the next line and test again
    $!{
        N
        bt
    }
}
# [^>] also matches the embedded newlines, so multi-line blocks are joined
s/<jk[^>]*>/<jk>/g
This script isn't too long. It may need tweaking for the case where one block ends on a line and the next block starts on that same line. It does handle a block on a single line, several blocks on the same line, and a block that stretches across multiple lines.

You would run this script like this:
sed -f remblock.sed originalfile >outputfile
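The same loop can also be written inline, with each script line passed as its own -e expression (sed joins them with newlines), which avoids a separate script file. A sketch; remblock is a hypothetical shell function name, and the sample input is made up:

```bash
#!/usr/bin/env bash
# Inline version of the append-until-">" loop: keep appending lines with N
# while the pattern space holds a "<jk" that has no ">" after it yet.
remblock() {
    sed -e ':t' \
        -e '/<jk[^>]*$/{' \
        -e '$!{' \
        -e 'N' \
        -e 'bt' \
        -e '}' \
        -e '}' \
        -e 's/<jk[^>]*>/<jk>/g' "$@"
}

printf 'keep <jk one\ntwo\nthree> tail\nnext <jk a> and <jk b> end\n' | remblock
```

Extra arguments are passed on to sed, so `remblock originalfile >outputfile` also works.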

Once it is thoroughly tested and trusted, you could use in-place editing (GNU sed's -i option):
sed -i -f remblock.sed originalfile
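If you want a safety net, GNU sed's -i also accepts a backup suffix attached directly to the option, and then keeps the untouched original next to the edited file. The demo below runs in a scratch directory and uses a plain s command so it needs no script file; the same suffix works with -f remblock.sed.

```bash
#!/usr/bin/env bash
# Scratch-directory demo: with GNU sed, a suffix attached to -i (here .bak)
# makes sed save the original content as originalfile.bak.
set -e
dir=$(mktemp -d)
cd "$dir"
printf 'x<jk 1>y\n' > originalfile
sed -i.bak 's/<jk[^>]*>/<jk>/' originalfile
cat originalfile        # edited copy
cat originalfile.bak    # original content
```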
 
09-13-2005, 06:05 PM   #8
activeco
LQ Newbie
 
Registered: Sep 2005
Location: Milky Way
Posts: 14

Original Poster
Rep: Reputation: 0
Quote:
Originally posted by jschiwal

Code:
sed 's/<jk[^>]*>/<jk>/g' originalfile >newfile
I like the thinking behind this solution for the single-line case. I had already played with sed's s command, but didn't think of the simple and obvious approach of supplying the final result as the replacement string. Somewhere in the back of my mind I felt this was an unknown string, while in fact it is not.
The only thing I probably don't need here is the g flag, as I want only the first match to be replaced.
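One caveat: dropping g does not limit sed to the first match in the file. Without g, s replaces the first match on every line. With GNU sed, the 0,/regexp/ address (a GNU extension) restricts a command to the lines up to and including the first matching one, and the empty regex in s// reuses that address regex. A sketch with made-up input; it covers blocks that sit on one line:

```bash
#!/usr/bin/env bash
input='a<jk 1>b<jk 2>
c<jk 3>d'
# no g flag: the first match on EACH line is replaced
printf '%s\n' "$input" | sed 's/<jk[^>]*>/<jk>/'
# GNU only: 0,/re/ ends the range at the first line matching re, so only
# the first block in the whole input is emptied; s// reuses that regex
printf '%s\n' "$input" | sed '0,/<jk[^>]*>/s//<jk>/'
```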

Well, thanks again guys.
 
  

