09-12-2005, 03:23 PM
|
#1
|
LQ Newbie
Registered: Sep 2005
Location: Milky Way
Posts: 14
|
Deleting text between two different patterns
I want to put the command in a script, but although the task seems very simple, I couldn't find a way to do it.
So, if I have some text in a file, on one line or across several lines, say: "asadgas<jk mjk bb><gjgksdlsl",
and I want to delete everything between "<jk" and ">" (in this case " mjk bb", usually of varying length), what would be the best way to do it from bash?
I would prefer sed, as the files being processed are pretty large and I would like to remove only the first matching instance and exit immediately, but of course any working solution is welcome.
I could easily do it in rexx or php, but I would like to stay in bash.
Thanks in advance for all replies.
Last edited by activeco; 09-12-2005 at 03:26 PM.
|
|
|
09-12-2005, 05:36 PM
|
#2
|
LQ Addict
Registered: Jul 2002
Location: East Central Illinois, USA
Distribution: Debian stable
Posts: 5,908
|
According to the Advanced Bash-Scripting Guide, chapter 12, section 4, bash has limited text editing capability of its own. To expand that capability, you need to invoke sed, awk, or some other scripting language from the bash script.
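To illustrate where that limit sits: for a string that is already held in a variable, bash parameter expansion alone can cut out the text between "<jk" and the next ">". This is only a minimal sketch, assuming the whole text fits comfortably in a variable. The function name strip_jk is made up for illustration, and leaving "<jk>" behind is one reading of the question.

```shell
#!/bin/bash
# Hypothetical helper (name invented here): remove the text between the
# first "<jk" and the next ">" in a single string, keeping both markers.
strip_jk() {
    local s=$1
    case $s in
        *"<jk"*">"*) ;;                          # a "<jk" followed by a ">" exists
        *) printf '%s\n' "$s"; return ;;         # nothing to strip, print unchanged
    esac
    local head=${s%%"<jk"*}                      # everything before the first "<jk"
    local rest=${s#*"<jk"}                       # everything after the first "<jk"
    local tail=${rest#*">"}                      # everything after the next ">"
    printf '%s\n' "${head}<jk>${tail}"
}

strip_jk 'asadgas<jk mjk bb><gjgksdlsl'          # prints: asadgas<jk><gjgksdlsl
```

For large files this is not attractive, since the whole content would have to be read into a variable first, which is where sed or awk come in.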
|
|
|
09-12-2005, 05:59 PM
|
#3
|
LQ Newbie
Registered: Sep 2005
Location: Milky Way
Posts: 14
Original Poster
|
Thanks blgrlgdriver.
That is actually what I meant; how to do it with e.g. sed, awk or anything else built-in?
|
|
|
09-12-2005, 06:54 PM
|
#4
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
Quote:
Originally posted by activeco
Thanks blgrlgdriver.
That is actually what I meant; how to do it with e.g. sed, awk or anything else built-in?
|
Something like this? :)
Code:
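#!/bin/awk -f
# strip.awk: strip everything from BeginTag through EndTag (tags included)
# usage: awk -v BeginTag="xxx" -v EndTag="yyy" -f strip.awk input > output
# Note: the tags are tested as regexes (~) but located with index(), which is
# literal, so keep them free of regex metacharacters.
BEGIN {
    if (!BeginTag) {
        print "usage: awk -v BeginTag=\"xxx\" -v EndTag=\"yyy\" -f strip.awk input"
        exit 1
    }
}
{
    if (Split) {                # inside a block opened on an earlier line
        if ($0 ~ EndTag) {
            $0 = substr($0, index($0, EndTag) + length(EndTag))
            Split = 0
        }
        else $0 = ""
    }
    if ($0 ~ BeginTag) {
        # keep the text before BeginTag ...
        Line = substr($0, 1, index($0, BeginTag) - 1)
        # ... and, if the block closes on this line, the text after EndTag
        if ($0 ~ EndTag) Line = Line substr($0, index($0, EndTag) + length(EndTag)) "\n"
        else Split = 1
        if (Line == "" || Line == "\n") Line = "!@!@empty"
    }
    if (Line) {
        # "%s" keeps a stray % in the data from being read as a format
        if (Line != "!@!@empty") printf "%s", Line
        Line = ""
    }
    else print $0
}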
Code:
$ echo "asadgas<jk mjk bb><gjgksdlsl"|awk -v BeginTag="<jk" -v EndTag=">" -f strip.awk
asadgas<gjgksdlsl
That what you want?
Cheers,
Tink
|
|
|
09-13-2005, 10:32 AM
|
#5
|
LQ Newbie
Registered: Sep 2005
Location: Milky Way
Posts: 14
Original Poster
|
Yes Tinkster, I'll use it, although I expected a one-liner.
Thank you very much for your time.
|
|
|
09-13-2005, 01:41 PM
|
#6
|
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
|
Sorry, not all problems can be solved with a one-liner ;}
This one is highly re-usable, though!
Cheers,
Tink
|
|
|
09-13-2005, 04:51 PM
|
#7
|
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733
|
It is having the pattern spread across multiple lines that makes things more complicated.
If the pattern, or more than one pattern, were contained on a single line, this one-liner would do it:
Code:
sed 's/<jk[^>]*>/<jk >/g' originalfile >newfile
When the pattern crosses lines, sed needs to append lines to the pattern space until the end pattern is reached:
Code:
# remblock.sed
# remove <jk > block
s/<jk[^>]*>/<jk >/g # handles pattern(s) on a single line
:t
/<jk/,/>/{ # handle multilines between '<jk' and '>'
  />/!{ # not at the end marker '>'
    $!{ # this isn't the last line of the file
      # add the next line to the pattern space and branch back to ":t"
      N
      bt
    }
  }
}
s/<jk[^>]*>/<jk >/g
This script isn't too long. It may need tweaking for the case where the end of one block and the start of the next block fall on the same line. It does handle a pattern contained on a single line, more than one pattern on the same line, and a pattern that stretches across multiple lines.
You would call this program like:
sed -f remblock.sed originalfile >outputfile
If it is thoroughly tested and trusted, you could use in-place editing (with a sed that supports -i, such as GNU sed):
sed -i -f remblock.sed originalfile
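For the tweak mentioned above (one block ending and the next one starting on the same line), one possible variant keeps pulling in lines for as long as the pattern space still contains a "<jk" with no ">" after it. This is only a sketch, not a tested replacement for remblock.sed: the wrapper name strip_jk_blocks is made up, and the replacement "<jk>" (without the space) is an assumption about the wanted result.

```shell
#!/bin/sh
# Hypothetical wrapper (name invented here): empty every <jk ...> block,
# including blocks that span several lines.
strip_jk_blocks() {
    # :t                  loop label
    # /<jk[^>]*$/         an opened "<jk" has no closing ">" yet
    # $!{ N; bt }         not on the last line: append the next line, test again
    # s/.../<jk>/g        every block in the pattern space is now complete
    sed -e ':t' \
        -e '/<jk[^>]*$/{' \
        -e '$!{' \
        -e 'N' \
        -e 'bt' \
        -e '}' \
        -e '}' \
        -e 's/<jk[^>]*>/<jk>/g' "$@"
}

printf '%s\n' 'a<jk foo' 'mid' 'bar>b<jk x>c' 'plain' | strip_jk_blocks
```

With the four sample lines above, this should print "a<jk>b<jk>c" followed by "plain". It relies on [^>] also matching the newlines that N puts into the pattern space.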
|
|
|
09-13-2005, 06:05 PM
|
#8
|
LQ Newbie
Registered: Sep 2005
Location: Milky Way
Posts: 14
Original Poster
|
Quote:
Originally posted by jschiwal
Code:
sed 's/<jk[^>]*>/<jk >/g' originalfile >newfile
|
I like the thinking behind this solution for the single-line case. I had already played with sed's substitute command, but didn't think of the simple and obvious approach of supplying the final result as the replacement string. Somewhere in the back of my mind I always had the feeling that this was an unknown string, when in fact it is not.
The only thing I probably don't need here is the /g flag, as I need only the first occurrence to be matched.
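For the record, a sketch of how "first occurrence only" might look for the single-line case: substitute once on the first matching line, then loop over n so the rest of the file is copied through without any further pattern matching. The function name strip_first_jk is invented for illustration, and "<jk>" as the replacement is my own choice.

```shell
#!/bin/sh
# Hypothetical helper (name invented here): empty only the first <jk ...>
# block in the input; assumes that block sits on a single line.
strip_first_jk() {
    # /<jk[^>]*>/{        first line that contains a complete block
    #   s//<jk>/          empty regex reuses the address; no g flag, so one hit
    #   :a; n; ba         print, fetch the next line, repeat until end of input
    # }
    sed -e '/<jk[^>]*>/{' \
        -e 's//<jk>/' \
        -e ':a' \
        -e 'n' \
        -e 'ba' \
        -e '}' "$@"
}

printf '%s\n' 'x' 'asadgas<jk mjk bb><gjgksdlsl' 'foo<jk z>bar' | strip_first_jk
```

With the sample input the second line should come out as "asadgas<jk><gjgksdlsl" and the third line should stay untouched. sed still has to read and write the remainder of the file, so it does not literally exit at once, but no regex is evaluated after the first hit.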
Well, thanks again guys.
|
|
|
All times are GMT -5.