I want to put the command in a script, but although it seems like a very simple task, I couldn't find a way to do it.
So, if I have some text in a file, on one line or across several lines, say: "asadgas<jk mjk bb><gjgksdlsl",
and I want to delete everything between "<jk" and ">" (in this case " mjk bb", usually of varying length), what would be the best way to do it from bash?
I prefer sed because the files being processed are pretty large, and I would like to remove only the first matching instance and exit immediately, but of course any working solution is welcome.
I could easily do it in REXX or PHP, but I would like to stay in bash.
Thanks in advance for all replies.
According to the Advanced Bash-Scripting Guide, chapter 12, section 4, bash has limited text editing capability of its own. To expand that capability, you need to invoke sed, awk, or some other scripting language from the bash script.
What makes things more complicated is that the pattern can span multiple lines.
If the pattern (or several patterns) were always contained on a single line, this one-liner would do it:
Code:
sed 's/<jk[^>]*>/<jk >/g' originalfile >newfile
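A quick check against the sample string from the question may help here. This is only a sketch of the single-line case; the variable names `sample` and `result` are made up for the demonstration. Note the `*` after `[^>]`, which lets the bracket expression cover any number of characters up to the closing `>`.

```shell
#!/bin/sh
# Sample string taken from the question; covers the single-line case only.
sample='asadgas<jk mjk bb><gjgksdlsl'

# Replace everything from "<jk" up to the next ">" with "<jk >".
result=$(printf '%s\n' "$sample" | sed 's/<jk[^>]*>/<jk >/g')

printf '%s\n' "$result"
# prints: asadgas<jk ><gjgksdlsl
```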
When the match crosses lines, sed needs further lines appended to the pattern space until the end marker is reached:
Code:
# remblock.sed
# remove <jk > block
# handles pattern(s) on a single line
s/<jk[^>]*>/<jk >/g
:t
/<jk/,/>/{ # handle multilines between '<jk' and '>'
/>/! { # not at the end marker '>'
$! { # This isn't the last line of the file.
N
bt
} # add the next line to the pattern space and branch back to ":t"
}
}
s/<jk[^>]*>/<jk >/g
This script isn't long. It may need tweaking for the case where an end marker and the next start marker fall on the same line. It does handle a pattern contained on one line, several patterns on one line, and a pattern that stretches across multiple lines.
You would call this program like:
sed -f remblock.sed originalfile >outputfile
Once it is thoroughly tested and trusted, you could use in-place editing:
sed -i -f remblock.sed originalfile
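For the case where an end marker and the next start marker share a line, one possible tweak is to drop the address range and instead keep appending lines for as long as the pattern space ends in an unclosed `<jk`. The following is a minimal sketch assuming GNU sed; the function name `blank_jk` is hypothetical and not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: blank every <jk ...> block, including blocks
# that span several lines. Assumes GNU sed.
blank_jk() {
  sed '
    :t
    # An opening "<jk" with no ">" after it is still in the pattern space.
    /<jk[^>]*$/ {
      # Unless this is the last input line, append the next line and retest.
      $! {
        N
        bt
      }
    }
    # Every block in the pattern space is closed now (or input ran out).
    s/<jk[^>]*>/<jk >/g
  ' "$@"
}

printf '%s\n' 'a<jk x' 'y>b<jk z' 'w>c' | blank_jk
# prints: a<jk >b<jk >c
```

Be aware that if a `<jk` is never closed, this loop pulls the rest of the file into the pattern space, which matters for very large files.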
I like the thinking in this solution for the single-line case. I had already played with sed's substitute command, but didn't think of the simple and obvious approach of writing the final result as the replacement string. In the back of my mind I kept treating it as an unknown string, when in fact it is not.
The only thing I probably don't need here is the /g flag, since I only need the first match.
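To change only the first block and avoid further pattern tests on a large file, something along these lines might work. It is a sketch assuming GNU sed, and the function name `first_jk` is made up for illustration. A real `q` right after the substitution would cut off the rest of the output, so the sketch falls into a plain copy loop instead.

```shell
#!/bin/sh
# Hypothetical helper: blank only the first <jk ...> block, then copy
# the remaining lines through unchanged. Assumes GNU sed.
first_jk() {
  sed '
    /<jk/ {
      :t
      # The closing ">" is not in the pattern space yet.
      /<jk[^>]*>/! {
        $! {
          N
          bt
        }
      }
      # No g flag: only the first block is replaced.
      s/<jk[^>]*>/<jk >/
      # Copy loop: print, read the next line, repeat, with no more regex tests.
      :copy
      n
      bcopy
    }
  ' "$@"
}

printf '%s\n' 'asadgas<jk mjk' 'bb><gjgksdlsl' 'more <jk x>' | first_jk
```

With GNU sed, `n` prints the pattern space and ends the script when there is no next input line, so the loop terminates at the end of the file.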