Removing Text in a single line starting with one pattern ending on another
I have run a CGI through wget to produce a static HTML page. The drag is that I want to strip all the hrefs out of it. So I want to pass it through something that can search any single line for a beginning pattern and an ending pattern, and delete only the text between and including the two patterns. When I tried it with sed, I ended up deleting everything from the first occurrence of the first pattern through the last occurrence of the last pattern (so practically the whole file).
Sure, but please remember this is my first attempt at hacking a file in Unix using sed, so be gentle!
#!/bin/sh
# Get the page with wget, saving it as a temp file
/usr/bin/wget --http-user Nagiosadmin -O /tmp/nagios_avail.cgi.tmp.$$ -q "http://nagios.domainus.com/nagios/cgi-bin/avail.cgi?show_log_entries=&host=all&timeperiod=last7days&assumeinitialstates=yes&assumestateretention=yes&initialassumedstate=0&"
#Taking out the Unwanted Parts
cat /tmp/nagios_avail.cgi.tmp.$$ | sed -e "s/\/nagios\/stylesheets/\/stylesheets/g" | sed -e "/marquee/d" | sed -e "11,22d" | sed -e "14,16d" | sed -e "17,87d" | sed -e "s/ Breakdowns//g" | sed -e "s/<a[^>]*>|<\/a>//gi" > /var/www/html/avail.html
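Here's a concrete example, using sed, which removes the <a href> and </a>. As David mentioned, regexp notation varies from language to language: in GNU sed's default (basic) syntax, alternation is written \| rather than a bare |, which sed treats as a literal character. If you want to use something other than sed you will probably need to modify the regexp.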
$ echo '...<a href="http://example.org/">Test</a>...' | sed 's:<a[^>]*>\|</a>::gi'
...Test...
[^>]* = any character except a >, zero or more times. This stops it matching the whole line: if you used .* instead, it would be greedy and run on to the last > in the line, matching too much and causing your original problem.
All that "[^>]*" means is: match any characters up to the next ">". It is then followed by a literal ">", since you actually want rid of that too. The only other thing you didn't mention is the "i", which makes the match case-insensitive.
Just as another side note, you can actually use "wget -qO - http://blah". Here "-O -" sends the output to "-", which stands for stdout. This will save you writing to a temporary file.
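To see the difference on a line with more than one link, here is a small sketch. The sample line and the variable names are made up for illustration, and the two separate substitutions are just a portable way of writing the same idea without \| or the "i" flag.

```shell
# Sample input line (invented for this demo) with two links on one line.
line='<a href="one.html">One</a> and <a href="two.html">Two</a>'

# Greedy: .* runs on to the LAST '>' on the line, so the whole line is swallowed.
greedy=$(printf '%s\n' "$line" | sed 's/<a.*>//')

# Bounded: [^>]* cannot cross a '>', so each tag is matched on its own.
bounded=$(printf '%s\n' "$line" | sed 's/<a[^>]*>//g; s/<\/a>//g')

printf 'greedy:  [%s]\n' "$greedy"
printf 'bounded: [%s]\n' "$bounded"
```

The first result is empty and the second is "One and Two", which is the behaviour described above.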
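A rough sketch of how that might look with the link stripping on the end of the pipe. The function name strip_links is hypothetical, and printf stands in for wget here so the example runs without a network; the [aA] classes are a portable stand-in for the "i" flag.

```shell
# strip_links (hypothetical helper): remove <a ...> and </a> tags from stdin,
# keeping the link text. Note it would also hit other tags that start with
# "a", such as <abbr>, just like the original pattern.
strip_links() {
    sed 's/<[aA][^>]*>//g; s/<\/[aA]>//g'
}

# With a real page the pipeline would read (URL is a placeholder):
#   wget -qO - "http://blah" | strip_links > avail.html
# Here printf plays the part of wget.
result=$(printf '%s\n' '<td><A HREF="x.cgi">host1</A></td>' | strip_links)
printf '%s\n' "$result"
```

Because nothing is written to /tmp, there is no temporary file left behind to clean up.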
Thanks. Cleaning it up once it was functional was my next step. I tried it once, but for some reason when I combined them all, the line numbers I wanted deleted were different, and I ended up deleting some stuff I wanted and not deleting other stuff I didn't need. So I'll nail it slowly and see how it goes.
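A likely reason the numbers moved, shown on a toy input rather than the real page: in a pipeline each sed numbers the lines it receives from the previous one, whereas in a single sed every address is counted against the original input. The eight-line input and the variable names below are made up for the demonstration.

```shell
# Eight numbered lines as stand-in input.
input() { printf '%s\n' 1 2 3 4 5 6 7 8; }

# Pipeline: the second sed sees "1 4 5 6 7 8", so its lines 2-3 are 4 and 5.
piped=$(input | sed '2,3d' | sed '2,3d' | paste -s -d ' ' -)

# Single sed: both ranges refer to the original numbering, so the second
# '2,3d' has nothing left to delete.
combined=$(input | sed -e '2,3d' -e '2,3d' | paste -s -d ' ' -)

# Single sed, second range shifted by the 2 lines already removed.
shifted=$(input | sed -e '2,3d' -e '4,5d' | paste -s -d ' ' -)

printf 'piped:    %s\n' "$piped"
printf 'combined: %s\n' "$combined"
printf 'shifted:  %s\n' "$shifted"
```

So when merging several range deletes into one command, each later range has to be moved down by the number of lines the earlier ones removed above it.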