how to look for the shortest match using regex, bascially the opposite of .*
Linux - NewbieThis Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Introduction to Linux - A Hands on Guide
This guide was created as an overview of the Linux Operating System, geared toward new users as an exploration tour and getting started guide, with exercises at the end of each chapter.
For more advanced trainees it can be a desktop reference, and a collection of the base knowledge needed to proceed with system and network administration. This book contains many real life examples derived from the author's experience as a Linux system and network administrator, trainer and consultant. They hope these examples will help you to get a better understanding of the Linux system and that you feel encouraged to try out things on your own.
Click Here to receive this Complete Guide absolutely free.
here is an example of what i'm trying to do.
i'm trying to delete everything between and including <tag1> and </tag1>. but anything that's outside of this should not be deleted.
i'm doing this with a sed script, and the regex is not working.
<html><body><tag1>This is inside tag1. This should be deleted.</tag1>This is the first statement outside of tag1. This should NOT be deleted.<tag1>This is once again inside tag1. This should be deleted as well.</tag1> This is the second statement outside tag1. This should NOT be deleted.</body></html>
i've tried the following:
in this one the problem is that it deletes the first line outside <tag1> as well.
$cat test1 | sed 's/<tag1>.*<\/tag1>//'
<html><body>This is the second statement outside tag1. This should not be deleted.</body></html>
in this one the problem is that it does not delete anything:
$cat test1 | sed 's/<tag1>.+?<\/tag1>//'
<html><body><tag1>This is inside tag1. This should be deleted.</tag1>This is the first statement outside tag1. This should not be deleted.<tag1>This is once again inside tag1. This should be deleted as well.</tag1>This is the second statement outside tag1. This should not be deleted.</body></html>
i've tried several variants of the regex above... but am still at a loss on how to do this... any guidance will be helpful. thanks!
What about using "s/(^.*<tag1>).*(</tag1>.*$)/\1\2/"? That will save everything up to and including <tag1> from the start of the line, and then save everything after and including </tag1>, to the end of the line, chopping the middle. An inelegant solution, I know, but something that may work until something better comes along.