[SOLVED] problems using regular expression with sed
OK, so I'm using OS X 10.6.8, but I have a machine with Linux on it somewhere that I can try this on if that is the problem.
I am attempting to take certain parts out of an HTML file. I have figured out how to write a regular expression that matches the data I want to delete, using the search function in the text editor TextWrangler:
(?<=<div id="right_col">)[\s\S]*(?=</body>)
This works in TextWrangler, but when I try to use it with sed it gives me errors like this one:
I know to use -e to avoid problems with the Unix version OS X is based on. Its sed is a bit different because it is based on an older Unix variant, but I did that and it is still giving me an error. I assume there is something basic I am missing about how to format a regular expression for the command line and which characters you can use. I bet it has something to do with the backslashes or the round brackets. Normally I would just keep searching until I find the answer, but I am bored of this problem and need to think about something else for a while. Maybe in the meantime someone else can chime in and help me out.
Can you post the exact sed command you issued, along with a small representative chunk of the text you wish to process and the desired outcome? Also please post the output of:
As above, post the actual command you are using, and also which shell (bash, ash, etc.). At first glance, you may have to escape the brackets '()', as they have special meaning in some (all?) shells.
Quote:
I have figured out how to write a regular expression to specify this data I want to delete using the search function in the text editor TextWrangler:
(?<=<div id="right_col">)[\s\S]*(?=</body>)
That regular expression uses a positive lookbehind assertion (?<=...) and a positive lookahead assertion (?=...), and sed doesn't support either. Also, sed only matches one line at a time unless you jump through some hoops.
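For what it's worth, here is a minimal sketch of one way around both limits without lookarounds: a sed address range that selects every line from the one containing the div through the one containing </body>, then deletes everything in between while keeping the two marker lines. The sample page and the file names (sample.html, trimmed.html) are made up for illustration. It also assumes the two markers sit on separate lines, and it works on whole lines only, so anything sharing a line with a marker is kept, which is not exactly what the lookaround version does.

```shell
# Hypothetical sample page; real pages will be much longer.
cat > sample.html <<'EOF'
<html>
<body>
<p>article text</p>
<div id="right_col">
<p>sidebar ad</p>
<p>footer links</p>
</div>
</body>
</html>
EOF

# Inside the range, "b" with no label jumps to the end of the script,
# so the two marker lines are printed untouched; every other line in
# the range reaches "d" and is deleted.
sed '
/<div id="right_col">/,/<\/body>/{
  /<div id="right_col">/b
  /<\/body>/b
  d
}
' sample.html > trimmed.html

cat trimmed.html
```

The script is spread over several lines on purpose: the sed shipped with OS X is pickier than GNU sed about cramming braces onto one line.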
OK, thanks guys. I'm new to regular expressions as well as sed. It looks like sed may not be the right tool for the job. I'm deleting hundreds of lines of code from the top and bottom of a web page, stripping off the header, the footer and all the ads. All of this data lies in the top and bottom parts of the page and can be deleted in big chunks. I'm doing this so I can archive the online copies of the articles in my old Wired magazines. I will then scan what little of each magazine isn't on the site and throw the magazines away. TextWrangler is working well so far. It would be nice to automate more of the task, because it still takes time and I have a lot to go through, but maybe I will try using AppleScript or perhaps attempt to learn Perl.
Here is the command I tried for the bottom part of the page:
sed -i -e '(?<=<div id="right_col">)[\s\S]*(?=</body>)' The_Cold_Hard_Data_of_Soda_Ice.html > soda.html
As noted before, positive lookbehind and lookahead assertions are not supported, so this expression would have to be pretty much redone from the ground up.
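Two more things look off in that command, independent of the lookarounds. A bare regular expression is not a sed script; sed expects a command such as d or s/// attached to it. And as far as I know, the sed on OS X requires a backup-suffix argument after -i, so in `-i -e` the `-e` gets swallowed as the suffix, and combining -i with a `>` redirect works against itself anyway. A rough sketch of how the job could be automated over many pages, writing new files instead of editing in place, might look like this. The directory names (pages, trimmed) and the two sample files are hypothetical, and it assumes the markers sit on their own lines.

```shell
# Hypothetical layout: downloaded pages in ./pages, cleaned copies in ./trimmed.
mkdir -p pages trimmed
printf '%s\n' '<body>' '<p>one</p>' '<div id="right_col">' '<p>ad</p>' '</body>' > pages/a.html
printf '%s\n' '<body>' '<p>two</p>' '<div id="right_col">' '<p>ad</p>' '</body>' > pages/b.html

# Keep the sed script in one place so every file gets the same treatment.
# It deletes the lines strictly between the two marker lines.
trim_script='/<div id="right_col">/,/<\/body>/{
/<div id="right_col">/b
/<\/body>/b
d
}'

for f in pages/*.html; do
  # Redirect to a new file rather than using -i, which behaves
  # differently on BSD (OS X) sed and GNU sed.
  sed "$trim_script" "$f" > "trimmed/$(basename "$f")"
done
```

The originals stay untouched in pages, so a wrong pattern costs nothing but a rerun.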