LinuxQuestions.org
Old 04-21-2015, 12:12 PM   #1
geekdude
LQ Newbie
 
Registered: Apr 2015
Posts: 2

problems using regular expression with sed


OK, so I'm using OS X 10.6.8, but I have a machine with Linux on it somewhere that I can try this on if that is the problem.

I am attempting to take certain parts out of an HTML file. I have figured out how to write a regular expression that specifies the data I want to delete, using the search function in the text editor TextWrangler:

(?<=<div id="right_col">)[\s\S]*(?=</body>)

This works in TextWrangler, but when I try to use it with sed it gives me errors like this one:

sed: 1: "(?<=<div id="right_col" ...": invalid command code (

I know to use -e to avoid problems with the Unix version OS X is based on; its sed is a bit different because it's based on an older Unix variant. I did that and it's still giving me an error. I assume there is something basic I am missing about formatting a regular expression for use on the command line and which characters you can use. I bet it has something to do with the backslashes or the round brackets. Normally I would just keep searching till I find the answer, but I am bored of this problem and need to think about something else for a while. Maybe in the meantime someone else can chime in and help me out.
 
Old 04-21-2015, 01:06 PM   #2
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 840

can you post the exact sed command you run?
 
Old 04-21-2015, 01:08 PM   #3
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,110
Blog Entries: 10

Can you post the exact sed command you've issued and post a representative small chunk of strings you wish to process and the desired outcome? Also please post the output of:
Code:
sed --version
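
A side note for anyone following along on OS X: as far as I know, the BSD sed that ships there does not accept --version and just prints a usage error, which is itself a useful answer to the question. A rough way to tell the two families apart (the variable name sed_flavor is only for illustration):

```shell
# GNU sed accepts --version; BSD sed (as shipped with OS X) rejects it.
if sed --version >/dev/null 2>&1; then
    sed_flavor="accepts --version (GNU-style)"
else
    sed_flavor="rejects --version (probably BSD)"
fi
printf '%s\n' "$sed_flavor"
```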
 
Old 04-21-2015, 06:41 PM   #4
Keith Hedger
Senior Member
 
Registered: Jun 2010
Location: Wiltshire, UK
Distribution: Linux From Scratch, Slackware64, Partedmagic
Posts: 2,254

As above, post the actual command you are using, and also which shell (bash, ash, etc.). At first glance you may have to escape the brackets '()', as they have special meaning in some (all?) shells.
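
To illustrate the quoting point: inside single quotes the shell passes round brackets, angle brackets and backslashes through untouched, so no extra escaping is needed for the shell's sake. The error in post #1 is printed by sed itself ("sed: 1: ..."), which suggests the quoting was already fine there. A small sketch with a made-up input string:

```shell
# Single quotes keep the shell from interpreting ( ) itself,
# so sed receives the script exactly as typed.
# In sed's default (basic) regex syntax, plain ( and ) match literal brackets.
quoted_result=$(printf '%s\n' 'keep (drop) keep' | sed 's/(drop)//')
printf '%s\n' "$quoted_result"
```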
 
Old 04-21-2015, 06:56 PM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Arch
Posts: 3,013

Quote:
Originally Posted by geekdude
I have figured out how to write a regular expression to specify this data I want to delete using the search function in the text editor TextWrangler:

(?<=<div id="right_col">)[\s\S]*(?=</body>)
That regular expression uses a positive lookbehind assertion (?<=...) and a positive lookahead assertion (?=...); sed doesn't support these. Also, sed only matches one line at a time, unless you jump through some hoops.

Quote:
This works in TextWrangler, but when I try to use it with sed it gives me errors like this one:

sed: 1: "(?<=<div id="right_col" ...": invalid command code (
For sed, you need to use the regular expression as part of a command, e.g.:

Code:
sed 's/some-regular-expression//'
which means replace whatever some-regular-expression matches with nothing.
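
As a concrete instance of that form, here is a minimal sketch on a made-up two-line sample (not the actual page). It deletes every tag on each line, and the command is applied to each line separately, which is why a pattern cannot reach across a line break by default:

```shell
# Two-line sample invented for illustration.
sample='<b>first</b> line
<i>second</i> line'

# s/regex// with an empty replacement deletes each match; g repeats it
# for every match on the line. [^>]* keeps each match inside one tag.
stripped=$(printf '%s\n' "$sample" | sed 's/<[^>]*>//g')
printf '%s\n' "$stripped"
```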
 
Old 04-22-2015, 02:53 PM   #6
geekdude
LQ Newbie
 
Registered: Apr 2015
Posts: 2

Original Poster
OK, thanks guys. I'm new to regular expressions as well as sed. It looks like sed may not be the right tool for the job. I'm deleting hundreds of lines of code from the top and bottom of a webpage: stripping off the header, the footer and all the ads. All of this data lies in the top and bottom parts of the page and can be deleted in a big chunk. I'm doing this so I can archive the online copies of the articles in my old Wired magazines. I will then scan what little of each magazine isn't on the site and throw away the magazines. TextWrangler is working well so far. It would be nice to automate more of the task, because it still takes time and I have a lot to go through, but maybe I will try using AppleScript or perhaps attempt to learn Perl.

Here is the command I tried for the bottom part of the page:
sed -i -e '(?<=<div id="right_col">)[\s\S]*(?=</body>)' The_Cold_Hard_Data_of_Soda_Ice.html > soda.html

As noted before, positive lookbehind and lookahead assertions are not supported, so this expression would have to be pretty much redone from the ground up.
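
One possible way to redo it, offered only as a sketch: sed can address a range of lines between two patterns, which covers roughly what the lookarounds expressed. The function name strip_right_col and the four-line sample are made up for illustration, and the sketch assumes the opening marker sits on its own line, as in the excerpt below. Separately, as far as I know, the BSD sed in OS X treats the argument after -i as a backup suffix, so in the command above -e would be read as that suffix, and with -i nothing is written to stdout, so the redirect would produce an empty soda.html. The sketch avoids -i altogether.

```shell
# Delete everything after the <div id="right_col"> line up to the closing
# </body> tag, keeping both markers.
strip_right_col() {
    sed '/<div id="right_col">/,/<\/body>/{
        /<div id="right_col">/b
        /<\/body>/!d
        s|.*</body>|</body>|
    }' "$@"
}

# Made-up sample standing in for the real page.
trimmed=$(printf '%s\n' \
    'article text' \
    '<div id="right_col">' \
    '<div id="search">sidebar</div>' \
    '<embed src="ad.swf"></div></body></html>' | strip_right_col)
printf '%s\n' "$trimmed"
```

Inside the range, the first command lets the opening marker line through unchanged, the second deletes every line that does not contain </body>, and the third trims the final line down to the closing tags. On a real file the call would look like: strip_right_col The_Cold_Hard_Data_of_Soda_Ice.html > soda.html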

The page looks like this:

<p>

</p>



</div>


</div>

<div id="right_col">
<div id="search">

<form action="http://archive.wired.com/search" id="nav_search" name="search" onsubmit="return validateSearch(this)">
<div class="title">
Search Wired
</div>
<input type="text" class="input_text" name="query">
<select class="search_filter" id="art_filter" name="siteAlias">
<option value="noblog" class="opt" id="art_top_stories" name="noblog" default="true">Top Stories
</option>

......536 lines later:


<embed type="application/x-shockwave-flash" src="http://s.moatads.com/swf/MessageSenderV2.swf" quality="high" flashvars="r=MoatSuperV5.yh.zb&amp;s=MoatSuperV5.yh.zc&amp;e=MoatSuperV5.yh.zd&amp;td=afs.moatads.com" bgcolor="#ffffff" width="1" height="1" id="moatMessageSenderEmbed" align="middle" allowscriptaccess="always" allowfullscreen="false"></object></div><iframe id="stSegmentFrame" name="stSegmentFrame" src="./Star Power_ Why Fusion Proves Elusive_files/getSegment.html" frameborder="0" scrolling="no" width="0px" height="0px" style="display:none;"></iframe></body></html>
 
  


All times are GMT -5.
