LinuxQuestions.org
Old 03-02-2007, 05:57 PM   #1
7stud
LQ Newbie
 
Registered: Feb 2007
Posts: 22

Rep: Reputation: 15
sed substitution with p flag


Hi,

Here is my file:
Quote:
Robert Robert Robert text
Robert text
Robert
I'm using this substitution:
Code:
sed -e 's/R* /R. /p' myFile
According to a tutorial:

http://uw714doc.sco.com/en/SHL_autom...functions.html

the p flag:
Quote:
Prints the line if a successful replacement was done. The p flag causes the line to be written to the output only if a substitution was actually made by the s function.
Here is the output I get from my sed command:
Quote:
RobertR. Robert Robert text
RobertR. Robert Robert text
RobertR. text
RobertR. text
Robert
I expected this:
Quote:
Robert Robert Robert text
Robert text
Can someone explain the output to me?

Last edited by 7stud; 03-02-2007 at 05:59 PM.
 
Old 03-03-2007, 02:45 AM   #2
7stud
LQ Newbie
 
Registered: Feb 2007
Posts: 22

Original Poster
Rep: Reputation: 15
I figured out the problem. I had misread my man pages: I thought "basic regular expressions" meant that * represents any character, any number of times. Instead, * is a modifier of the preceding character, meaning the preceding character may be present 0 or more times. So the regex I used matches 'R' zero times followed by a space, which matches the first space after 'Robert'.

As for the p flag, apparently it causes the "pattern space" to be output immediately if a substitution took place, and then the pattern space gets output again in the normal course of things before the next line is processed.
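For anyone who wants to reproduce both effects, here is a minimal sketch. The temporary file and the variable names are only for illustration; the sample lines and the substitution are the ones from the first post. The "&" in the first replacement stands for whatever the pattern matched, so the brackets show exactly what 'R* ' picked up.

```shell
# Recreate the sample file from the first post in a temporary location.
demo_file=$(mktemp)
printf '%s\n' 'Robert Robert Robert text' 'Robert text' 'Robert' > "$demo_file"

# Bracket whatever 'R* ' matches. R* matches zero R's here, so only the
# first space on each line ends up inside the brackets.
matched=$(sed -e 's/R* /[&]/' "$demo_file")

# The p flag prints each changed line at once; the automatic print at the
# end of each cycle then prints every line again, changed or not.
doubled=$(sed -e 's/R* /R. /p' "$demo_file")

printf '%s\n' "$matched"
echo '---'
printf '%s\n' "$doubled"

rm -f "$demo_file"
```

The second command should print the two changed lines twice and the unchanged "Robert" line once, which is the output shown in the first post.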

Last edited by 7stud; 03-03-2007 at 03:19 AM.
 
Old 03-03-2007, 04:15 AM   #3
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
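If you want to use sed to filter out lines that don't match a pattern, use the '-n' option, which suppresses the automatic printing of every line, and then use the p flag to output just the lines you want. Without '-n', each substituted line is printed once by the p flag and once more by the automatic print, which is why you had two lines output instead of one.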
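A small sketch of the '-n' plus p combination, using the sample lines from the first post. The pattern 'Robert ' and the variable names are my own choices for illustration, not necessarily what the original poster was after.

```shell
# Sample input, inlined so the example is self-contained.
sample='Robert Robert Robert text
Robert text
Robert'

# Print only the lines where the substitution succeeded.
substituted=$(printf '%s\n' "$sample" | sed -n 's/Robert /R. /p')

# Print only the lines matching a pattern, unchanged (much like grep).
matching=$(printf '%s\n' "$sample" | sed -n '/text/p')

printf '%s\n' "$substituted"
echo '---'
printf '%s\n' "$matching"
```

The last input line has no space after "Robert", so it fails both tests and is not printed at all.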

Guess what, regular expressions have their own manpage! man 7 regex.
When the sed manpage refers to basic regular expressions, it means that sed uses the same regular expressions as grep, as opposed to the extended regular expressions that awk and egrep use. For example, in sed the pattern "(abc|def)" is literal, including the "()|" characters. With extended regular expressions, the same pattern matches either "abc" or "def". You can use some of the extended features in a basic regular expression by escaping them, as in "^a\{8\}". Using a wildcard like * to represent any characters, as the shell does, is called globbing; it is not a regular expression. In sed, if you want to match a literal dot, you need to escape it so that it doesn't act as a single-character wildcard:
/\.mpg/
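The differences described above can be checked with a few throwaway inputs. This is only a sketch: the input strings and variable names are made up, and grep stands in for sed in the first two commands because the point is the regular expression flavour, not the tool.

```shell
# In a basic regular expression, ( ) and | are ordinary characters,
# so only the line containing the literal text "(abc|def)" matches.
bre_literal=$(printf '%s\n' 'abc' 'def' '(abc|def)' | grep '(abc|def)')

# In an extended regular expression, the same pattern means "abc or def".
ere_alternation=$(printf '%s\n' 'abc' 'def' 'xyz' | grep -E '(abc|def)')

# An unescaped . matches any single character; \. matches only a dot.
any_char=$(printf '%s\n' 'a.mpg' 'axmpg' | sed -n '/.mpg/p')
literal_dot=$(printf '%s\n' 'a.mpg' 'axmpg' | sed -n '/\.mpg/p')

# An escaped interval in a basic regular expression: eight a's at line start.
eight_a=$(printf '%s\n' 'aaaaaaaab' 'aaab' | sed -n '/^a\{8\}/p')

printf '%s\n' "$bre_literal" "$ere_alternation" "$any_char" "$literal_dot" "$eight_a"
```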

Suppose that you have downloaded a lot of podcasts on your laptop and you are running out of room. You use k3b to burn a CD full of podcasts and save the project as podcasts1.k3b.
Now you want to use the saved project to delete the backed up files. Using unzip to extract maindata.xml from podcasts1.k3b, you notice a bunch of lines like:
<file name="sn0014.mp3" >
<url>/home/auser/podcasts/sn0014.mp3</url>

You want to extract the names of the files and pipe that to an "rm" command removing just the files backed up.
So you want to ignore all the configuration information and just get lines like:
/home/auser/podcasts/sn0014.mp3

This oneliner will extract the file information:
sed -n -e '/<url>/s/^<url>\(.*\)<\/url>/\1/p' maindata.xml

Some of the file names might contain white space or special characters, which will cause a problem, so it would be great if the output were null separated, like the output of the -print0 option of the "find" command:

sed -n -e '/<url>/s/^<url>\(.*\)<\/url>/\1/p' maindata.xml | tr '\n' '\000' | xargs -L 500 -0 rm

Only the lines that start with "<url>" match the substitution, so they are the only ones printed and passed on to rm.
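To see why the null separation matters without risking real files, here is a sketch that runs in a scratch directory. The file names and variable names are invented, and list.txt stands in for the sed output. It assumes an xargs that supports -0, as the GNU and BSD versions do; I left out the -L 500 batching to keep the example small.

```shell
# Work in a scratch directory so nothing real is deleted.
scratch=$(mktemp -d)
touch "$scratch/plain.mp3" "$scratch/with space.mp3" "$scratch/keep me.mp3"

# A stand-in for the sed output: one path per line.
printf '%s\n' "$scratch/plain.mp3" "$scratch/with space.mp3" > "$scratch/list.txt"

# Turn each newline into a NUL byte so that xargs -0 passes every line to rm
# as a single argument, spaces included.
tr '\n' '\000' < "$scratch/list.txt" | xargs -0 rm

# Only the file that was not listed (and the list itself) should be left.
remaining=$(ls "$scratch")
printf '%s\n' "$remaining"

rm -rf "$scratch"
```

Without the tr and -0 steps, xargs would split "with space.mp3" into two arguments and rm would complain that neither exists.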
After using this one-liner for a while you may come across instances where the rm command can't find the file and the output contains something like "&gt;". Some characters have a special meaning in XML, so they are escaped in the file: "&gt;" stands for '>', and the '<' and '&' characters are escaped in the same way ("&lt;" and "&amp;"). I'll leave it to you to add terms to the sed command to convert them back.
Sed works best when there are clearly defined patterns. The <url> and </url> strings are great anchors. The characters between them are saved with "\(...\)", used as the replacement with "\1", and printed with the "p" flag.
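One possible way to convert the escaped characters back, as a sketch rather than a complete XML decoder. The second file name in the sample is made up to exercise the three entities, and only &lt;, &gt; and &amp; are handled. The extraction step is the one-liner from above; the decoding is a second sed in the pipeline.

```shell
# Lines shaped like the maindata.xml excerpt; the last file name is invented.
xml='<file name="sn0014.mp3" >
<url>/home/auser/podcasts/sn0014.mp3</url>
<url>/home/auser/podcasts/Q&amp;A &lt;live&gt;.mp3</url>'

# Step 1: keep only the text between <url> and </url>.
# Step 2: decode &lt; and &gt; first and &amp; last, so that a sequence such
# as "&amp;lt;" ends up as the literal text "&lt;" and not as "<".
# In a replacement a bare & means "the matched text", hence the \&.
paths=$(printf '%s\n' "$xml" \
  | sed -n -e 's/^<url>\(.*\)<\/url>/\1/p' \
  | sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&amp;/\&/g')

printf '%s\n' "$paths"
```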

Regular expressions can be very ugly, however; so ugly that they are easier to write than to read. What is important is noticing the patterns that exist in the source text and using those patterns to decide which lines to process, and as anchors to "contain" the regular expression's wildcards.

Here is another example. Let's look at the hardware information on my wireless device:
Code:
 /sbin/lspci -v | sed -n '/Wireless/,/^$/p'
02:02.0 Network controller: Broadcom Corporation BCM4306 802.11b/g Wireless LAN Controller (rev 03)
        Subsystem: Hewlett-Packard Company NX9500 Built-in Wireless
        Flags: bus master, fast devsel, latency 64, IRQ 217
        Memory at e0104000 (32-bit, non-prefetchable) [size=8K]
I didn't have to wade through pages of output.
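The '/Wireless/,/^$/p' part is a range address: printing starts at the first line matching the first pattern and stops at the next line matching the second one, here an empty line. A self-contained sketch, with made-up text shaped like lspci -v output so it runs on a machine without a wireless card:

```shell
# Made-up device listing; blank lines separate the devices as in lspci -v.
devices='00:1f.0 ISA bridge: Example LPC Bridge
        Flags: bus master

02:02.0 Network controller: Example Wireless LAN Controller
        Flags: bus master, fast devsel

03:00.0 Ethernet controller: Example Gigabit Ethernet
        Flags: bus master'

# Print from the first line matching /Wireless/ through the next empty line.
# The command substitution drops the trailing empty line from the result.
block=$(printf '%s\n' "$devices" | sed -n '/Wireless/,/^$/p')

printf '%s\n' "$block"
```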

I thought these examples would be closer to real life and show how handy sed is as a filter for on-the-fly one-liners.


I wish the sed manual were better written, with more typical examples such as multiline substitutions. The manual for awk, "GAWK: Effective Awk Programming", is an excellent book. If you want to learn awk, I would highly recommend downloading the source and generating the PS or PDF manual from the source .texi files.

Often all it takes to generate print-worthy documentation is "./configure && make pdf". Then look in the doc/ subdirectory for the pdf manual.

Last edited by jschiwal; 03-03-2007 at 04:43 AM.
 
  

