LinuxQuestions.org
Old 11-10-2009, 08:06 PM   #1
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
SED vs. AWK showdown


Every so often, there's enthusiastic debate** about which is better: SED or AWK. I'd like to see this be a fun thing, maybe even a learning thing.

Here's a problem:
Given a file with a word list, rearrange it into sentences that all begin with a keyword. Remove any leading spaces and any extra spaces between words.

Here is a SED solution (keyword = "the"):
Code:
sed -n '${H;x;s/\n/ /g;s/^ *//;s/ \+/ /g;s/ the/\nthe/g;p};H' words.txt
And the file I used is attached.


The logic used:
While not at the end of the file, append each line to the hold space.
When EOF is reached, append that last line as well, then swap the hold space into the pattern space. Now the whole file is in the pattern space. Then:
replace all linefeeds with spaces
remove all spaces at the beginning
replace each run of multiple spaces with just one
insert a line break before every instance of the keyword (the), except the one at the beginning, which has no space in front of it
print the result
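
The steps above map one-to-one onto the commands in the one-liner. Here is a sketch with one command per line, run on a made-up sample (an assumption, since the contents of the attachment are not shown in the thread). It assumes GNU sed, which is what accepts \+ in the pattern and \n in the replacement.

```shell
# Hypothetical sample input; not the contents of the attached words.txt.
sample=$(printf '%s\n' '  the' 'quick   brown' fox the lazy dog)

out=$(printf '%s\n' "$sample" | sed -n '
# Commands in the braces run on the last line only.
${
  # Append the last line to the hold space as well.
  H
  # Swap: the pattern space now holds the whole file.
  x
  # Replace all linefeeds with spaces.
  s/\n/ /g
  # Remove spaces at the beginning.
  s/^ *//
  # Squeeze each run of spaces down to one.
  s/ \+/ /g
  # Insert a line break before each later keyword.
  s/ the/\nthe/g
  p
}
# On every line: append the line to the hold space.
H
')

printf '%s\n' "$out"
```

Reading from stdin instead of words.txt is only for the sake of a self-contained example; the commands are otherwise the same as in the one-liner.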

So:
Is there an AWK solution which is:
faster?
less code?

**many of which I lost......
Attached Files
File Type: txt words.txt (78 Bytes, 8 views)
 
Old 11-10-2009, 09:24 PM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Less code doesn't mean it's always legible or understandable.
Code:
awk 'NR>1&&$1=="the"{print ""}{ printf "%s ",$0}' words.txt
Because printf doesn't insert a newline unless you tell it to, the lines are concatenated in the output until the keyword "the" is found, at which point a newline is printed. This is much simpler to understand than the bunch of sed secret code.
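
To see exactly what the one-liner emits, here is a small sketch that runs it on a made-up word list (an assumption; the attached words.txt is not reproduced in the thread) and captures the result. Two side effects of printing with "%s " show up: each output line ends with a space, and there is no newline after the last line.

```shell
# Hypothetical sample input; not the contents of the attached words.txt.
sample=$(printf '%s\n' the quick brown fox the lazy dog)

# Same program as above, spread out, reading stdin instead of words.txt.
out=$(printf '%s\n' "$sample" | awk '
  # A line whose first field is the keyword ends the previous sentence.
  NR > 1 && $1 == "the" { print "" }
  # Every line is appended to the current sentence, followed by one space.
  { printf "%s ", $0 }
')

# The brackets make the trailing spaces visible.
printf '[%s]\n' "$out"
```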

Last edited by ghostdog74; 11-10-2009 at 10:10 PM.
 
Old 11-10-2009, 09:26 PM   #3
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 111
Neither sed nor awk, but...
Code:
#!/bin/bash
while read -r LINE ; do
   BUFFER="$BUFFER $LINE"
done
BUFFER="${BUFFER//  / }"
BUFFER="${BUFFER:1}"
if [[ "${BUFFER:$((${#BUFFER}-1))}" == " " ]] ; then
   BUFFER="${BUFFER:0:$((${#BUFFER}-1))}"
fi
BUFFER="${BUFFER// the/$'\n'the}"
echo "$BUFFER"
Isn't parameter expansion fun?

Last edited by tuxdev; 11-10-2009 at 09:40 PM.
 
Old 11-10-2009, 09:29 PM   #4
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
It's a small issue: you forgot the file name.
 
Old 11-10-2009, 09:32 PM   #5
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 111
It's stdin, of course! Like any good "filter" type script ought to behave.
 
Old 11-10-2009, 09:41 PM   #6
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by tuxdev
It's stdin, of course! Like any good "filter" type script ought to behave.
I take it that you mean input redirection. That's fine.

Regarding your script: if you are going to buffer every line of the file before processing, it's going to be very slow when the files are huge.
 
Old 11-10-2009, 09:49 PM   #7
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Original Poster
Rep: Reputation: 728
Quote:
Originally Posted by ghostdog74
Less code doesn't mean it's always legible or understandable.
Code:
awk 'NR>1&&/the/{print ""}{ printf "%s ",$0}' words.txt
Because printf doesn't insert a newline unless you tell it to, the lines are concatenated in the output until the keyword "the" is found, at which point a newline is printed. This is much simpler to understand than the bunch of sed secret code.
Did you run a speed test?

I'll be happy to do it, but later. Now I must watch "V"....
 
Old 11-10-2009, 09:50 PM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Original Poster
Rep: Reputation: 728
Quote:
Originally Posted by tuxdev
Neither sed nor awk, but...
Code:
#!/bin/bash
while read -r LINE ; do
   BUFFER="$BUFFER $LINE"
done
BUFFER="${BUFFER//  / }"
BUFFER="${BUFFER:1}"
if [[ "${BUFFER:$((${#BUFFER}-1))}" == " " ]] ; then
   BUFFER="${BUFFER:0:$((${#BUFFER}-1))}"
fi
BUFFER="${BUFFER// the/$'\n'the}"
echo "$BUFFER"
Isn't parameter expansion fun?
That just might be more obfuscated than my SED solution.....
 
Old 11-10-2009, 10:01 PM   #9
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Code:
echo `cat words.txt` | sed 's/ the/\nthe/g'
Kevin Barry
 
Old 11-10-2009, 10:02 PM   #10
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by pixellany
Did you run a speed test?
No, please do. I suspect it will be slower, since I am calling printf for every line. It might be better to save the lines in memory and print them at the right time. But, well, I prefer readability over speed.
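
One way to read "save them in memory before printing at the right time" is to hold only the current sentence and print it when the next keyword arrives. A minimal sketch under that assumption follows; the variable name out and the sample input are illustrative, not from the thread. It also squeezes extra spaces through the $1 = $1 assignment, which the one-liner above does not do.

```shell
# Hypothetical sample input; not the contents of the attached words.txt.
sample=$(printf '%s\n' '  the' 'quick   brown' fox the lazy dog)

result=$(printf '%s\n' "$sample" | awk '
  # Skip blank lines so they do not add stray spaces.
  NF == 0 { next }
  # Rebuild the record from its fields: strips leading blanks and
  # squeezes runs of blanks to single spaces.
  { $1 = $1 }
  # A keyword line flushes the sentence collected so far.
  $1 == "the" && out != "" { print out; out = "" }
  # Append the current line to the sentence being built.
  { out = (out == "" ? $0 : out " " $0) }
  # Flush the final sentence.
  END { if (out != "") print out }
')

printf '%s\n' "$result"
```

Only one sentence is held in memory at a time, so the whole file is never buffered, and each sentence is written with a single print instead of one printf per input line. Whether that is actually faster would still need the speed test.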

NB: please change your sed to handle input like this:
Code:
the
thesis is
done
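
With that input, the pattern " the" also matches the start of "thesis", so the sed one-liner breaks the line in the wrong place. One possible fix, sketched here on the assumption that GNU sed is in use (the one-liner already relies on it for \+), is to require a word boundary after the keyword with \>. The two extra words at the end of the sample are added only so that a legitimate second "the" is present.

```shell
# The three lines from the post, plus a second keyword for comparison.
words=$(printf '%s\n' the 'thesis is' done the end)

# Original pattern: also breaks before "thesis".
original=$(printf '%s\n' "$words" |
  sed -n '${H;x;s/\n/ /g;s/^ *//;s/ \+/ /g;s/ the/\nthe/g;p};H')

# Variant: \> only matches where the word "the" ends.
bounded=$(printf '%s\n' "$words" |
  sed -n '${H;x;s/\n/ /g;s/^ *//;s/ \+/ /g;s/ the\>/\nthe/g;p};H')

printf '%s\n---\n%s\n' "$original" "$bounded"
```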

Last edited by ghostdog74; 11-10-2009 at 10:12 PM.
 
Old 11-10-2009, 10:05 PM   #11
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by ta0kira
Code:
echo `cat words.txt` | sed 's/ the/\nthe/g'
Kevin Barry
OK on small files, but it will choke on big files.
 
Old 11-10-2009, 10:10 PM   #12
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by ghostdog74
OK on small files, but it will choke on big files.
It was more of a joke, actually. I think it's funny how people often try to get everything done in one call to one command when it's almost never necessary, because it will probably just go into a script, anyway.
Kevin Barry
 
Old 11-10-2009, 10:13 PM   #13
tuxdev
Senior Member
 
Registered: Jul 2005
Distribution: Slackware
Posts: 2,012

Rep: Reputation: 111
The bigger problem is if words.txt ends with ";rm -rf ~"
 
Old 11-10-2009, 10:16 PM   #14
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Quote:
Originally Posted by tuxdev
The bigger problem is if words.txt ends with ";rm -rf ~"
Only if one precedes the line with eval...
Kevin Barry

Last edited by ta0kira; 11-10-2009 at 10:20 PM. Reason: grammar
 
Old 11-10-2009, 10:19 PM   #15
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by ta0kira
Only if one precedes the line by eval...
Kevin Barry
That's right. We are not "eval"ing each line as it goes, just reformatting text.
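
A quick way to check this, as a sketch: put a command-like line into a throwaway file (the file and its contents are made up for the example) and run the same pipeline on it. The result of a command substitution is not parsed as shell syntax again, so the semicolon is just text. What the unquoted backticks do apply is word splitting and pathname expansion, so a line consisting of * would be replaced by file names from the current directory.

```shell
# Throwaway input file with a line that looks like a command.
tmp=$(mktemp)
printf '%s\n' the cat ';echo INJECTED' the end > "$tmp"

# Same pipeline as in post #9, reading the temporary file.
out=$(echo `cat "$tmp"` | sed 's/ the/\nthe/g')
rm -f "$tmp"

# The semicolon and the echo come through as plain words.
printf '%s\n' "$out"
```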
 
  



All times are GMT -5.