LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 02-28-2023, 01:49 PM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
SED syntax question


This is only a learning exercise.

Given a character string of length 8, produce an output string
which is characters 1, 3, 5, and 7 followed by the input string.

With this InFile...
Code:
abcdefgh
12345678
gxoxoxdx
... the desired OutFile is...
Code:
a c e g abcdefgh
1 3 5 7 12345678
g o o d gxoxoxdx
This gawk works, and is readable.
Code:
gawk -F '' '{print $1,$3,$5,$7,$0}' <$InFile
This sed works, and is readable...
Code:
sed -r 's/(.)(.)(.)(.)(.)(.)(.)(.)/\1 \3 \5 \7 \1\2\3\4\5\6\7\8/' <$InFile
... but it's long and clumsy.

This works, uses fewer keystrokes but is less readable...
Code:
sed -r 's/((.)(.)(.)(.)(.)(.)(.)(.))/\2 \4 \6 \8 \1/' <$InFile
Still fewer keystrokes, but less readable...
Code:
sed -r 's/((.).(.).(.).(.).)/\2 \3 \4 \5 \1/' <$InFile
This doesn't work.
Code:
sed -r 's/(((.).){4})/\2 \3 \4 \5 \1/' <$InFile
Your ideas? Remember, this is just "for funsies" and not an example of good coding practice.
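A note on why the last attempt fails: a quantified group such as `((.).){4}` still counts as only two groups, and after the match each one holds just what it captured on its final repetition. So `(((.).){4})` defines groups 1 to 3 only, and `\4` and `\5` refer to groups that do not exist. A minimal sketch, assuming a sed that accepts `-E` (GNU sed does); the helper name `last_iteration` is made up for illustration:

```shell
# Show what a repeated group remembers: only its last repetition.
# last_iteration is a hypothetical helper name, not part of sed.
last_iteration() {
    # \1 is the two-character unit, \2 is its first character.
    sed -E 's/((.).){4}/[\1][\2]/'
}

printf 'abcdefgh\n' | last_iteration
```

With GNU sed this prints `[gh][g]`: the earlier repetitions (ab, cd, ef) are matched but not kept.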


Daniel B. Martin

.

Last edited by danielbmartin; 02-28-2023 at 01:55 PM. Reason: Correct a t7po
 
Old 02-28-2023, 04:56 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194
I came up with this...

Code:
sed -r 'h;s/(.)[^ ]/\1 /g;G;s/[\n\r]+//' <$InFile
But I kind of like this one...

Quote:
Originally Posted by danielbmartin View Post
Still fewer keystrokes, but less readable...
Code:
sed -r 's/((.).(.).(.).(.).)/\2 \3 \4 \5 \1/' <$InFile
... given that the original problem specifies eight characters, this has a kind of simple elegance. What is not readable about that?

Last edited by astrogeek; 02-28-2023 at 05:11 PM. Reason: Removed -n and p
 
1 members found this post helpful.
Old 02-28-2023, 05:19 PM   #3
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,599

Rep: Reputation: 2546

Here you go:
Code:
$ sed -r 'h; s/(.)./\1 /g; G; s/\n//' InFile
a c e g abcdefgh
1 3 5 7 12345678
g o o d gxoxoxdx
The first command is "h" which "holds" the current line - i.e. stores it in a variable.
The second substitutes even-positioned characters with space.
Third, we use "G" to append the held text to the current line - this also adds a newline between the two, so the final command is needed to remove it.
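To watch those steps happen, the pattern space can be dumped with sed's `l` command just before the newline is removed. This is only a sketch for illustration; `show_before_join` is a made-up name.

```shell
# Print the pattern space unambiguously after h, s and G have run.
# -n suppresses the normal output so only the `l` dump is shown.
show_before_join() {
    sed -n 'h; s/\(.\)./\1 /g; G; l'
}

printf 'abcdefgh\n' | show_before_join
```

The dump reads `a c e g \nabcdefgh$`, where `\n` is the newline that G inserted and `$` marks the end of the pattern space.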


Heh... guess I spent too long looking to see if Sed had a way to not add the newline - just about to post and see Astrogeek has posted almost the same thing.

 
3 members found this post helpful.
Old 02-28-2023, 10:11 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Quote:
Originally Posted by astrogeek View Post
... given that the original problem specifies eight characters, this has a kind of simple elegance. What is not readable about that?
I mostly agree, but the outer parens are not needed:

Code:
sed -r 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' <$InFile
Alternatively, if you allow multiple reads of $InFile:
Code:
sed -r 's/(.)./\1 /g' <$InFile | paste -d' ' - $InFile
 
2 members found this post helpful.
Old 02-28-2023, 10:51 PM   #5
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194
The first character class, [^ ], in my expression is obviously unnecessary (what was I thinking?). Should be replaced with a '.'.

The second, [\n\r], reduces to the single character \n if we assume Unix newlines.

Boughtonp's expression does both of those and is much better as a result.

Quote:
Originally Posted by ntubski View Post
I mostly agree, but the outer parens are not needed:

Code:
sed -r 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' <$InFile
Of course!

For the problem as stated, this is difficult to beat for simplicity and readability in my opinion.
 
1 members found this post helpful.
Old 02-28-2023, 10:59 PM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by ntubski View Post
Code:
sed -r 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' <$InFile
Shall we eliminate one more keystroke?!?
Code:
sed -r 's/(.).(.).(.).(.)/\1 \2 \3 \4 &/' <$InFile
Daniel B. Martin

.
 
1 members found this post helpful.
Old 03-01-2023, 09:28 AM   #7
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,599

Rep: Reputation: 2546
Quote:
Originally Posted by astrogeek View Post
The second, [\n\r], reduces to the single character \n if we assume Unix newlines.
I always assume newlines only, and treat carriage returns as a bug to be removed. :)


Quote:
Originally Posted by danielbmartin View Post
Shall we eliminate one more keystroke?!?
Code:
sed -r 's/(.).(.).(.).(.)/\1 \2 \3 \4 &/' <$InFile
I'd say that ignoring the final character makes it less clear what the intent is, and also less maintainable - not worth it for a single dot.

However, what can be removed is the redirect via stdin, since Sed can read files directly. (Also not sure why it's a variable; and in a real script a filename variable must be double-quoted.)
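On the quoting point, a small sketch of why it matters. The file name below is invented for the example (it deliberately contains a space), and `odd_chars` is a made-up helper name.

```shell
# A file name containing a space (hypothetical, for illustration only).
dir=$(mktemp -d)
InFile="$dir/in file.txt"
printf 'abcdefgh\n' > "$InFile"

# Quoted: sed receives one argument and opens the file itself.
odd_chars() {
    sed -E 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' "$1"
}
odd_chars "$InFile"

# Unquoted, $InFile would be split at the space into two words,
# and sed would complain that neither of those files exists.
```

The temporary directory in `$dir` can be removed afterwards.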

If the shortest number of keystrokes/characters is the goal, the presence of only a single group in the solution astrogeek and I came up with means removing the -r actually results in a one-character shorter command:
Code:
sed -r 'h;s/(.)./\1 /g;G;s/\n//' InFile
sed 'h;s/\(.\)./\1 /g;G;s/\n//' InFile
Unless I've overlooked something in the Sed manual, I suspect if going shorter is possible, it would require a different method/tool.

Here's one such option that doesn't fully match the example OutFile, (but does adhere to "produce an output string which is characters 1, 3, 5, and 7 followed by the input string."):
Code:
$ paste <(cut -c1,3,5,7 InFile) InFile
aceg    abcdefgh
1357    12345678
good    gxoxoxdx
 
1 members found this post helpful.
Old 03-01-2023, 10:35 AM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,791

Rep: Reputation: 1201
Quote:
Originally Posted by boughtonp View Post
Here you go:
...
Heh... guess I spent too long looking to see if Sed had a way to not add the newline ...
The newline is put in between because it can then be easily removed OR MODIFIED.
H and N also insert a newline.
Examples where the newline is useful:
Code:
sed -r 'h;s/(.)./\1 /g;G;s/(.*)\n(.*)/\2:\1/'
sed -r 'h;s/(.)./\1 /g;H;x;s/\n/:/'
[\n] only works in a few sed versions; in POSIX sed it means \ or n.
GNU sed sticks to POSIX if the environment variable POSIXLY_CORRECT is set.
A plain \n must work after G, H, or N.
A plain \n without a prior G, H, or N works in some sed versions. (A Unix sed needs an ending \ and a new line.)
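As far as I read the last point, it is about `\n` in the replacement text: GNU sed accepts `s/,/\n/`, but the portable spelling is a backslash followed by a real line break. A minimal sketch; `split_at_comma` is a made-up name.

```shell
# Portable way to insert a newline with s///:
# a backslash, then an actual line break inside the script.
split_at_comma() {
    sed 's/,/\
/'
}

printf 'a,b\n' | split_at_comma
```

This turns the single line `a,b` into two lines, `a` and `b`, without relying on any GNU extension.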
 
1 members found this post helpful.
Old 03-01-2023, 05:07 PM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
From the outset this problem was described as a learning experience. There was a wish for a more concise sed (i.e. fewer keystrokes). No mention was made of execution speed.

This thread certainly has been a learning experience!

An InFile was created with 250,000 lines with one 8-character
string per line. The timing results:

Code:
Method #1 of LQ member danielbmartin.
gawk -F '' '{print $1,$3,$5,$7,$0}'

real    0m0.300s
user    0m0.284s
sys    0m0.012s

Method #2 of LQ member danielbmartin.
sed -r 's/(.)(.)(.)(.)(.)(.)(.)(.)/\1 \3 \5 \7 \1\2\3\4\5\6\7\8/'

real    0m0.362s
user    0m0.340s
sys    0m0.008s

Method #3 of LQ member danielbmartin.
sed -r 's/((.)(.)(.)(.)(.)(.)(.)(.))/\2 \4 \6 \8 \1/'

real    0m0.371s
user    0m0.340s
sys    0m0.012s

Method #4 of LQ member danielbmartin.
sed -r 's/((.).(.).(.).(.).)/\2 \3 \4 \5 \1/'

real    0m0.293s
user    0m0.260s
sys    0m0.012s

Method #1 of LQ Moderator astrogeek.
sed -r 'h;s/(.)[^ ]/\1 /g;G;s/[\n\r]+//'

real    0m1.371s
user    0m1.340s
sys    0m0.016s

Method #1 of LQ Senior Member boughtonp.
sed -r 'h; s/(.)./\1 /g; G; s/\n//'

real    0m0.595s
user    0m0.568s
sys    0m0.008s

Method #2 of LQ Senior Member boughtonp.
sed 'h;s/\(.\)./\1 /g;G;s/\n//'

real    0m0.592s
user    0m0.556s
sys    0m0.016s

Method #3 of LQ Senior Member boughtonp.
paste <(cut -c1,3,5,7 $InFile)

real    0m0.039s
user    0m0.024s
sys    0m0.012s

Method #1 of LQ Senior Member ntubski.
sed -r 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/'

real    0m0.273s
user    0m0.256s
sys    0m0.000s

Method #1.1 of LQ Senior Member ntubski.
sed -r 's/(.).(.).(.).(.)/\1 \2 \3 \4 &/'

real    0m0.268s
user    0m0.244s
sys    0m0.008s

Method #2 of LQ Senior Member ntubski.
sed -r 's/(.)./\1 /g' <$InFile | paste -d' '

real    0m0.503s
user    0m0.524s
sys    0m0.000s

Method #1 of LQ Senior Member MadeInGermany.
sed -r 'h;s/(.)./\1 /g;G;s/(.*)\n(.*)/\2:\1/'

real    0m1.631s
user    0m1.604s
sys    0m0.008s

Method #2 of LQ Senior Member MadeInGermany.
sed -r 'h;s/(.)./\1 /g;H;x;s/\n/:/'

real    0m0.602s
user    0m0.576s
sys    0m0.004s
Method #3 of boughtonp was a double winner -- the fastest
and the most concise. That solution took the liberty of
redefining the format of the OutFile but no function was lost.
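For anyone who wants to repeat the timings, here is a sketch of how a comparable input could be built. The actual strings used above are not shown, so the zero-padded numbers and the helper name `make_infile` are assumptions.

```shell
# Write N lines of exactly 8 characters each to the named file.
# make_infile is a hypothetical helper; the line contents are invented.
make_infile() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) printf "%08d\n", i }' > "$2"
}

InFile=$(mktemp)
make_infile 250000 "$InFile"

# Example measurement, with output discarded so the terminal
# does not dominate the result:
# time sed -E 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' "$InFile" > /dev/null
```

Running each candidate several times and comparing the "user" figures gives a steadier picture than a single run.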

Famous saying: "Beauty is in the eye of the beholder."
The same might be said of readability.


Daniel B. Martin

.
 
2 members found this post helpful.
Old 03-02-2023, 04:21 AM   #10
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,791

Rep: Reputation: 1201
A variation of method #3 of boughtonp:
Code:
cut -c1,3,5,7 $InFile | paste -d" " - $InFile
 
1 members found this post helpful.
Old 03-02-2023, 08:55 PM   #11
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by MadeInGermany View Post
A variation of method #3 of boughtonp:
Code:
cut -c1,3,5,7 $InFile | paste -d" " - $InFile
Clean. Readable. Fast.

It would be nice if cut allowed overlay usages such as...
Code:
cut -c1,3,5,7,1-8
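For reference, cut does accept that list, but it writes each selected character once and in input order, so repeats and reordering in the list are ignored. A quick check (the helper name is made up):

```shell
# cut treats the list as a set of positions, not as an output template.
overlay_attempt() {
    cut -c1,3,5,7,1-8
}

printf 'abcdefgh\n' | overlay_attempt
```

The result is just `abcdefgh`, the same as `cut -c1-8` alone.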
Daniel B. Martin

.
 
Old 03-02-2023, 09:35 PM   #12
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194
You may want to reconsider the thread title: SED syntax question

Great solutions to the problem however!
 
1 members found this post helpful.
Old 03-03-2023, 01:04 AM   #13
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
Regarding performance:
Using two tools instead of only one costs more, so if possible try to use only one (in this case cut and paste together). Using the shell itself is probably even faster.
Do not try to measure performance on a few lines (you will not be able to produce an interpretable result); use millions of lines of data.
Another thing (which is not that important at all): you can omit that < in most cases, since awk, sed, grep, ... can open files themselves, so
Code:
awk 'script' file
# and
awk 'script' <file
are almost identical (in the first case awk opens the file; in the second case the shell opens it and passes the open file descriptor to awk).
As usual, you can solve it in other languages too, like perl or python, but don't forget that the shell can do it as well.
Code:
# Needs bash (or ksh/zsh): ${line:offset:length} is not POSIX sh.
while IFS= read -r line
do
    printf '%s\n' "${line:0:1} ${line:2:1} ${line:4:1} ${line:6:1} ${line}"
done <inputfile
 
Old 03-03-2023, 06:04 AM   #14
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Quote:
Originally Posted by pan64 View Post
Regarding performance:
Using two tools instead of only one costs more, so if possible try to use only one (in this case cut and paste together). Using the shell itself is probably even faster.
Do not try to measure performance on a few lines (you will not be able to produce an interpretable result); use millions of lines of data.
I would predict that if you measure on a file with millions of lines, the shell solution will be much much slower (something like 10 times slower). Shell would only be faster compared to a solution that runs sed/cut/paste/whatever once per line.
 
Old 03-03-2023, 07:44 AM   #15
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
Quote:
Originally Posted by ntubski View Post
I would predict that if you measure on a file with millions of lines, the shell solution will be much much slower (something like 10 times slower). Shell would only be faster compared to a solution that runs sed/cut/paste/whatever once per line.
You can definitely try it. Using a regexp is much slower than ${var:x:y}, so I'm not really sure about that.
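One way to settle this would be to time both approaches on the same large file. A sketch, assuming bash is installed (the `${line:0:1}` form is a bash feature, not POSIX sh); the function names are invented here, and no timing figures are claimed.

```shell
# Shell-loop version, run in bash because of the substring expansion.
loop_version() {
    bash -c 'while IFS= read -r line; do
        printf "%s %s %s %s %s\n" "${line:0:1}" "${line:2:1}" "${line:4:1}" "${line:6:1}" "$line"
    done'
}

# sed version from earlier in the thread.
sed_version() {
    sed -E 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/'
}

# Both should print the same thing for a sample line.
printf 'gxoxoxdx\n' | loop_version
printf 'gxoxoxdx\n' | sed_version

# To compare speed on a big file (in bash):
# time loop_version < "$InFile" > /dev/null
# time sed_version  < "$InFile" > /dev/null
```

Checking that the two outputs match first makes sure the timing compares like with like.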
 
  

