Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Code:
$ sed -r 'h; s/(.)./\1 /g; G; s/\n//' InFile
a c e g abcdefgh
1 3 5 7 12345678
g o o d gxoxoxdx
The first command is "h", which "holds" the current line, i.e. copies it into the hold space (sed's spare buffer, which works a bit like a variable).
The second replaces every even-positioned character with a space.
Third, "G" appends the held text to the pattern space. It also inserts a newline before that text, so the final command is needed to remove it.
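The same program can be spread over several lines with a comment on each step. This is a minimal sketch assuming GNU sed (for `-E` and for `#` comments inside the script); the function name `odd_then_line` is hypothetical:

```shell
# Hypothetical helper: print characters 1, 3, 5, 7 (space-separated),
# then the original line. Reads stdin, or the files given as arguments.
odd_then_line() {
  sed -E '
    # h: copy the pattern space (the current line) into the hold space
    h
    # keep each odd-positioned character, replace the one after it with a space
    s/(.)./\1 /g
    # G: append a newline plus the held copy of the line
    G
    # delete that newline so both parts end up on one line
    s/\n//
  ' "$@"
}

printf '%s\n' abcdefgh 12345678 gxoxoxdx | odd_then_line
```

This prints the same three lines as the one-liner above.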
Heh... guess I spent too long looking to see if Sed had a way to not add the newline. I was just about to post and saw Astrogeek has posted almost the same thing.
The second, [\n\r], reduces to the single character \n if we assume Unix newlines.
I always assume newlines only, and treat carriage returns as a bug to be removed. :)
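If a file might have DOS line endings, one way to follow that rule is to delete carriage returns before sed sees the data, so the script only ever has to deal with \n. A sketch assuming POSIX `tr` (which understands `\r`) and `mktemp`; `run_unix_newlines` is a hypothetical name:

```shell
# Hypothetical helper: drop every carriage return, then run the BRE form
# of the script, so the final s command only has to remove the \n from G.
run_unix_newlines() {
  tr -d '\r' <"$1" | sed 'h;s/\(.\)./\1 /g;G;s/\n//'
}

# A CRLF input file gives the same result as an LF-only one.
tmp=$(mktemp)
printf 'abcdefgh\r\n12345678\r\n' >"$tmp"
run_unix_newlines "$tmp"
rm -f "$tmp"
```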
Quote:
Originally Posted by danielbmartin
Shall we eliminate one more keystroke?!?
Code:
sed -r 's/(.).(.).(.).(.)/\1 \2 \3 \4 &/' <$InFile
I'd say that ignoring the final character makes it less clear what the intent is, and also less maintainable - not worth it for a single dot.
However, what can be removed is the redirect via stdin, since Sed can read files directly. (Also not sure why it's a variable; and in a real script a filename variable must be double-quoted.)
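A sketch of why the quotes matter, using a hypothetical file name that contains a space (GNU or BSD sed assumed for `-E`). Quoted, the name reaches sed as one argument and sed opens the file itself, with no redirect:

```shell
# Hypothetical file name with a space in it.
InFile="$(mktemp -d)/my input.txt"
printf '%s\n' abcdefgh 12345678 >"$InFile"

# sed opens the file itself; the double quotes keep the name as one word.
sed -E 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' "$InFile"

# Unquoted, the shell would split the name into ".../my" and "input.txt",
# and sed would report that neither file exists:
# sed -E 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' $InFile
```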
If the fewest keystrokes/characters is the goal: because the solution astrogeek and I came up with has only a single group, removing the -r actually results in a command that is one character shorter (dropping "-r " saves three characters, while escaping the group as \(.\) costs only two):
Code:
sed -r 'h;s/(.)./\1 /g;G;s/\n//' InFile
sed 'h;s/\(.\)./\1 /g;G;s/\n//' InFile
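A quick sketch for confirming that the ERE and BRE forms really are interchangeable, run on a few made-up sample lines (GNU sed assumed for `-r`):

```shell
# Made-up sample input.
sample=$(printf '%s\n' abcdefgh 12345678 gxoxoxdx)

ere=$(printf '%s\n' "$sample" | sed -r 'h;s/(.)./\1 /g;G;s/\n//')
bre=$(printf '%s\n' "$sample" | sed 'h;s/\(.\)./\1 /g;G;s/\n//')

if [ "$ere" = "$bre" ]; then
  echo "same output"
else
  echo "outputs differ" >&2
fi
```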
Unless I've overlooked something in the Sed manual, I suspect if going shorter is possible, it would require a different method/tool.
Here's one such option that doesn't fully match the example OutFile (but does adhere to "produce an output string which is characters 1, 3, 5, and 7 followed by the input string."):
Quote:
Originally Posted by boughtonp
Here you go:
...
Heh... guess I spent too long looking to see if Sed had a way to not add the newline ...
The newline is put in between because it can be easily removed OR MODIFIED.
H and N also insert a newline.
Examples where the newline is useful:
Code:
sed -r 'h;s/(.)./\1 /g;G;s/(.*)\n(.*)/\2:\1/'
sed -r 'h;s/(.)./\1 /g;H;x;s/\n/:/'
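The simplest way to keep the newline is to leave out the final substitution, which prints the two parts on separate lines. A sketch (GNU sed assumed for `-r`), next to the first of the commands above for comparison:

```shell
# No final s command: the newline added by G stays in the output.
# Line 1 is "a c e g " (with a trailing space), line 2 is the original string.
printf '%s\n' abcdefgh | sed -r 'h;s/(.)./\1 /g;G'

# Here the newline is modified instead: the original string moves to the
# front and the newline becomes a colon, giving "abcdefgh:a c e g ".
printf '%s\n' abcdefgh | sed -r 'h;s/(.)./\1 /g;G;s/(.*)\n(.*)/\2:\1/'
```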
[\n] only works in a few sed versions; in POSIX sed it matches a backslash or an n.
GNU sed sticks to POSIX if the environment variable POSIXLY_CORRECT is set.
A plain \n must work after G, H, or N.
A plain \n without a prior G, H, or N works in some sed versions. (To put a newline into a replacement, a traditional Unix sed needs a backslash at the end of the line followed by an actual newline.)
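A sketch of that portable spelling in a replacement: a backslash followed by a real newline, which POSIX sed accepts (GNU sed would also accept `s/:/\n/`). The function name `colon_to_newline` is hypothetical:

```shell
# Hypothetical helper: replace the first colon with a newline.
# The replacement is a backslash plus a literal newline; the single quotes
# pass both through to sed unchanged. The "/'" line must not be indented,
# or the indentation would become part of the replacement.
colon_to_newline() {
  sed 's/:/\
/'
}

printf 'abcdefgh:a c e g\n' | colon_to_newline
```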
From the outset this problem was described as a learning experience. There was a wish for a more concise sed (i.e. fewer keystrokes). No mention was made of execution speed.
This thread certainly has been a learning experience!
An InFile was created with 250,000 lines, with one 8-character string per line. The timing results:
Code:
Method #1 of LQ member danielbmartin.
gawk -F '' '{print $1,$3,$5,$7,$0}'
real 0m0.300s
user 0m0.284s
sys 0m0.012s
Method #2 of LQ member danielbmartin.
sed -r 's/(.)(.)(.)(.)(.)(.)(.)(.)/\1 \3 \5 \7 \1\2\3\4\5\6\7\8/'
real 0m0.362s
user 0m0.340s
sys 0m0.008s
Method #3 of LQ member danielbmartin.
sed -r 's/((.)(.)(.)(.)(.)(.)(.)(.))/\2 \4 \6 \8 \1/'
real 0m0.371s
user 0m0.340s
sys 0m0.012s
Method #4 of LQ member danielbmartin.
sed -r 's/((.).(.).(.).(.).)/\2 \3 \4 \5 \1/'
real 0m0.293s
user 0m0.260s
sys 0m0.012s
Method #1 of LQ Moderator astrogeek.
sed -r 'h;s/(.)[^ ]/\1 /g;G;s/[\n\r]+//'
real 0m1.371s
user 0m1.340s
sys 0m0.016s
Method #1 of LQ Senior Member boughtonp.
sed -r 'h; s/(.)./\1 /g; G; s/\n//'
real 0m0.595s
user 0m0.568s
sys 0m0.008s
Method #2 of LQ Senior Member boughtonp.
sed 'h;s/\(.\)./\1 /g;G;s/\n//'
real 0m0.592s
user 0m0.556s
sys 0m0.016s
Method #3 of LQ Senior Member boughtonp.
paste <(cut -c1,3,5,7 $InFile)
real 0m0.039s
user 0m0.024s
sys 0m0.012s
Method #1 of LQ Senior Member ntubski.
sed -r 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/'
real 0m0.273s
user 0m0.256s
sys 0m0.000s
Method #1.1 of LQ Senior Member ntubski.
sed -r 's/(.).(.).(.).(.)/\1 \2 \3 \4 &/'
real 0m0.268s
user 0m0.244s
sys 0m0.008s
Method #2 of LQ Senior Member ntubski.
sed -r 's/(.)./\1 /g' <$InFile | paste -d' '
real 0m0.503s
user 0m0.524s
sys 0m0.000s
Method #1 of LQ Senior Member MadeInGermany.
sed -r 'h;s/(.)./\1 /g;G;s/(.*)\n(.*)/\2:\1/'
real 0m1.631s
user 0m1.604s
sys 0m0.008s
Method #2 of LQ Senior Member MadeInGermany.
sed -r 'h;s/(.)./\1 /g;H;x;s/\n/:/'
real 0m0.602s
user 0m0.576s
sys 0m0.004s
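The thread does not say how the 250,000-line InFile was built. One possible way to create a comparable file and time a command against it, assuming bash (for the `time` keyword), `/dev/urandom`, `fold`, and `mktemp`; `make_infile` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write $2 lines (default 250000) of 8 random
# lowercase letters/digits each to the file named by $1.
make_infile() {
  LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | fold -w 8 | head -n "${2:-250000}" >"$1"
}

InFile=$(mktemp)
make_infile "$InFile"

# Send the output to /dev/null so terminal speed is not part of the timing.
time sed -r 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' "$InFile" >/dev/null
```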
Method #3 of boughtonp was a double winner: the fastest and the most concise. That solution took the liberty of redefining the format of the OutFile, but no function was lost.
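The timing list shows Method #3 only as `paste <(cut -c1,3,5,7 $InFile)`. A sketch of a complete cut/paste pipeline along those lines, as an assumed reconstruction rather than the exact command that was timed: `cut -c1,3,5,7` picks the odd-positioned characters and `paste -d' '` joins them to the original line with a space instead of paste's default tab (process substitution requires bash; `odd_cut_paste` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Assumed reconstruction: odd-positioned characters, a space, then the line.
odd_cut_paste() {
  paste -d' ' <(cut -c1,3,5,7 "$1") "$1"
}

InFile=$(mktemp)
printf '%s\n' abcdefgh 12345678 >"$InFile"
odd_cut_paste "$InFile"
```

This prints "aceg abcdefgh" and "1357 12345678", i.e. the redefined format without spaces between the selected characters.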
Famous saying: "Beauty is in the eye of the beholder."
The same might be said of readability.
Regarding performance:
using two tools instead of only one (in this case cut and paste together) costs more, so if possible try to use only one. Using the shell itself is probably even faster.
Do not try to measure performance on a few lines, because the result will not be interpretable; measure on millions of lines instead.
Another thing (not that important at all): you can omit the < in most cases, because awk, sed, grep, ... can open files themselves, so
Code:
awk 'script' file
# and
awk 'script' <file
are almost identical. (In the first case awk opens the file itself; in the second case the shell opens it and passes the open file descriptor to awk as standard input.)
As usual, you can solve it in other languages too, like perl or python, but don't forget that the shell can do it too.
Code:
# bash: ${var:offset:length} extracts a substring; IFS= keeps leading blanks
while IFS= read -r line; do
    echo "${line:0:1} ${line:2:1} ${line:4:1} ${line:6:1} ${line}"
done <inputfile
I would predict that if you measure on a file with millions of lines, the shell solution will be much slower (something like 10 times slower). The shell would only be faster compared with a solution that runs sed/cut/paste/whatever once per line.
You can definitely try it. Using a regexp is much slower than ${var:x:y}, so I'm not really sure about that.
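One way to try it: run the shell loop and a single sed over the same file, check that the outputs agree, and time both. A sketch assuming bash (for `${line:x:y}`, process substitution, and the `time` keyword); the helper names are hypothetical, and the 1,000 repeated lines are only a placeholder, so raise the count to millions for a meaningful comparison:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the same transformation done two ways.
with_shell() {
  while IFS= read -r line; do
    printf '%s\n' "${line:0:1} ${line:2:1} ${line:4:1} ${line:6:1} ${line}"
  done <"$1"
}

with_sed() {
  sed -r 's/(.).(.).(.).(.)./\1 \2 \3 \4 &/' "$1"
}

InFile=$(mktemp)
yes abcdefgh | head -n 1000 >"$InFile"   # increase the count for real timings

# Both methods should produce identical output before timing means anything.
if cmp -s <(with_shell "$InFile") <(with_sed "$InFile"); then
  echo "outputs match"
fi

time with_shell "$InFile" >/dev/null
time with_sed "$InFile" >/dev/null
```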