[SOLVED] sed: s command backward references with --regexp-extended
I do not have high skills in sed but maybe the data is better suited to awk if you are pulling delimited information out?
If you show some data I would be happy to try an awk solution
If I am reading this correctly, you do not need to change FS for awk as the demo data is the output of grep which has prepended the name and the colon:
You don't need grep either. sed can filter the lines out of the files directly. (So can awk, as grail demonstrates.)
Assuming the output format is just month/day/time, I like this version:
Code:
sed -nr "/incor.*pass.*length/ s/.*(($mon1|$mon2)[ ]+[0-9]+)[ ]+([0-9:]+).*/\1\t\3\tX/p" /var/log/*.log
#Produces the following output:
May 1 20:42:53 X
If the day needs to be in a separate tab-delimited field from the month, just move the parens around and add back the "\2" to the output.
Code:
sed -nr "/incor.*pass.*length/ s/.*($mon1|$mon2)[ ]+([0-9]+)[ ]+([0-9:]+).*/\1\t\2\t\3\tX/p" /var/log/*.log
May 1 20:42:53 X
Last edited by David the H.; 05-03-2012 at 12:58 PM.
Reason: minor code alteration
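To see why moving the parentheses changes which backreference holds what, here is a small sketch. Groups are numbered by the position of their opening parenthesis, so with the nested form \1 is the whole "month day" text and \2 is the month alone. The sample line is made up, and -E is used as the equivalent of GNU sed's -r.

```shell
# Hypothetical log line; the real log format may differ.
line='May  1 20:42:53 host app: incorrect password length'

# Nested form: group 1 wraps group 2, group 3 is the time.
nested=$(printf '%s\n' "$line" |
  sed -E 's/.*((Apr|May) +[0-9]+) +([0-9:]+).*/[\1] [\2] [\3]/')

printf '%s\n' "$nested"
```

Each bracketed field in the output corresponds to one numbered group, which makes it easy to check the numbering before building the real replacement string.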
You don't need double parentheses, this should also work
Code:
sed -r "s/[^:]*:(Apr|May) *([0-9]*) ([^ ]*).*/\1\2\t\3\tX/"
Thanks firstfire
It does work.
My understanding of regexes is defective. According to my understanding Apr|May should match Ap followed by r or M followed by ay but experiments show that is not the case.
In the terms of the regex(7) man page, the | separates two branches. A branch is one or more pieces, concatenated. A piece is an atom possibly followed by ... . An atom is ... or a single character with no other significance (matching that character).
If backreferencing is not being used on the month names so the regex is "s/[^:]*:Apr|May *([0-9]*) ...", how is the branch after the | terminated? Why does it stop after May?
I'm not entirely clear on what the full intended output should be ...
The intention is to generate a string in which the first tab-separated field is recognised as a date when pasted into a spreadsheet. I was aiming for the same as your output but with no space between the month name and the day number; the two are effectively the same. Neither works as intended! The day number must come before the month name.
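A minimal sketch of putting the day first, assuming a grep-style input line (the sample line and file name are made up) and GNU sed, which understands \t in the replacement. It reuses the pattern from the earlier reply and only swaps the order of \1 and \2 in the output. Whether a given spreadsheet then parses "1 May" as a date is not something this checks.

```shell
# Hypothetical line in "file:Month Day Time message" form.
line='/var/log/auth.log:May  1 20:42:53 host app: incorrect password length'

# \1 = month, \2 = day, \3 = time; emit day before month.
out=$(printf '%s\n' "$line" |
  sed -E 's/[^:]*:(Apr|May) *([0-9]+) +([^ ]+).*/\2 \1\t\3\tX/')

printf '%s\n' "$out"
```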
Quote:
My understanding of regexes is defective. According to my understanding Apr|May should match Ap followed by r or M followed by ay but experiments show that is not the case.
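"|" has the lowest precedence of the regex operators, so Apr|May is read as the branch Apr or the branch May, not as Ap followed by r or M followed by ay.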
Quote:
If backreferencing is not being used on the month names so the regex is "s/[^:]*:Apr|May *([0-9]*) ...", how is the branch after the | terminated? Why does it stop after May?
If you remove the parens, the branch is not terminated before the end of the expression, i.e. your two branches would be [^:]*:Apr and May *([0-9]*) ...
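The difference can be checked with a quick experiment. This is only a sketch: the sample line is invented, the pattern is cut down to the part that matters, and -E stands in for GNU sed's -r.

```shell
# Hypothetical line in "file:Month Day Time message" form.
line='auth.log:May  1 20:42:53 host sshd: incorrect password length'

# Ungrouped: the branches are "[^:]*:Apr" and "May *[0-9]*".
# Only the second branch matches, so the file name prefix survives.
ungrouped=$(printf '%s\n' "$line" | sed -E 's/[^:]*:Apr|May *[0-9]*/<>/')

# Grouped: the alternation is confined to the month name,
# so the prefix and the day number are part of the same match.
grouped=$(printf '%s\n' "$line" | sed -E 's/[^:]*:(Apr|May) *[0-9]*/<>/')

printf '%s\n%s\n' "$ungrouped" "$grouped"
```

In the ungrouped run the "auth.log:" prefix is left in place because it belongs only to the Apr branch; in the grouped run it is consumed along with the month and day.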
When a quoted array is expanded with * (as in "${array[*]}"), bash joins the elements with the first character of IFS, so setting IFS to a pipe gives us the desired output.
The && ensures the previous command succeeded before its result is used.
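A sketch of how that might look in bash. The array name, its contents and the sample line are assumptions, not the code from the post, and the subshell is just one way to keep the IFS change from leaking into the rest of the script.

```shell
# Hypothetical month list and log line.
months=(Apr May)
line='auth.log:May  1 20:42:53 host app: incorrect password length'

# "${months[*]}" joins the elements with the first character of IFS.
# The && runs sed only if building the pattern succeeded.
pattern=$(IFS='|'; printf '%s' "${months[*]}") &&
result=$(printf '%s\n' "$line" |
  sed -E "s/[^:]*:($pattern) *([0-9]+) +([^ ]+).*/\1 \2\t\3\tX/")

printf '%s\n%s\n' "$pattern" "$result"
```

The parentheses around $pattern matter here for the same precedence reason discussed above: without them the joined alternation would split the whole expression.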
Neat.
Hard to imagine the circumstances in which IFS='|' would not work but ...