LinuxQuestions.org
Old 05-03-2012, 09:23 AM   #1
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578
Blog Entries: 31

Rep: Reputation: 1208
sed: s command backward references with --regexp-extended


I have not been able to get the expected result using --regexp-extended in combination with the s command's \<n> backward references.

The intention is to extract the timestamp components from log files for a specific message (to paste into a spreadsheet for analysis).

This works for a single month
Code:
grep 'incorrect password length' /var/log/* | sed 's/[^:]*:Apr \([0-9]*\) \([^ ]*\).*/\1apr\t\2\tX/'
[snip]
28apr	12:27:07	X
Now I want to do the same for two months. Here's what I tried and the resulting error
Code:
grep 'incorrect password length' /var/log/* | sed --regexp-extended 's/[^:]*:\((Apr|May)\) \([0-9]*\) \([^ ]*\).*/\1\2\t\3\tX/'
sed: -e expression #1, char 57: invalid reference \3 on `s' command's RHS
I tried changing \(( to (\( and similarly for )\) but got the same error.
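
A minimal sketch of the difference between the two regex flavours, assuming a sed that accepts -E (GNU sed also spells it -r or --regexp-extended). In basic mode \( \) group and bare parentheses are literal; in extended mode it is the other way round. In the failing command above every \( and \) therefore matches a literal parenthesis, leaving only one capturing group, hence the complaint about an invalid reference. The sample line is copied from a later post in this thread, and a literal tab held in a shell variable is used here instead of \t, which is a GNU extension in the replacement.

```shell
# Sample grep output line taken from later in this thread.
line='/var/log/daemon.log:Apr 11 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)'
tab=$(printf '\t')

# Basic regex (the default): \( \) create groups.
bre=$(printf '%s\n' "$line" | sed "s/[^:]*:\(Apr\) \([0-9]*\) \([^ ]*\).*/\1\2${tab}\3${tab}X/")

# Extended regex: bare ( ) create groups and | is alternation.
ere=$(printf '%s\n' "$line" | sed -E "s/[^:]*:(Apr|May) ([0-9]*) ([^ ]*).*/\1\2${tab}\3${tab}X/")

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands print the same three tab-separated fields; only the escaping of the parentheses differs.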
 
Old 05-03-2012, 09:47 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Although you don't provide an input example, the following test seems to work:
Code:
sed -r 's/.*((Apr|May)).*/\1X/' infile
Try implementing this for your situation.

Hope this helps.
 
Old 05-03-2012, 10:36 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
I am not very skilled with sed, but perhaps the data is better suited to awk if you are pulling out delimited fields?
If you show some data I would be happy to try an awk solution.
 
Old 05-03-2012, 10:55 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578

Original Poster
Blog Entries: 31

Rep: Reputation: 1208
Thanks both

I didn't provide an input example!

Here's one output line from grep 'incorrect password length' /var/log/*.log
Code:
/var/log/daemon.log:May  1 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)
I tried removing the "\"s as druuna suggested. That solved the error but unexpectedly output the first match twice and not the third match:
Code:
root@LS1:~# mon1=Apr; mon2=May
root@LS1:~# egrep "^($mon1|$mon2) .*incorrect password length" /var/log/*.log \
    | sed --regexp-extended "s/[^:]*:(($mon1|$mon2)) ([0-9]*) ([^ ]*).*/\1\2\t\3\tX/" \
    | sort | uniq
AprApr	29	X
AprApr	30	X
MayMay		X
This worked (but I don't know why; the asterisk after the space that follows the month group is there because the 1 in May 1 is left-padded with a space):
Code:
root@LS1:~# egrep "^($mon1|$mon2) .*incorrect password length" /var/log/*.log \
    | sed --regexp-extended "s/[^:]*:(($mon1|$mon2)) *([0-9]*) ([^ ]*).*/\1\3\t\4\tX/" \
    | sort | uniq
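
A small sketch of why the reference numbers shift, assuming a sed that accepts -E. Groups are numbered by the position of their opening parenthesis, so in ((Apr|May)) the outer pair is \1 and the inner pair is \2, both holding the month; the day and time become \3 and \4. That also accounts for the earlier "MayMay" output with an empty second field: with a single space in the pattern and two spaces in "May  1", [0-9]* matched nothing and the day ended up in the group meant for the time.

```shell
# Sample line from this thread; note the two spaces in "May  1".
line='/var/log/daemon.log:May  1 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)'

# Print each group next to its number to make the numbering visible.
groups=$(printf '%s\n' "$line" \
  | sed -E "s/[^:]*:((Apr|May)) *([0-9]*) ([^ ]*).*/1=\1 2=\2 3=\3 4=\4/")

printf '%s\n' "$groups"
```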
 
Old 05-03-2012, 11:16 AM   #5
firstfire
Member
 
Registered: Mar 2006
Location: Ekaterinburg, Russia
Distribution: Debian, Ubuntu
Posts: 709

Rep: Reputation: 428
Hi.

You don't need double parentheses, this should also work
Code:
sed -r "s/[^:]*:(Apr|May) *([0-9]*) ([^ ]*).*/\1\2\t\3\tX/"
 
Old 05-03-2012, 12:02 PM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Would this be a "better" solution:
Code:
awk -v mon1="${mon1}" -v mon2="${mon2}" 'BEGIN{ FS="[ :]*"} /incorrect password length/ { if ( $2 == mon1 || $2 == mon2 ) { print $2, $3 , $4 ":" $5 ":" $6, $7 }}' infile
I'm not entirely clear on what the full intended output should be; the above outputs this:
Code:
$ cat infile
/var/log/daemon.log:Mar  1 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)
/var/log/daemon.log:Apr 11 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)
/var/log/daemon.log:May  1 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)
/var/log/daemon.log:Jun  1 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)
$ mon1=Apr; mon2=May
$ awk -v mon1="${mon1}" -v mon2="${mon2}" 'BEGIN{ FS="[ :]*"} /incorrect password length/ { if ( $2 == mon1 || $2 == mon2 ) { print $2, $3 , $4 ":" $5 ":" $6, $7 }}' infile 
Apr 11 20:42:53 LS1
May 1 20:42:53 LS1
 
Old 05-03-2012, 12:41 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
If I am reading this correctly, you do not need to change FS for awk as the demo data is the output of grep which has prepended the name and the colon:
Code:
awk -v months="(Apr|May)" '/incorrect password length/ && $0 ~ months{print $1,$2,$3,$4}' file
If you do wish to use bash variables for the months, may I suggest an array:
Code:
m=(Apr May)

IFS="|" && awk -v months="(${m[*]})" '/incorrect password length/ && $0 ~ months{print $1,$2,$3,$4}' file
 
Old 05-03-2012, 12:51 PM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
You don't need grep either. sed can filter the lines out of the files directly. (So can awk, as grail demonstrates.)


Assuming the output format is just month/day/time, I like this version:

Code:
sed -nr "/incor.*pass.*length/ s/.*(($mon1|$mon2)[ ]+[0-9]+)[ ]+([0-9:]+).*/\1\t\3\tX/p" /var/log/*.log

#Produces the following output:
May  1  20:42:53        X
If the day needs to be in a separate tab-delimited field from the month, just move the parens around and add back the "\2" to the output.

Code:
sed -nr "/incor.*pass.*length/ s/.*($mon1|$mon2)[ ]+([0-9]+)[ ]+([0-9:]+).*/\1\t\2\t\3\tX/p" /var/log/*.log

May     1       20:42:53        X

Last edited by David the H.; 05-03-2012 at 12:58 PM. Reason: minor code alteration
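
The address-plus-substitute approach in this post can be tried without touching /var/log. This is only a sketch, assuming a sed that accepts -E: -n suppresses the default output, the /.../ address selects the lines of interest, and the p flag prints only lines where the substitution succeeded. The first input line is a made-up, non-matching message added for illustration; the second is the sample line from this thread without the file name prefix that grep would add.

```shell
tab=$(printf '\t')
mon1=Apr; mon2=May

out=$(printf '%s\n' \
  'May  1 20:42:50 LS1 smbd[17937]:   some unrelated message' \
  'May  1 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)' \
  | sed -nE "/incorrect password length/ s/.*($mon1|$mon2) +([0-9]+) +([0-9:]+).*/\1${tab}\2${tab}\3${tab}X/p")

printf '%s\n' "$out"
```

Only the second line survives, reduced to month, day, time and the trailing X, each separated by a tab.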
 
Old 05-03-2012, 12:51 PM   #9
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,
Quote:
Originally Posted by grail
If I am reading this correctly, you do not need to change FS for awk as the demo data is the output of grep which has prepended the name and the colon:
LOL

Of course you are correct.

I can't believe that I overlooked that

BTW: Nice awk solutions!
 
Old 05-03-2012, 09:24 PM   #10
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578

Original Poster
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by firstfire
Hi.

You don't need double parentheses, this should also work
Code:
sed -r "s/[^:]*:(Apr|May) *([0-9]*) ([^ ]*).*/\1\2\t\3\tX/"
Thanks firstfire

It does work.

My understanding of regexes is defective. According to my understanding Apr|May should match Ap followed by r or M followed by ay but experiments show that is not the case.

In terms of the regex(7) man page, the | separates two branches. A branch is one or more pieces, concatenated. A piece is an atom possibly followed by ... . An atom is ... or a single character with no other significance (matching that character).

If backreferencing is not being used on the month names so the regex is "s/[^:]*:Apr|May *([0-9]*) ...", how is the branch after the | terminated? Why does it stop after May?

Last edited by catkin; 05-03-2012 at 09:41 PM.
 
Old 05-03-2012, 09:30 PM   #11
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578

Original Poster
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by druuna
I'm not entirely clear on what the full intended output should be ...
The intention is to generate a string in which the first tab-separated field is recognised as a date when pasted into a spreadsheet. I was aiming for the same as your output but with no space between month name and day number, but they are effectively the same. Neither works as intended! The day number must come before the month name.
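
Putting the day first only requires swapping the back-references in the replacement. A sketch under the same assumptions as the rest of the thread (a sed that accepts -E, months held in mon1 and mon2), using the two sample lines posted earlier; whether a given spreadsheet accepts "1May" as a date is not something this checks.

```shell
tab=$(printf '\t')
mon1=Apr; mon2=May

# " +" absorbs the extra padding space in "May  1".
out=$(printf '%s\n' \
  '/var/log/daemon.log:Apr 11 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)' \
  '/var/log/daemon.log:May  1 20:42:53 LS1 smbd[17937]:   smb_pwd_check_ntlmv1: incorrect password length (64)' \
  | sed -E "s/[^:]*:($mon1|$mon2) +([0-9]+) +([^ ]+).*/\2\1${tab}\3${tab}X/")

printf '%s\n' "$out"
```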
 
Old 05-03-2012, 09:40 PM   #12
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578

Original Poster
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by grail
If I am reading this correctly, you do not need to change FS for awk as the demo data is the output of grep which has prepended the name and the colon:
Code:
awk -v months="(Apr|May)" '/incorrect password length/ && $0 ~ months{print $1,$2,$3,$4}' file
If you do wish to use bash variables for the months, may I suggest an array:
Code:
m=(Apr May)

IFS="|" && awk -v months="(${m[*]})" '/incorrect password length/ && $0 ~ months{print $1,$2,$3,$4}' file
That's right.

Using an array like that does make it neater.

What is the purpose of IFS="|" and of && ?
 
Old 05-03-2012, 11:18 PM   #13
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,783

Rep: Reputation: 2083
Quote:
Originally Posted by catkin
My understanding of regexes is defective. According to my understanding Apr|May should match Ap followed by r or M followed by ay but experiments show that is not the case.
"|" has the lowest precedence.

Quote:
If backreferencing is not being used on the month names so the regex is "s/[^:]*:Apr|May *([0-9]*) ...", how is the branch after the | terminated? Why does it stop after May?
If you remove the parens then the branch is not terminated, i.e. your 2 branches would be [^:]*:Apr and May *([0-9]*) ...
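
The effect of the low precedence of | can be made visible by wrapping whatever matched in angle brackets. A sketch, assuming a sed that accepts -E; the input is a shortened copy of the sample line from this thread.

```shell
line='/var/log/daemon.log:May  1 20:42:53 LS1'

# No parentheses: the branches are "[^:]*:Apr" and "May +[0-9]+".
# On a May line only the second branch matches, so the file name is left alone.
loose=$(printf '%s\n' "$line" | sed -E 's/[^:]*:Apr|May +[0-9]+/<&>/')

# Parentheses confine the alternation to the month name.
tight=$(printf '%s\n' "$line" | sed -E 's/[^:]*:(Apr|May) +[0-9]+/<&>/')

printf '%s\n%s\n' "$loose" "$tight"
```

In the first result the brackets enclose only "May  1"; in the second they also enclose the file name and colon.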
 
Old 05-03-2012, 11:57 PM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Quote:
What is the purpose of IFS="|" and of && ?
When * is used in a quoted array expansion, the elements are joined with the first character of IFS, so setting IFS to a pipe gives us the desired output.

&& ensures the IFS assignment succeeded before awk is run.
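
One thing worth noting: an assignment followed by && is an ordinary assignment, so IFS stays set to | for the rest of the shell session. A possible way to keep the change local is to do the join inside a subshell. The sketch below uses a hypothetical helper name, join_alternatives, and relies on "$*", which joins the positional parameters with the first character of IFS in the same way "${m[*]}" joins array elements in bash.

```shell
# Join all arguments with "|" and wrap them in parentheses.
# The function body is a subshell, so the caller's IFS is not modified.
join_alternatives() (
  IFS='|'
  printf '(%s)\n' "$*"
)

# In bash this could equally be called as: join_alternatives "${m[@]}"
months=$(join_alternatives Apr May)
printf '%s\n' "$months"
```

The result can then be passed to awk with -v months="$months" as in the post above.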
 
Old 05-04-2012, 12:40 AM   #15
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578

Original Poster
Blog Entries: 31

Rep: Reputation: 1208
Quote:
Originally Posted by grail
When * is used in a quoted array expansion, the elements are joined with the first character of IFS, so setting IFS to a pipe gives us the desired output.

&& ensures the IFS assignment succeeded before awk is run.
Neat.

Hard to imagine the circumstances in which IFS='|' would not work but ...

EDIT:
Code:
c@CW8:/tmp$ IFS='|' && echo true || echo false
bash: IFS: readonly variable
[neither true nor false echoed]

Last edited by catkin; 05-04-2012 at 12:47 AM.
 
  

