LinuxQuestions.org


devUnix 09-06-2011 12:49 AM

Combining regex in grep
 
I have logs that look like this:

Some IP ... Some Text ... [Sep/03/2011:15:55:30 -Some Number]... Some Text... End of Line
Some IP ... Some Text ... [Sep/03/2011:18:45:27 -Some Number]... Some Text... End of Line
Some IP ... Some Text ... [Sep/03/2011:21:46:00 -Some Number]... Some Text... End of Line

I want to "grep" only the lines whose time falls between 16:00:00 and 22:00:00.

So these two lines of "grep" work fine for me:

Code:

grep -hE ':1[6789]:[0-9][0-9]:[0-9][0-9]:*' data.log
grep -hE ':2[01]:[0-9][0-9]:[0-9][0-9]:*' data.log

But I want to combine the two regexes, because what I actually want is a count of the matches. "-c" gives me the count, but since I run two "grep" commands I get two counts. I would rather not add the counts up myself, as that seems redundant.

I have tried these methods:


Code:

grep -hE ':1[6789]:[0-9][0-9]:[0-9][0-9]:* \| :2[01]:[0-9][0-9]:[0-9][0-9]:*' data.log
but it gives results for the first part of the regex only and


Code:

grep -hE ':1[6789]\|2[01]:[0-9][0-9]:[0-9][0-9]:*' data.log
gives me results for "1[6789]" only and the regex matches values in Minutes and Seconds as well.
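The minutes-and-seconds matches are consistent with how alternation binds in an extended regular expression: | has the lowest precedence, so an ungrouped pattern splits into two complete alternatives instead of alternating only the hour. A minimal sketch of that effect, assuming an unescaped | and a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The sample line is taken from the kind of log shown in this thread:

```shell
#!/bin/sh
line='127.0.0.1 [Sep/02/2011:01:19:21 -123456] --- Blah! Blah!'

# Ungrouped: parsed as  ':1[6789]'  OR  '2[01]:[0-9][0-9]:[0-9][0-9]'.
# The first alternative on its own matches the minutes field.
ungrouped=$(printf '%s\n' "$line" | grep -oE ':1[6789]|2[01]:[0-9][0-9]:[0-9][0-9]')
echo "ungrouped match: $ungrouped"    # prints: ungrouped match: :19

# Grouped: the alternation is confined to the hour, so 01:19:21 no longer matches.
# grep exits non-zero when nothing matches; '|| true' keeps the script going under 'set -e'.
grouped=$(printf '%s\n' "$line" | grep -cE ':(1[6789]|2[01]):[0-9][0-9]:[0-9][0-9]' || true)
echo "grouped matching lines: $grouped"    # prints: grouped matching lines: 0
```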

So, is there any way to say:

I want to "grep" lines that contain a time in HH:MM:SS form (24-hour clock), where the hour is preceded by a : and followed by a :, and where the hour is between 16 and 22, irrespective of the minutes and seconds.

druuna 09-06-2011 03:53 AM

Hi,
Quote:

Originally Posted by devUnix (Post 4462550)
Code:

grep -hE ':1[6789]:[0-9][0-9]:[0-9][0-9]:* \| :2[01]:[0-9][0-9]:[0-9][0-9]:*' data.log

If I understand the problem correctly, you were almost there with the above command.

Try this:
Code:

grep -hE '(:1[6789]:[0-9][0-9]:[0-9][0-9]:*|:2[01]:[0-9][0-9]:[0-9][0-9]:*)' data.log
If you want to use multiple patterns with grep (egrep or grep -E), put them between ( and ), separated by a | (not escaped). These will grep foo or bar:
Code:

egrep "(foo|bar)" file
grep -E "(foo|bar)" file
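Since the original goal was a single count, the same combined pattern can be used with -c. A minimal sketch, assuming made-up sample lines in the format described in the first post (the addresses, offsets and the sample_log helper are illustrative, not taken from a real log):

```shell
#!/bin/sh
# Hypothetical sample data in the "[Mon/DD/YYYY:HH:MM:SS -number]" format.
sample_log() {
cat <<'EOF'
10.0.0.1 text [Sep/03/2011:15:55:30 -12345] text
10.0.0.1 text [Sep/03/2011:18:45:27 -12345] text
10.0.0.1 text [Sep/03/2011:21:46:00 -12345] text
10.0.0.1 text [Sep/03/2011:22:00:01 -12345] text
EOF
}

# One grep, one count: a line is counted once if either alternative matches.
count=$(sample_log | grep -cE '(:1[6789]:[0-9][0-9]:[0-9][0-9]|:2[01]:[0-9][0-9]:[0-9][0-9])')
echo "$count"    # prints 2 (the 18:45:27 and 21:46:00 lines)
```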

Hope this helps.

devUnix 09-06-2011 11:11 AM

Thanks Druuna!

I had tried parentheses with | before as well, but I was not sure what I had missed that caused an error on the production box. Only now did I find that I had not quoted the pattern:

Code:

[demo@localhost Bash]$ grep -E :(1[6789]|2[12]):[0-9][0-9]:[0-9][0-9].* sample.log
bash: syntax error near unexpected token `('
[demo@localhost Bash]$
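The error comes from bash rather than grep: an unquoted ( is shell syntax, so the command line is rejected before grep ever runs. A minimal sketch of the quoted form, assuming a POSIX-style shell; the variable names pattern and matched are illustrative only:

```shell
#!/bin/sh
# Single quotes hand the pattern to grep untouched; the shell does not
# interpret (, |, ), [ or * inside them.
pattern=':(1[6789]|2[01]):[0-9][0-9]:[0-9][0-9]'

# Double-quote the variable when using it so it stays one argument
# and is not subject to filename expansion.
matched=$(printf '%s\n' \
  '127.0.0.1 [Sep/02/2011:14:21:23 -123456] --- Blah! Blah!' \
  '127.0.0.1 [Sep/02/2011:16:35:11 -445566] --- oh! Ha! Ho!' |
  grep -E "$pattern")
echo "$matched"    # prints only the 16:35:11 line
```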

With the pattern quoted, it works fine:

Code:

[demo@localhost Bash]$ cat sample.log
127.0.0.1 [Sep/02/2011:01:19:21 -123456] --- Blah! Blah!
127.0.0.1 [Sep/02/2011:14:21:23 -123456] --- Blah! Blah!
127.0.0.1 [Sep/02/2011:16:35:11 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:19:35:11 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:20:00:00 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:21:12:05 -434445] --- ooops! Oi!
127.0.0.1 [Sep/02/2011:21:22:05 -434445] --- ooops! Oi!
127.0.0.1 [Sep/02/2011:25:16:22 -234244] ..... Eureka!
127.0.0.1 [Sep/02/2011:25:45:22 -234244] ..... Eureka!
[demo@localhost Bash]$ grep -E ':(1[6789]|2[01]):[0-9][0-9]:[0-9][0-9].*' sample.log
127.0.0.1 [Sep/02/2011:16:35:11 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:19:35:11 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:20:00:00 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:21:12:05 -434445] --- ooops! Oi!
127.0.0.1 [Sep/02/2011:21:22:05 -434445] --- ooops! Oi!
[demo@localhost Bash]$

So I have used this pattern, which is a shorter form of what you and I discussed above:

Code:

':(1[6789]|2[01]):[0-9][0-9]:[0-9][0-9].*'
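For comparison, grep also accepts several patterns through repeated -e options, and a line that matches any of them is still counted once by -c. A minimal sketch, assuming the sample.log contents shown earlier in this post are fed in through a here-document (the sample_log helper is illustrative):

```shell
#!/bin/sh
# The nine lines of sample.log from the session above.
sample_log() {
cat <<'EOF'
127.0.0.1 [Sep/02/2011:01:19:21 -123456] --- Blah! Blah!
127.0.0.1 [Sep/02/2011:14:21:23 -123456] --- Blah! Blah!
127.0.0.1 [Sep/02/2011:16:35:11 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:19:35:11 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:20:00:00 -445566] --- oh! Ha! Ho!
127.0.0.1 [Sep/02/2011:21:12:05 -434445] --- ooops! Oi!
127.0.0.1 [Sep/02/2011:21:22:05 -434445] --- ooops! Oi!
127.0.0.1 [Sep/02/2011:25:16:22 -234244] ..... Eureka!
127.0.0.1 [Sep/02/2011:25:45:22 -234244] ..... Eureka!
EOF
}

# Two separate patterns, one grep invocation, one count.
# No -E is needed here because neither pattern uses ( ) or |.
count=$(sample_log | grep -c \
  -e ':1[6789]:[0-9][0-9]:[0-9][0-9]' \
  -e ':2[01]:[0-9][0-9]:[0-9][0-9]')
echo "$count"    # prints 5
```

The single grouped pattern is shorter; the -e form can be easier to read when the alternatives have little in common.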


All times are GMT -5.