LinuxQuestions.org

WetFroggy 09-11-2013 01:19 AM

using sed to inclusively match multiple patterns in one line..??
 
I have an application log file that is cluttered with a harmless but noisy error message from one misbehaving process. I need to purge that message from the log so that the admins can actually see whether anything serious is going on. I have tried and failed to figure out how to use awk, sed, or even grep to match a pattern (even going as far as splitting it into multiple patterns) so that the offending line can be deleted.

Arrangement of log:
some-time-date-stamp [event] [processname] message

The pattern that matters:
[event] [processname] message.

failed attempts:
sed 's.* \[FINE\] \[ProcessA\] /space consuming error/d' ./Bad.log > ./New.log

sed '/\[FINE\]/,/\[ProcessA\]/,/space consuming error/d' ./Bad.log > ./New.log

Semi-successful attempt:
This one captures only the error, which wasn't entirely what I wanted (I don't know how to make this an "everything but said pattern")
awk '/\[FINE\] \[ProcessA\] space consuming message/' ./Bad.log > ./New.log

This purges everything that had to do with the error, only it removes [BAD] items as well.
sed '/space consuming error/d' ./Bad.log > ./New.log

What I believe I don't understand is how to deal with the [] characters properly. At this point, though, I'm clueless: I can't even come up with the right internet search term to figure this out.

druuna 09-11-2013 01:50 AM

Without an actual example I can only give general advice.

Assuming this as input:
Code:

some-time-date-stamp [event] [processname] message
some-time-date-stamp [eventA] [processnameA] error message one
some-time-date-stamp [eventB] [processnameB] error message two
some-time-date-stamp [eventA] [processnameA] error message three

The following will print only lines that contain [processnameA]:
Code:

sed -n '/\[processnameA\]/p' infile

And this will print lines that contain [eventA] followed by three:
Code:

sed -n '/\[eventA\].*three/p' infile

To remove those lines from the input instead, you can do this (the first example):
Code:

sed '/\[processnameA\]/d' infile

Or, for the second example:
Code:

sed '/\[eventA\].*three/d' infile
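
For anyone who wants to try the delete form, here is a minimal sketch that builds the sample input in a throwaway directory and removes the matching line in three equivalent ways. The file name infile comes from the post; the temporary directory and the variable names are assumptions made for illustration only. The awk and grep variants also cover the "everything but said pattern" case from the first post.

```shell
#!/bin/sh
# Build the sample input from the post in a temporary directory (assumed location).
workdir=$(mktemp -d)
cat > "$workdir/infile" <<'EOF'
some-time-date-stamp [event] [processname] message
some-time-date-stamp [eventA] [processnameA] error message one
some-time-date-stamp [eventB] [processnameB] error message two
some-time-date-stamp [eventA] [processnameA] error message three
EOF

# sed: delete lines where [eventA] is followed later on the line by "three".
sed_out=$(sed '/\[eventA\].*three/d' "$workdir/infile")

# awk: a negated pattern prints every line that does NOT match.
awk_out=$(awk '!/\[eventA\].*three/' "$workdir/infile")

# grep: -v inverts the match, so matching lines are dropped.
grep_out=$(grep -v '\[eventA\].*three' "$workdir/infile")

printf '%s\n' "$sed_out"
rm -rf "$workdir"
```

All three commands leave the input file untouched and write the filtered lines to standard output, so redirecting to a new file works the same way as in the original attempts.
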

WetFroggy 09-11-2013 10:26 AM

Thank you for a response, I'll try to clarify my issue now that I took my mind off this.

I have a process that can log various events, but one event is currently overly helpful (until someone on my end understands why).

So we could have a condition in the log such as :
Code:

some-time-date-stamp [event] [processname] message
some-time-date-stamp [eventA] [processnameA] error message one
some-time-date-stamp [eventB] [processnameB] error message two
some-time-date-stamp [eventB] [processnameA] error message three
some-time-date-stamp [eventA] [processnameB] error message one

As I understand it, the event is commonly used; the process handles a good chunk of the application, so it sometimes shows up for other reasons; and the message itself can appear when a completely different event is raised. So the only way I can filter out the misbehaving process is either to use three matching patterns or to match the whole string containing the square brackets.

When I tried using the string containing square brackets, I'd get an error, "unterminated `s' command"

A blind stab using 3 patterns didn't work either.
Failed Attempt 1:
sed -n '/'"$Pat1"'/,/'"$Pat2"'/,/'"$Pat3"'/d' ./Working.log > ./NewTest

Failed Attempt 2:
sed -n '/'$Pat1'/,/'$Pat2'/,/'$Pat3'/d' ./Working.log > ./NewTest
"unknown command: `,'"

A solution (if one exists) that uses multiple patterns needs to apply them as {pat1 AND pat2 AND pat3} on a line-by-line basis. If, on the other hand, I can use the entire string complete with square brackets, I can be certain that only the unwanted line is removed.

I have seen plenty of examples out there, but they match multiple items across the entire file; my problem is confined to a single line (which shows up thousands, if not millions, of times in one log file). Have I clarified this?

grail 09-11-2013 11:05 AM

Well, I still find the example a little vague, but druuna has already shown that if you escape the square brackets (place a \ before each one) they will be treated as literal characters and used as part of the search.

As you have hidden your patterns in variables it is not clear what you have entered in each one and where we can help you improve it.
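
As a rough sketch of what the variables could look like: in sed a comma separates the two ends of a line range, and a command takes at most two addresses, which is why /a/,/b/,/c/ ends in "unknown command: `,'". To require several patterns on the same line, the addresses can be nested in braces instead. The values of Pat1, Pat2 and Pat3 below are guesses based on the sample log in post #3, not the poster's real patterns, and the temporary directory is an assumption as well.

```shell
#!/bin/sh
# Hypothetical patterns; the brackets are escaped so sed treats them literally.
Pat1='\[eventB\]'
Pat2='\[processnameA\]'
Pat3='error message three'

# Sample log from post #3, written to a temporary directory (assumed location).
workdir=$(mktemp -d)
cat > "$workdir/Working.log" <<'EOF'
some-time-date-stamp [event] [processname] message
some-time-date-stamp [eventA] [processnameA] error message one
some-time-date-stamp [eventB] [processnameB] error message two
some-time-date-stamp [eventB] [processnameA] error message three
some-time-date-stamp [eventA] [processnameB] error message one
EOF

# Nested address blocks: d runs only on lines that match
# Pat1 AND Pat2 AND Pat3, wherever they appear on the line.
and_out=$(sed "/$Pat1/{/$Pat2/{/$Pat3/d;};}" "$workdir/Working.log")

# Because the log layout is fixed, the three patterns can also be
# joined into one expression that matches them in sequence.
seq_out=$(sed "/$Pat1 $Pat2 $Pat3/d" "$workdir/Working.log")

printf '%s\n' "$and_out"
rm -rf "$workdir"
```

The nested form does not care about the order of the patterns on the line; the joined form is shorter but relies on the fields always being separated by a single space.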

druuna 09-11-2013 11:06 AM

I'm not sure I'm clear about what you want.

You have added an entry to my example, but you fail to show what the expected output would need to look like.

If I understand your explanation correctly, you want to match multiple patterns (say: processnameA or processnameB) in combination with another pattern. If that is the case, have a look at this:
Code:

sed -rn '/(\[processnameA\]|\[processnameB\]).*error message one/p' infile

The above looks for lines that contain:
- [processnameA] OR [processnameB]
- AND
- error message one

Assuming the example given by you in post #3, the output would be:
Code:

some-time-date-stamp [eventA] [processnameA] error message one
some-time-date-stamp [eventA] [processnameB] error message one

You're not limited in any way, so you could do this:
Code:

sed -rn '/(\[eventA\]|\[eventB\]) (\[processnameA\]|\[processnameB\]).*(error message one|error message two|error message three)/p' infile

This looks for:
- [eventA] OR [eventB]
- AND
- [processnameA] OR [processnameB]
- AND
- error message one OR error message two OR error message three
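
The alternation can be checked against the sample from post #3 with a small sketch like the one below. It uses -E, which GNU and BSD sed both accept as the equivalent of the -r used above; the temporary directory and the variable names are assumptions for illustration. Swapping -n ... p for d gives the complementary set of lines, which is what the original question asks for.

```shell
#!/bin/sh
# Sample log from post #3, written to a temporary directory (assumed location).
workdir=$(mktemp -d)
cat > "$workdir/infile" <<'EOF'
some-time-date-stamp [event] [processname] message
some-time-date-stamp [eventA] [processnameA] error message one
some-time-date-stamp [eventB] [processnameB] error message two
some-time-date-stamp [eventB] [processnameA] error message three
some-time-date-stamp [eventA] [processnameB] error message one
EOF

# Print only lines with ([processnameA] OR [processnameB]) AND "error message one".
printed=$(sed -En '/(\[processnameA\]|\[processnameB\]).*error message one/p' "$workdir/infile")

# Same expression with d: everything except those lines is kept.
kept=$(sed -E '/(\[processnameA\]|\[processnameB\]).*error message one/d' "$workdir/infile")

printf 'printed:\n%s\n\nkept:\n%s\n' "$printed" "$kept"
rm -rf "$workdir"
```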

WetFroggy 09-11-2013 01:31 PM

Thank you druuna, that's what I was missing!
Each 'pattern' is surrounded by round brackets, plus the -r (extended regexp) switch.
Code:

sed -r '/(\[eventA\]) (\[processname\]).*(message)/d'  ./Working.log > ./NewTest
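
A quick way to confirm that only the unwanted line goes away is to run the same kind of expression against a small test log and count what is left. The sketch below is built on assumptions: the timestamps and log lines are invented, reusing the [FINE], [BAD], [ProcessA] and "space consuming error" names from the first post. It also suggests that, for one fixed sequence like this, plain sed with escaped brackets appears to be enough; the round brackets and -r only become necessary once alternation with | is involved.

```shell
#!/bin/sh
# Invented test log (hypothetical timestamps and entries) in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/Bad.log" <<'EOF'
2013-09-11 01:00:00 [FINE] [ProcessA] space consuming error
2013-09-11 01:00:01 [BAD] [ProcessA] space consuming error
2013-09-11 01:00:02 [FINE] [ProcessB] space consuming error
2013-09-11 01:00:03 [FINE] [ProcessA] something else
2013-09-11 01:00:04 [FINE] [ProcessA] space consuming error
EOF

# Delete only lines carrying the exact event, process and message sequence.
sed '/\[FINE\] \[ProcessA\] space consuming error/d' "$workdir/Bad.log" > "$workdir/New.log"

# Count how many lines matched in the original and how many lines survive.
removed=$(grep -c '\[FINE\] \[ProcessA\] space consuming error' "$workdir/Bad.log")
remaining=$(grep -c '' "$workdir/New.log")
bad_kept=$(grep -c '\[BAD\]' "$workdir/New.log")

echo "removed $removed lines, $remaining remain, $bad_kept [BAD] lines kept"
rm -rf "$workdir"
```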


All times are GMT -5.