sed variable multiline match

cptsockpuppet · 04-30-2012, 10:29 AM

So I've been tasked with parsing a log file and converting it into a nice single-line log. The log file outputs information in the following format:

Code:

1 ticket(s) written to the database."^M
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M
- XXX#999999^M

The problem is that the number of lines varies from ticket to ticket: some tickets are 3 lines, some are 6. Is there some way to have sed match across those lines and strip out the newline characters for each ticket (making it a nice single-line entry)? Or should I move on to a different tool?

Thank you

danielbmartin · 04-30-2012, 10:53 AM

Quote:

Originally Posted by cptsockpuppet

The problem is that the number of lines varies from ticket to ticket: some tickets are 3 lines, some are 6.

What characteristic of a line indicates "this is the first line of a multi-line ticket" versus "this is a continuation line of a multi-line ticket"?

Daniel B. Martin

millgates · 04-30-2012, 10:57 AM

I would try awk here:

Code:

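awk '
    # A new ticket starts here: print the previous one, if any.
    /^[0-9]+ ticket\(s\) written/ {if (NR>1) print tmp; tmp = ""}
    # Strip the trailing carriage return, then append the line to the ticket.
    {sub(/\r$/, ""); tmp = (tmp == "" ? $0 : tmp " " $0)}
    END {print tmp}
' < filename

This assumes ^M is how a carriage return is being displayed, and not a literal ^ followed by M.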
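If you're not sure which it is, a quick check along these lines should tell you whether the file holds real carriage returns (byte 0x0D, which many editors display as ^M) or the two literal characters ^ and M. The sample file here is made up for the demo; point the two grep commands at the real log instead.

```shell
# Hypothetical two-line sample with DOS (CRLF) line endings.
sample=$(mktemp)
printf '1 ticket(s) written to the database."\r\n- XXX#999999\r\n' > "$sample"

# Count lines that end in a real carriage return (byte 0x0D).
cr=$(printf '\r')
cr_lines=$(grep -c "${cr}\$" "$sample" || true)
echo "lines ending in CR: $cr_lines"

# Count lines that contain the two literal characters ^ and M instead.
literal_lines=$(grep -c -F '^M' "$sample" || true)
echo "lines containing literal ^M: $literal_lines"

rm -f "$sample"
```

A non-zero first count means the file has DOS line endings and the carriage returns need stripping before or while joining the lines.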

Snark1994 · 04-30-2012, 11:03 AM

How do you know how long a ticket is? Is it just from "X ticket(s)" to the next time that appears in the file? With sed, the best I can think of is this horrific thing:

Code:

sed 's/\r$//; s/^[[:digit:]]\+ ticket/@&/' YOUR_FILE | sed -e :a -e '$!N;s/\n\([^@]\)/ \1/;ta' -e 'P;D' | sed 's/^@//'

The first sed strips the trailing carriage return and changes every line that begins "_number_ ticket" to "@_number_ ticket". The second goes through the result and appends any line that doesn't begin with '@' to the previous line (the \(...\) group puts back the first character of the appended line, which would otherwise be swallowed). The third removes the leading '@' markers again. It assumes that no line in the original file begins with '@' and that you have GNU sed (for \+ and \r), and anyway it is a horrendously ugly solution.

I would use a different programming tool, such as awk.
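For completeness, the '@' marker and the three-stage pipeline can be avoided by collecting each ticket in sed's hold space. What follows is only a rough sketch: it assumes GNU sed (for \r), it assumes every ticket starts with a line matching "N ticket(s) written", and it runs on a made-up, abbreviated sample in which the second entry lists two tickets.

```shell
# Hypothetical, abbreviated sample with DOS (CRLF) line endings.
sample=$(mktemp)
printf '%s\r\n' \
  '1 ticket(s) written to the database."' \
  '"Information","53","List of ticket(s):' \
  '- XXX#999997' \
  '2 ticket(s) written to the database."' \
  '"Information","53","List of ticket(s):' \
  '- XXX#999998' \
  '- XXX#999999' > "$sample"

joined=$(sed -n '
  # Drop the trailing carriage return.
  s/\r$//
  # Continuation line: append it to the hold space and, unless this is
  # the last line of the file, start the next cycle.
  /^[0-9][0-9]* ticket(s) written/!{
    H
    $!d
  }
  # Header line or end of file: swap the collected ticket into the
  # pattern space, join its lines with spaces, and print it if non-empty.
  x
  s/\n/ /g
  /./p
  # If the file ends on a bare header line, print that too.
  ${
    x
    /^[0-9][0-9]* ticket(s) written/p
  }
' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

Because it keys on the header line rather than on a fixed line count, it should not matter whether a ticket spans 3 lines or 6.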

grail · 04-30-2012, 12:50 PM

I have to agree with the others that a solution cannot really be presented as you have not provided enough detail.

To form a solution we (and you) need to know what represents the start and end of a ticket and if any of these things vary, then we will need to know all the variations.

cptsockpuppet · 04-30-2012, 01:26 PM

I'll try to add some detail here. The beginning of a ticket is indicated by the line "1 ticket(s) written to the database."^M". Every line following this is part of the ticket. The last line of the ticket is "- XXXX#814591^M".

I wasn't really expecting a simple solution, and given the helpful responses I've already gotten, I think I'm going to have to move towards awk or another tool.

Thank you again!

grail · 04-30-2012, 03:12 PM

Well I still would have liked some more examples, but assuming that multiple lines might look like:

Code:

1 ticket(s) written to the database."^M$
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M$
- XXX#999997^M$
2 ticket(s) written to the database."^M$
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M$
- XXX#999998^M$
3 ticket(s) written to the database."^M$
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M$
- XXX#999999^M$

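Then maybe this is what you are looking for (it relies on an awk that treats a multi-character RS as a regular expression, such as gawk or mawk):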

Code:

$ awk 'BEGIN{RS = "[\r\n]+"}ORS=/^-/?"\n":" "' file
1 ticket(s) written to the database." "Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s): - XXX#999997
2 ticket(s) written to the database." "Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s): - XXX#999998
3 ticket(s) written to the database." "Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s): - XXX#999999

If you still need Windows-style line endings in the output, simply put "\r" before the "\n" in the ORS assignment, i.e. use "\r\n".
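As a rough illustration of that last point, here is a variant that spells out the Windows-style output. It is a sketch under assumptions: it strips the CR itself instead of relying on a regular-expression RS (so it should also run on awks without that extension), it keeps the same rule that a line starting with "-" ends a ticket, and the sample data is made up and abbreviated.

```shell
# Hypothetical, abbreviated sample with DOS (CRLF) line endings.
sample=$(mktemp)
out=$(mktemp)
printf '%s\r\n' \
  '1 ticket(s) written to the database."' \
  '"Information","53","List of ticket(s):' \
  '- XXX#999997' \
  '2 ticket(s) written to the database."' \
  '"Information","53","List of ticket(s):' \
  '- XXX#999998' > "$sample"

# Strip the CR from every input line, then finish the output with CRLF
# after a "- ..." line and with a single space after any other line.
awk '{ sub(/\r$/, ""); printf "%s%s", $0, (/^-/ ? "\r\n" : " ") }' "$sample" > "$out"

# Count the output lines, and how many of them end in a carriage return.
cr=$(printf '\r')
total_lines=$(wc -l < "$out")
crlf_lines=$(grep -c "${cr}\$" "$out" || true)
echo "output lines: $total_lines, ending in CRLF: $crlf_lines"

rm -f "$sample" "$out"
```

Note that ending a ticket on the first "-" line would split an entry that lists several tickets, so this only fits logs where each entry has exactly one "- ..." line.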