LinuxQuestions.org
Old 04-30-2012, 10:29 AM   #1
cptsockpuppet
LQ Newbie
 
Registered: Apr 2012
Posts: 2

Rep: Reputation: Disabled
sed variable multiline match


So I've been tasked with parsing a log file and converting it into a nice single-line log. The log file outputs information in the following format:

Code:
1 ticket(s) written to the database."^M
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M
- XXX#999999^M
The problem is that the number of lines can vary from ticket to ticket. Some tickets are 3 lines, some are 6 lines. Is there some way to have sed match across those lines and strip out the newline characters for each ticket (making it a nice single-line entry)? Or should I move on to a different tool?

Thank you
 
Old 04-30-2012, 10:53 AM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by cptsockpuppet
The problem is that the number of lines can vary from ticket to ticket. Some tickets are 3 lines some are 6 lines.
What characteristic of a line indicates "this is the first line of a multi-line ticket" versus "this is a continuation line of a multi-line ticket"?

Daniel B. Martin
 
Old 04-30-2012, 10:57 AM   #3
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Rep: Reputation: 389
I would try awk here:
Code:
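awk '
    # A new ticket starts here: print the previous one (if any) and reset.
    /^[0-9]+ ticket\(s\) written/ {if (NR>1) print tmp; tmp = ""}
    # Strip the trailing carriage return (shown as ^M), then append the line.
    {sub(/\r$/,""); tmp = (tmp == "" ? $0 : tmp " " $0)}
    END {print tmp}
' < filename
assuming ^M is a carriage return (the control character that cat -v and vi display as ^M) and not a literal ^ followed by M.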
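
A quick way to check that assumption before running the script is to look at the raw bytes of the log. The following is only a sketch, assuming a POSIX shell with grep and od; the helper name has_cr and the sample lines are made up for illustration and are not taken from the real log.

```shell
# has_cr: hypothetical helper. Succeeds if any line on stdin ends in a
# carriage return (byte 0x0D), i.e. the input uses CR LF line endings.
has_cr() {
    cr=$(printf '\r')      # a literal carriage return character
    grep -q "${cr}\$"      # CR immediately before the end of a line
}

# Made-up sample line in the same shape as the log, ending in CR LF.
# od -c prints each byte, so the line ending is visible explicitly.
printf '1 ticket(s) written to the database."\r\n' | od -c

if printf 'abc\r\n' | has_cr; then
    echo "carriage returns present"
fi
```

In the od -c output a real carriage return appears as \r just before \n, whereas a literal caret followed by M would appear as the two characters ^ and M. For the actual log, something like `has_cr < filename` should give the same yes/no answer.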

Last edited by millgates; 04-30-2012 at 10:59 AM. Reason: formatting for readability
 
Old 04-30-2012, 11:03 AM   #4
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
How do you know how long a ticket is? Does it run from one "X ticket(s)" line to the next one in the file? With sed, the best I can think of is this horrific thing:

Code:
sed 's/^[[:digit:]]\+ ticket/@&/' YOUR_FILE | sed -e :a -e '$!N;s/\n\([^@]\)/ \1/;ta' -e 'P;D' | sed 's/^@//'
This finds every line that begins with "_number_ ticket" and changes it to "@_number_ ticket", then goes through the resulting output appending any line that doesn't begin with '@' to the previous line, then goes through once more and removes the leading '@' from each line. The capture group in the second stage keeps the first character of each appended line, which would otherwise be swallowed along with the newline. It assumes that no line in the original file begins with '@', and it is a horrendously ugly solution anyway.
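
To see the three stages working together, here is a rough sketch that wraps them in a function and feeds it a made-up two-ticket sample. The function name join_tickets and the shortened sample lines are invented for illustration. This variant also strips the trailing carriage returns in the first stage and spells the digit match as `[[:digit:]][[:digit:]]*` so that it does not depend on GNU-only syntax; it still assumes no input line starts with '@'.

```shell
# join_tickets: hypothetical wrapper around the three-stage pipeline.
# Reads the log on stdin, writes one line per ticket on stdout.
join_tickets() {
    cr=$(printf '\r')
    # Stage 1: drop a trailing CR, then mark ticket header lines with '@'.
    sed "s/${cr}\$//; s/^[[:digit:]][[:digit:]]* ticket/@&/" |
    # Stage 2: append every line that does not start with '@' to the
    # previous line, keeping its first character via the capture group.
    sed -e :a -e '$!N;s/\n\([^@]\)/ \1/;ta' -e 'P;D' |
    # Stage 3: remove the '@' markers again.
    sed 's/^@//'
}

# Made-up sample with CR LF line endings, shortened from the real format.
printf '%s\r\n' \
    '1 ticket(s) written to the database."' \
    '"Information","53","List of ticket(s):' \
    '- XXX#999997' \
    '2 ticket(s) written to the database."' \
    '"Information","53","List of ticket(s):' \
    '- XXX#999998' |
join_tickets
```

On that sample the output should be two lines, one per ticket, each starting with its "N ticket(s) written" header and ending with its "- XXX#..." line.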

I would use a different programming tool, such as awk.
 
Old 04-30-2012, 12:50 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
I have to agree with the others that a solution cannot really be presented as you have not provided enough detail.

To form a solution we (and you) need to know what represents the start and end of a ticket and if any of these things vary, then we will need to know all the variations.
 
Old 04-30-2012, 01:26 PM   #6
cptsockpuppet
LQ Newbie
 
Registered: Apr 2012
Posts: 2

Original Poster
Rep: Reputation: Disabled
I'll try to add some detail here. The beginning of a ticket is indicated by the line "1 ticket(s) written to the database."^M". Every line following this is part of the ticket. The last line of the ticket is "- XXXX#814591^M".

I wasn't really expecting a simple solution, and given the helpful responses I've already gotten, I think I'm going to have to move towards awk or another tool.

Thank you again!
 
Old 04-30-2012, 03:12 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Well I still would have liked some more examples, but assuming that multiple lines might look like:
Code:
1 ticket(s) written to the database."^M$
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M$
- XXX#999997^M$
2 ticket(s) written to the database."^M$
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M$
- XXX#999998^M$
3 ticket(s) written to the database."^M$
"Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s):^M$
- XXX#999999^M$
Then maybe this is what you are looking for:
Code:
$ awk 'BEGIN{RS = "[\r\n]+"}ORS=/^-/?"\n":" "' file
1 ticket(s) written to the database." "Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s): - XXX#999997
2 ticket(s) written to the database." "Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s): - XXX#999998
3 ticket(s) written to the database." "Information","53","04/11/2011 11:08:19 AM","DT Archiver","ARCHIVER3","None","N/A","List of ticket(s): - XXX#999999
If you still require Windows-style output (CR LF line endings), simply put "\r" before "\n" in the ORS value.
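
In the one-liner above, that change amounts to writing "\r\n" where the ORS expression currently has "\n". As a rough sketch of the same idea that avoids a regular-expression RS (POSIX leaves multi-character RS values unspecified, although gawk and mawk accept them), the joining can also be done with printf. The function name join_tickets_crlf is made up, and it carries over the assumption that a line starting with "-" is the last line of a ticket.

```shell
# join_tickets_crlf: hypothetical helper, reads the log on stdin.
# Joins each ticket onto one line and terminates it with CR LF.
# Assumption: a line beginning with "-" is the last line of a ticket.
join_tickets_crlf() {
    awk '{
        sub(/\r$/, "")                          # drop the incoming carriage return
        if ($0 ~ /^-/) printf "%s\r\n", $0      # last line: end the record with CR LF
        else           printf "%s ", $0         # any other line: join with a space
    }'
}

# Made-up sample, shortened from the format shown above.
printf '%s\r\n' \
    '1 ticket(s) written to the database."' \
    '"Information","53","List of ticket(s):' \
    '- XXX#999997' |
join_tickets_crlf | od -c
```

Piping through od -c is only there to make the \r \n at the end of the joined line visible. If a ticket can list more than one "- XXX#..." line, this sketch (like the one-liner it mirrors) would end the record at the first of them.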
 
  

