LinuxQuestions.org
Old 11-06-2012, 05:40 AM   #1
diegovillar
LQ Newbie
 
Registered: Nov 2012
Posts: 3

Rep: Reputation: Disabled
Rearranging logfile with awk


Hello,

I want to rearrange a logfile so that I can process it with an analytics tool. For this I've been using commands like this:

Code:
awk '{print $2,$1,$4,$7}' file.log
My problem is that some of these log fields are enclosed in double quotes and contain blank spaces, so they are treated as several fields instead of one.
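A minimal illustration of the problem, using a made-up line rather than the real log (the names `line`, `count_fields` and `second_field` are only for this sketch). With the default field separator, awk splits on every run of blanks and pays no attention to quotes:

```shell
# Hypothetical sample line (not from the real log): the quoted field contains a blank.
line='alpha "beta gamma" delta'

# Default splitting ignores the quotes, so awk counts 4 fields, not 3.
count_fields() {
  printf '%s\n' "$1" | awk '{ print NF }'
}

# The quoted field is cut in two: $2 holds only its first half.
second_field() {
  printf '%s\n' "$1" | awk '{ print $2 }'
}

count_fields "$line"
second_field "$line"
```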

Does anyone know how to avoid this problem?

Thank you in advance
 
Old 11-06-2012, 05:50 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,687

Rep: Reputation: 7274
You should show us an example: what your input lines look like and what causes the problem. You probably only need to define a separator:
Code:
awk -F\  '{......}' logfile
#      ^^ two spaces
 
Old 11-06-2012, 06:06 AM   #3
devnull10
Member
 
Registered: Jan 2010
Location: Lancashire
Distribution: Slackware Stable
Posts: 572

Rep: Reputation: 120
I think what the OP is saying is that the log looks something like this:

Code:
1,Log,Blah,Blahs
2,"A Log","Blah","More Blah"
3,"Yes, A log",Etc,Blah
 
Old 11-06-2012, 06:38 AM   #4
diegovillar
LQ Newbie
 
Registered: Nov 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
The log is like this:

Quote:
2012-09-26 09:13:42.483 "79.100.26.27" - 172.18.6.61 admin=0&usuario=itencheva&pIdRag=367111&id_curso=1588&pIdioma=_esp&pIdiomaEntorno=_esp "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; GTB7.4; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C)"
As you can see, the default separator in the file is a blank space. The problem comes from the fields enclosed in double quotes.

Thank you for your answers
 
Old 11-06-2012, 08:25 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,687

Rep: Reputation: 7274
In your case you can try something like this:
awk ' { $3 = $1; $1 = ""; $5 = ""; $6 = ""; print } '
Or you can use perl, which will keep $7 in one piece.
 
Old 11-06-2012, 09:24 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Which version of awk are you using?
Code:
awk --version
 
Old 11-06-2012, 09:28 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
In a more general case, you should try the following awk code:
Code:
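BEGIN {
  # Split (and rebuild) each record on the double quote character.
  FS = OFS = "\""
}

{
  # Even-numbered fields lie between a pair of quotes:
  # replace their blanks with ESC (octal 033).
  for ( i = 2; i < NF; i += 2 )
      gsub(/ +/, "\033", $i)

  # Split the rebuilt record on the blanks that are left.
  split($0, m, " ")

  # Turn ESC back into a blank. gawk accepts length(m) on an array;
  # where it is not supported, use the return value of split() instead.
  for ( i = 1; i <= length(m); i++ )
      gsub("\033", " ", m[i])

  print m[2] " " m[1] " " m[4] " " m[7]
}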
Basically it uses the double quote as the field separator and replaces the blanks inside each pair of double quotes with a non-printing character (ESC, octal code 033). It then splits the modified record on the remaining blanks, which are the real separators for your purpose. Finally it changes the ESC characters back into blank spaces and prints the desired fields. Note that a run of several blanks inside quotes comes back as a single blank.

Here is an example:
Code:
$ cat file
one "two two" three four "five five five" six "seven seven"
one "two two" three four "five five five" six "seven seven"
one "two two" three four "five five five" six "seven seven"
$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file
"two two" one four "seven seven"
"two two" one four "seven seven"
"two two" one four "seven seven"
Using your sample:
Code:
$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file
09:13:42.483 2012-09-26 - "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; GTB7.4; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C)"
Hope this helps.
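As an alternative sketch (not the method above), the same result can be had without a placeholder character by pulling tokens off the line one at a time with match(). The function name `quoted_fields` and the array `f` are made up for this example, and it assumes fields are separated by blanks only and that quotes are balanced:

```shell
# Sketch: print fields 2, 1, 4 and 7, treating "quoted text" as a single field.
# quoted_fields is a hypothetical helper name; it reads the log on stdin.
quoted_fields() {
  awk '{
    split("", f)        # clear the token array left over from the previous line
    rest = $0
    n = 0
    # Take the next token each time round: either "..." or a run of non-blanks.
    while (match(rest, /"[^"]*"|[^ ]+/)) {
      f[++n] = substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)
    }
    print f[2], f[1], f[4], f[7]
  }'
}

printf '%s\n' 'one "two two" three four "five five five" six "seven seven"' | quoted_fields
```

Unlike the placeholder approach, this keeps runs of blanks inside the quotes exactly as they were.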
 
1 members found this post helpful.
Old 11-06-2012, 09:38 AM   #8
diegovillar
LQ Newbie
 
Registered: Nov 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Quote:
$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file
Thank you colucix, it helped me a lot, but it was not completely valid for my purpose because the field enclosed in double quotes is not the last field that I need to print. I tried this command:

Code:
awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[1] " " m[2] " " m[5] " " m[12] " " m[8] " " m[6] " " m[16] " " m[4] " " m[3] " " m[7] " m[9] }'
But the system gave me the following error:
Code:
^ unfinished string

Last edited by diegovillar; 11-06-2012 at 11:18 AM.
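The error most likely comes from the very end of the print statement: `" m[9]` opens a string that is never closed, where `" " m[9]` was presumably intended. A sketch of the corrected command follows. The wrapper name `rearrange` is made up for this example, the loop uses the return value of split() instead of length(m), and the field order is copied from the command above:

```shell
# rearrange is a hypothetical wrapper name; pass the log file as an argument
# or feed the log on stdin.
rearrange() {
  awk 'BEGIN { FS = OFS = "\"" }
  {
    # Protect the blanks inside each quoted field with ESC (octal 033).
    for (i = 2; i < NF; i += 2)
      gsub(/ +/, "\033", $i)
    # Split on the remaining blanks; n is the number of pieces.
    n = split($0, m, " ")
    # Restore the protected blanks.
    for (i = 1; i <= n; i++)
      gsub("\033", " ", m[i])
    # Every pair of fields is joined by a complete " " string, including the last pair.
    print m[1] " " m[2] " " m[5] " " m[12] " " m[8] " " m[6] " " m[16] " " m[4] " " m[3] " " m[7] " " m[9]
  }' "$@"
}
```

It would be called as `rearrange file.log`, assuming the log is in file.log as in the first post.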
 
Old 11-06-2012, 11:06 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,687

Rep: Reputation: 7274
This will probably work for you:
Code:
perl -ne ' @a = split (/ /, $_, 7); print "$a[1] $a[0] $a[3] $a[6]" ' file.log
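The Perl one-liner works because the third argument to split caps the number of pieces at seven, so everything after the sixth blank, quotes included, stays together in $a[6]. A rough awk equivalent is sketched below. The name `last_field_intact` is made up, and it assumes the first six fields are separated by single blanks and that only the last wanted field contains blanks:

```shell
# last_field_intact is a hypothetical helper name; it reads the log on stdin.
last_field_intact() {
  awk '{
    rest = $0
    # Strip the first six blank-terminated fields; what remains is field 7 onwards.
    for (i = 1; i <= 6; i++)
      sub(/^[^ ]* /, "", rest)
    print $2, $1, $4, rest
  }'
}

# Shortened, made-up variant of the sample log line from this thread.
printf '%s\n' '2012-09-26 09:13:42.483 "79.100.26.27" - 172.18.6.61 admin=0 "Mozilla/4.0 (compatible; MSIE 7.0)"' | last_field_intact
```

Like the Perl version, this does not help when a quoted field with blanks sits in the middle of the line.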
 
  


All times are GMT -5. The time now is 11:21 AM.
