LinuxQuestions.org - Rearranging logfile with awk

Hello,

I want to rearrange a logfile so that I can process it with an analytics tool. For this I'm using commands like this:

Code:

awk '{print $2,$1,$4,$7}' file.log

My problem is that some of these log fields are enclosed in double quotes and contain blank spaces, so they are being treated as several fields instead of one.

Does anyone know how to avoid this problem?

Thank you in advance

You should show us an example: what your input lines look like and what causes the problem. Probably you only need to define a separator:

Code:

awk -F\  '{......}' logfile

#      ^^ two spaces

I think what the OP is saying is that the log looks something like this:

Code:

1,Log,Blah,Blahs
2,"A Log","Blah","More Blah"
3,"Yes, A log",Etc,Blah

The log is like this:

Quote:

2012-09-26 09:13:42.483 "79.100.26.27" - 172.18.6.61 admin=0&usuario=itencheva&pIdRag=367111&id_curso=1588&pIdioma=_esp&pIdiomaEntorno=_esp "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; GTB7.4; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C)"

As you can see, the default separator in the file is a blank space; the problem comes with the fields enclosed in double quotes.
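To make the problem concrete, here is a minimal reproduction. It uses a shortened copy of the line above (the query string and user agent are cut down for readability, so the field count is not that of the real log):

```shell
# Shortened copy of the sample log line; the last field is quoted and contains spaces.
line='2012-09-26 09:13:42.483 "79.100.26.27" - 172.18.6.61 admin=0 "Mozilla/4.0 (compatible; MSIE 7.0)"'

# With the default separator awk splits on every run of blanks,
# so the quoted user agent is broken into several fields.
printf '%s\n' "$line" | awk '{ print NF; print $7 }'
```

Logically the line has 7 fields, but awk reports 10, and $7 holds only the first word of the quoted string.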

Thank you for your answers

In your case you can try something like this:
awk ' { $3 = $1; $1 = ""; $5 = ""; $6 = ""; print } '
Or you can use perl, which will keep $7 in one piece.

Which version of awk are you using?

Code:

awk --version

In a more general case, you should try the following awk code:

Code:

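BEGIN {
  # Split each record on double quotes, and rebuild it with them too
  FS = OFS = "\""
}

{
  # Even-numbered fields are the text inside a pair of quotes:
  # hide their spaces behind a placeholder character (ESC, octal 033)
  for ( i = 2; i < NF; i += 2 )
      gsub(/ +/, "\033", $i)

  # Every blank that is left is a real separator
  split($0, m, " ")

  # Put the spaces back inside the quoted fields
  for ( i = 1; i <= length(m); i++ )
      gsub("\033", " ", m[i])

  print m[2] " " m[1] " " m[4] " " m[7]
}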

Basically, it uses the double quote as the field separator and replaces the spaces inside each pair of double quotes with a non-printing character (ESC, octal code 033). Then it splits the modified record on the remaining blank spaces, which are the real separators for your purpose. Finally, it changes the placeholder characters back to blank spaces and prints the desired fields. Note that gsub(/ +/, ...) turns each run of spaces inside the quotes into a single placeholder, so consecutive spaces come back as one space.

Here is an example:

Code:

$ cat file
one "two two" three four "five five five" six "seven seven"
one "two two" three four "five five five" six "seven seven"
one "two two" three four "five five five" six "seven seven"

$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file
"two two" one four "seven seven"
"two two" one four "seven seven"
"two two" one four "seven seven"

Using your sample:

Code:

$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file

09:13:42.483 2012-09-26 - "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; GTB7.4; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C)"
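The commands above call length(m) on an array, which some older awk implementations do not accept. A slightly more portable variant, offered as a sketch rather than a tested drop-in, keeps the count returned by split() instead. The wrapper name requote_fields is made up for this example, and the variant replaces each space individually so that runs of spaces inside quotes are preserved:

```shell
# requote_fields: hypothetical helper name, reads the log on standard input.
requote_fields() {
  awk '
  BEGIN { FS = OFS = "\"" }
  {
    # Even-numbered fields are the text between a pair of double quotes.
    # Replace every single space there with ESC (octal 033).
    for (i = 2; i < NF; i += 2)
      gsub(" ", "\033", $i)

    # split() returns the number of pieces, so length(m) is not needed.
    n = split($0, m, " ")

    # Restore the spaces inside the quoted fields.
    for (i = 1; i <= n; i++)
      gsub("\033", " ", m[i])

    # OFS is a double quote here, so join with explicit spaces.
    print m[2] " " m[1] " " m[4] " " m[7]
  }'
}

printf '%s\n' 'one "two two" three four "five five five" six "seven seven"' | requote_fields
```

On the test line from the example this prints the same result as the original one-liner.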

Hope this helps.

Quote:

$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file

Thank you colucix, it helped me a lot, but it was not completely valid for my purpose, because the field enclosed in double quotes is not the last field that I need to print. I tried this command:

Code:

awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[1] " " m[2] " " m[5] " " m[12] " " m[8] " " m[6] " " m[16] " " m[4] " " m[3] " " m[7] " m[9] }'

But the system gave me the following error:

Code:

^ unfinished string
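The "unfinished string" message points at the end of the print statement: the text before m[9] is a lone " instead of the " " used between the other fields, so awk sees a string that opens and never closes. A corrected sketch follows. The wrapper name reorder_fields is made up, the input is read from standard input because the command above names no file, and split()'s return value is used in place of length(m) as a portability assumption:

```shell
# reorder_fields: hypothetical wrapper name, reads the log on standard input.
reorder_fields() {
  awk 'BEGIN { FS = OFS = "\"" }
  {
    for (i = 2; i < NF; i += 2) gsub(/ +/, "\033", $i)
    n = split($0, m, " ")
    for (i = 1; i <= n; i++) gsub("\033", " ", m[i])
    # Each separator below is a complete " " string, including the one before m[9].
    print m[1] " " m[2] " " m[5] " " m[12] " " m[8] " " m[6] " " m[16] " " m[4] " " m[3] " " m[7] " " m[9]
  }'
}

# Made-up 16-field test line with one quoted field (not from the real log).
printf '%s\n' '1 2 3 4 5 6 7 8 9 10 11 "12 x" 13 14 15 16' | reorder_fields
```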

This will probably work for you:

Code:

perl -ne ' @a = split (/ /, $_, 7); print "$a[1] $a[0] $a[3] $a[6]" ' file.log
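The perl command works because split is given a limit of 7, so everything after the sixth space stays together in $a[6], spaces included. A rough awk equivalent is sketched below. It assumes, like split(/ /, ...), that fields are separated by exactly one space, and the function name first_six_then_rest is made up for the example:

```shell
# first_six_then_rest: hypothetical helper name, reads the log on standard input.
first_six_then_rest() {
  awk '{
    split("", f)          # clear the array left over from the previous record
    rest = $0
    # Peel off the first six space-separated fields by hand.
    for (i = 1; i <= 6; i++) {
      p = index(rest, " ")
      if (p == 0) break
      f[i] = substr(rest, 1, p - 1)
      rest = substr(rest, p + 1)
    }
    # rest now holds the seventh field, however many spaces it contains.
    print f[2] " " f[1] " " f[4] " " rest
  }'
}

printf '%s\n' 'a b c d e f "g h i"' | first_six_then_rest
```

Like the perl version, this only helps when the field containing spaces is the last one you need.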