LinuxQuestions.org

diegovillar 11-06-2012 05:40 AM

Rearranging logfile with awk
 
Hello,

I want to rearrange a logfile so that I can process it with an analytics tool. For this I've been using commands like this:

Code:

awk '{print $2,$1,$4,$7}' file.log
My problem is that some of these log fields are enclosed in double quotes and contain blank spaces, so each of them is treated as several fields instead of one.

Does anyone know how to avoid this problem?

Thank you in advance

pan64 11-06-2012 05:50 AM

You should show us an example: what your input lines look like and what causes the problem. You probably only need to define a separator:
Code:

awk -F\  '{......}' logfile
#      ^^ two spaces


devnull10 11-06-2012 06:06 AM

I think what the OP is saying is that the log looks something like:

Code:

1,Log,Blah,Blahs
2,"A Log","Blah","More Blah"
3,"Yes, A log",Etc,Blah


diegovillar 11-06-2012 06:38 AM

The log is like this:

Quote:

2012-09-26 09:13:42.483 "79.100.26.27" - 172.18.6.61 admin=0&usuario=itencheva&pIdRag=367111&id_curso=1588&pIdioma=_esp&pIdiomaEntorno=_esp "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; GTB7.4; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C)"
As you can see, the default separator in the file is a blank space; the problem comes from the fields enclosed in double quotes.

Thank you for your answers

pan64 11-06-2012 08:25 AM

In your case you can try something like this:
awk ' { $3 = $1; $1 = ""; $5 = ""; $6 = ""; print } '
Or you can use Perl, which will keep $7 in one piece.

grail 11-06-2012 09:24 AM

Please tell us which version of awk you are using:
Code:

awk --version
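
If the answer turns out to be GNU awk 4.0 or later, one option worth trying is the FPAT variable, which describes what a field looks like instead of what separates the fields. The following is only a sketch under that assumption: FPAT is a gawk extension, the helper name quoted_fields is made up for illustration, and the field numbers are the ones from the first post.

```shell
# Assumes GNU awk 4.0 or later; FPAT is a gawk extension.
# quoted_fields is a hypothetical helper; it reads the named files or stdin.
quoted_fields() {
  gawk 'BEGIN {
          # A field is either a double-quoted string or a run of non-spaces.
          FPAT = "(\"[^\"]*\")|([^ ]+)"
        }
        { print $2, $1, $4, $7 }' "$@"
}
```

Called as quoted_fields file.log, this should keep each quoted field in one piece, with its quotes included.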

colucix 11-06-2012 09:28 AM

In a more general case, you should try the following awk code:
Code:

BEGIN {

  FS = OFS = "\""
 
}

{
 
  for ( i = 2; i < NF; i += 2 )
      gsub(/ +/, "\033", $i)
     
  split($0, m, " ")
 
  for ( i = 1; i <= length(m); i++ )
      gsub("\033", " ", m[i])
 
  print m[2] " " m[1] " " m[4] " " m[7]
 
}

Basically, it uses the double quote as the field separator and replaces the spaces inside each pair of double quotes with a placeholder character (octal code 033, the escape character). It then splits the modified record on the remaining blank spaces, which are the real separators for your purpose. Finally, it turns the placeholder characters back into blank spaces and prints the desired fields. Note that a run of several spaces inside quotes comes back as a single space.

Here is an example:
Code:

$ cat file
one "two two" three four "five five five" six "seven seven"
one "two two" three four "five five five" six "seven seven"
one "two two" three four "five five five" six "seven seven"

$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file
"two two" one four "seven seven"
"two two" one four "seven seven"
"two two" one four "seven seven"

Using your sample:
Code:

$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file
09:13:42.483 2012-09-26 - "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/4.0; GTB7.4; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C)"
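
Calling length() on an array is not available in every awk implementation, so here is a slightly more portable variant of the same script. It is a sketch that uses the count returned by split() instead; the wrapper name rearrange is made up for illustration.

```shell
# rearrange is a hypothetical helper; it reads the named files or stdin.
rearrange() {
  awk 'BEGIN { FS = OFS = "\"" }
  {
    # With " as separator, even-numbered fields are the quoted ones.
    for (i = 2; i < NF; i += 2)
        gsub(/ +/, "\033", $i)       # hide spaces inside quotes

    n = split($0, m, " ")            # split() returns the number of pieces

    for (i = 1; i <= n; i++)
        gsub("\033", " ", m[i])      # put the hidden spaces back

    # Join with explicit spaces, because OFS is the double quote here.
    print m[2] " " m[1] " " m[4] " " m[7]
  }' "$@"
}
```

Running rearrange file on the three-line example should give the same output as the one-liner.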

Hope this helps.

diegovillar 11-06-2012 09:38 AM

Quote:

$ awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[2] " " m[1] " " m[4] " " m[7] }' file
Thank you colucix, it helped me a lot, but it was not completely valid for my purpose because the field enclosed in double quotes is not the last field that I need to print. I tried this command:

Code:

awk 'BEGIN{ FS = OFS = "\"" }{ for ( i = 2; i < NF; i += 2 ) gsub(/ +/, "\033", $i); split($0, m, " "); for ( i = 1; i <= length(m); i++ ) gsub("\033", " ", m[i]); print m[1] " " m[2] " " m[5] " " m[12] " " m[8] " " m[6] " " m[16] " " m[4] " " m[3] " " m[7] " m[9] }'
But the system gave me the following error:
Code:

^ unfinished string
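
The error appears to come from the end of the print statement: after m[7] there is a single double quote before m[9], which opens a string that never closes. Writing m[7] " " m[9] there should clear it. To avoid counting quotes in such a long list, the field order can instead be passed in as one string. This is only a sketch: the helper name pick_fields and the order variable are made up for illustration, and the field numbers are the ones from the command above.

```shell
# pick_fields ORDER [FILE...]: print quote-aware fields in the given order.
# ORDER is a space-separated list of field numbers (hypothetical helper).
pick_fields() {
  order=$1; shift
  awk -v order="$order" 'BEGIN {
    FS = OFS = "\""
    k = split(order, want, " ")      # wanted field numbers, in output order
  }
  {
    for (i = 2; i < NF; i += 2)
        gsub(/ +/, "\033", $i)       # hide spaces inside quotes
    n = split($0, m, " ")            # n is not used further; m holds the fields
    line = ""
    for (j = 1; j <= k; j++) {
        f = m[want[j]]
        gsub("\033", " ", f)         # put the hidden spaces back
        line = line (j > 1 ? " " : "") f
    }
    print line
  }' "$@"
}

# Usage with the order from the command above:
#   pick_fields "1 2 5 12 8 6 16 4 3 7 9" file.log
```

A field number beyond the end of a line simply prints as an empty string.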

pan64 11-06-2012 11:06 AM

This will probably work for you:
Code:

perl -ne ' @a = split (/ /, $_, 7); print "$a[1] $a[0] $a[3] $a[6]" ' file.log
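
The limit of 7 in split is what keeps everything from the seventh field onward in one piece. A rough awk equivalent is sketched below. It assumes, as in the sample line, that the fields are separated by single spaces and that only the last wanted field contains spaces; the helper name keep_tail is made up for illustration.

```shell
# keep_tail is a hypothetical helper; it reads the named files or stdin.
keep_tail() {
  awk '{
    rest = $0
    # Remove the first six fields, each followed by one space,
    # so that rest holds everything from field 7 onward.
    for (i = 1; i <= 6; i++)
        sub(/^[^ ]* /, "", rest)
    print $2, $1, $4, rest
  }' "$@"
}
```

Like the Perl command, this does not help when a quoted field with spaces sits in the middle of the line.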


All times are GMT -5.