[SOLVED] weird awk behavior

schneidz · 08-27-2012, 10:12 AM

hi, i am trying to run this command on this sample input. i am also expecting the two chun-li lines ending in n (marked in red in the original post) to be part of the output but for some reason they are not getting printed:

Code:

[schneidz@hyper ~]$ cat test.lst
L  180 11000000   :     chun-li                            :     y 
L  180 11000000   :     chun-li                            :     n 
L  180 11000000   :     akuma                              :     y 
L  180 11000000   :     l33t                               :     y 
L  180 11000000   :     h4x0rz                             :     n 
L  180 11000000   :     hello                              :     y 
L  180 11000000   :     world                              :     n 
L  180 11000000   :     chun-li                            :     n 
[schneidz@hyper ~]$ awk 'index($0,"n") == 66 {print $0}' test.lst
L  180 11000000   :     h4x0rz                             :     n 
L  180 11000000   :     world                              :     n

i think its because there is an n in chun-li but that shouldnt matter since it is not the 66th byte in the record ?

thanks,

rknichols · 08-27-2012, 10:39 AM

Some of that spacing might have been done with tabs. You can't distinguish that on the screen, and it will affect the character count. See what this reveals:

Code:

tr '\t' '@' <test.lst

Relying on precise character positions in formatted output is often unreliable. Couldn't you just do this instead?

Code:

awk '$7 == "n" {print}' test.lst

firstfire · 08-27-2012, 10:41 AM

Hi.

Quote:

i think its because there is an n in chun-li

Apparently, yes:

Code:

$ echo banana|  awk '{print index($0,"n")}'
3

Quote:

but that shouldnt matter since it is not the 66th byte in the record ?

No. index() returns the index of the first (leftmost) occurrence of the string. For "chun-li" index() returns something like 28, so the condition is not met and the line is not printed.

Maybe try this:

Code:

$ awk '$7~/n/' in2 
L  180 11000000   :     chun-li                            :     n 
L  180 11000000   :     h4x0rz                             :     n 
L  180 11000000   :     world                              :     n 
L  180 11000000   :     chun-li                            :     n

danielbmartin · 08-27-2012, 10:46 AM

Quote:

Originally Posted by schneidz

... for some reason they are not getting outputted ...

Your awk works correctly on my machine. Perhaps your actual data contains tab characters which look like blanks and that causes confusion. There might be a simpler way to extract the "n" lines. Is that character always the right-most character in each line? Or the right-most non-blank character?

Daniel B. Martin

schneidz · 08-27-2012, 10:58 AM

Quote:

Originally Posted by rknichols

Some of that spacing might have been done with tabs. You can't distinguish that on the screen, and it will affect the character count. See what this reveals:

Code:

tr '\t' '@' <test.lst

Code:

[schneidz@hyper ~]$ tr '\t' '@' <test.lst
L  180 11000000   :     chun-li                            :     y 
L  180 11000000   :     chun-li                            :     n 
L  180 11000000   :     akuma                              :     y 
L  180 11000000   :     l33t                               :     y 
L  180 11000000   :     h4x0rz                             :     n 
L  180 11000000   :     hello                              :     y 
L  180 11000000   :     world                              :     n 
L  180 11000000   :     chun-li                            :     n

Quote:

Originally Posted by rknichols

Relying on precise character positions in formatted output is often unreliable. Couldn't you just do this instead?

Code:

awk '$7 == "n" {print}' test.lst

that would be ideal but some records would have chun li instead of chun-li.

firstfire · 08-27-2012, 10:59 AM

Quote:

Originally Posted by schneidz

that would be ideal but some records would have chun li instead of chun-li.

Code:

$ awk -F ":"  '$3~/n/' in2 
L  180 11000000   :     chun-li                            :     n 
L  180 11000000   :     h4x0rz                             :     n 
L  180 11000000   :     world                              :     n 
L  180 11000000   :     chun-li                            :     n
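
Note that `$3~/n/` accepts any third field that merely contains an n. A stricter variant, sketched here under the assumption that the flag column holds only a single letter plus padding, strips the blanks and compares for equality. The helper name `flag_is_n` and the sample records are made up for illustration:

```shell
# Hypothetical helper (not from the thread): print records whose third
# colon-separated field is exactly "n" once blanks and tabs are removed.
flag_is_n() {
    awk -F ':' '{
        flag = $3
        gsub(/[ \t]/, "", flag)   # strip padding around the flag
        if (flag == "n") print
    }' "$@"
}

# Made-up records: a name with a space, and a name that contains an "n".
printf '%s\n' \
    'L  180 11000000   :     chun li     :     n ' \
    'L  180 11000000   :     nina        :     y ' |
    flag_is_n
```

Only the "chun li" record should be printed. Because the split is on ":", a space inside the name does not shift the field numbers.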

schneidz · 08-27-2012, 11:07 AM

Quote:

Originally Posted by danielbmartin

Your awk works correctly on my machine. Perhaps your actual data contains tab characters which look like blanks and that causes confusion. There might be a simpler way to extract the "n" lines. Is that character always the right-most character in each line? Or the right-most non-blank character?

Daniel B. Martin

Code:

[schneidz@hyper ~]$ awk --version
GNU Awk 3.1.8
Copyright (C) 1989, 1991-2010 Free Software Foundation.
...

yes this is from db2-sql output, i added : surrounding column-1 to help with post-processing so maybe i can add another string in the sql export, or use awk -F : '{print $3}' or maybe n$...

thanks,

pan64 · 08-27-2012, 11:22 AM

or

Code:

awk '$NF == "n"' file

Quote:

Originally Posted by schneidz

i think its because there is an n in chun-li but that shouldnt matter since it is not the 66th byte in the record ?

index($0, "n") will return the position of the first n,
so index($0, "n") == 66 does not test whether the 66th char in the line is an n; it tests whether the first n in the line is at position 66.

schneidz · 08-27-2012, 01:38 PM

fyi, i can hack it into this:

Code:

[schneidz@hyper ~]$ awk 'index($0," n") == 65 {print $0}' test.lst
L  180 11000000   :     chun-li                            :     n 
L  180 11000000   :     h4x0rz                             :     n 
L  180 11000000   :     world                              :     n 
L  180 11000000   :     chun li                            :     n

but who knows when i hit a record that reads like chu n-li...

rknichols · 08-27-2012, 02:39 PM

The way to ask specifically about the 66th character is:

Code:

awk 'substr($0,66,1) == "n" {print}' test.lst

That extracts a string 1 character in length beginning at position 66 and compares it to "n".
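
A small side-by-side sketch of the difference between the two functions, using a made-up 12-character record instead of the real 66-column data:

```shell
# Hypothetical sample record: the first "n" is at position 3, another at position 12.
line='banana     n'

# index() gives the position of the first (leftmost) match only.
first_n=$(printf '%s\n' "$line" | awk '{ print index($0, "n") }')

# substr() extracts the character at one specific position.
char_12=$(printf '%s\n' "$line" | awk '{ print substr($0, 12, 1) }')

echo "first n at: $first_n"
echo "character 12: $char_12"
```

Here `index($0, "n") == 12` would be false even though character 12 is an n, while `substr($0, 12, 1) == "n"` is true.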

David the H. · 08-27-2012, 03:52 PM

Quote:

Originally Posted by rknichols

See what this reveals:

Code:

tr '\t' '@' <test.lst

An easier way is to just use cat -A to view all non-printing characters. Any tab characters will be displayed with the caret notation "^I".
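
Where `cat -A` is not available (it is a GNU option), the POSIX `l` command of sed is a possible substitute. A minimal sketch on a made-up line containing one tab:

```shell
# Made-up input with a real tab between "a" and "b".
# sed's "l" command prints the line in an unambiguous form:
# tabs appear as \t and the end of the line is marked with $.
visible=$(printf 'a\tb\n' | sed -n l)

# printf '%s' is used so the shell does not re-interpret the backslash.
printf '%s\n' "$visible"
```

Running `sed -n l test.lst` on the real file would show any tabs the same way.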

danielbmartin · 08-27-2012, 04:25 PM

Quote:

Originally Posted by rknichols

The way to ask specifically about the 66th character is:

Code:

awk 'substr($0,66,1) == "n" {print}' test.lst

That extracts a string 1 character in length beginning at position 66 and compares it to "n".

A variation on the same theme...

Code:

awk -F "" '$66~/n/' $InFile

Daniel B. Martin

schneidz · 08-28-2012, 07:39 AM

Quote:

Originally Posted by danielbmartin

A variation on the same theme...

Code:

awk -F "" '$66~/n/' $InFile

Daniel B. Martin

fyi, this worx on fedora but not on aix.