[SOLVED] awk

danielbmartin · 02-05-2012, 08:03 PM

I'm learning awk and wrote a simple test program. The desired operation is to read a file of words and output only those which...
- are five characters long
- have the same letter in position 2 and 5.
A sample input file...

Quote:

apple
orange
onion
banana
war
peace
surrender

The desired output file...

Quote:

onion
peace

I wrote this...

Code:

cat < $InFile \
|awk '{length($0)=5 && substr($0,2,1)==substr($0,5,1)}{print}' \
> Work04

... and get a syntax error. Please advise.

Daniel B. Martin

jhwilliams · 02-05-2012, 09:25 PM

Very close! How about this:

Code:

cat < $InFile | \
awk '{ if (5 == length($0) && substr($0,2,1) == substr($0,5,1)) print $0 }' \
> $OutFile

firstfire · 02-05-2012, 09:42 PM

Hi.

For learning purposes here is another solution

Code:

$ awk -F '' 'NF==5 && $2==$5' infile.txt 
onion
peace

-F '' -- set field separator to empty string, in which case awk treats each character as a field.
NF -- number of fields; in this case -- length of words.
$2, $5 -- second and 5th character in the string.

EDIT:
1) Note that if there is no action "{...}" after a pattern (or logical expression), then awk assumes '{print $0;}' by default.

2) From info gawk:

Quote:

* Null strings are removed when they occur as part of a non-null
command-line argument, while explicit non-null objects are kept.
For example, to specify that the field separator `FS' should be
set to the null string, use:

awk -F "" 'PROGRAM' FILES # correct

Don't use this:

awk -F"" 'PROGRAM' FILES # wrong!

In the second case, `awk' will attempt to use the text of the
program as the value of `FS', and the first file name as the text
of the program! This results in syntax errors at best, and
confusing behavior at worst.
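
A quick way to see what the empty field separator does is to print the field count and the two characters being compared. This is only a sketch, and it assumes an awk that splits a record into single characters when FS is empty (gawk does; POSIX does not specify the behavior). FS is set in a BEGIN block here, which avoids the -F "" quoting pitfall described in the quote above.

```shell
# Assumes an awk (such as gawk) that treats an empty FS as
# "one character per field"; POSIX leaves this unspecified.
printf '%s\n' onion apple |
awk 'BEGIN { FS = "" } { print $0 ": NF=" NF ", $2=" $2 ", $5=" $5 }'
```

With gawk this should report NF=5 for both words, with matching $2 and $5 (both "n") only for onion.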

danielbmartin · 02-06-2012, 02:44 PM

Quote:

Originally Posted by jhwilliams

Very close! How about this:

Code:

cat < $InFile | \
awk '{ if (5 == length($0) && substr($0,2,1) == substr($0,5,1)) print $0 }' \
> $OutFile

Thank you for correcting my faulty code. Your solution is readable and I like it.

I made the next step by parameterizing this line.

Code:

# Parameterize the word length and "must match" character positions.
WL=6
p1=2
p2=5
cat < $InFile \
|awk '{ if (length($0)=='"$WL"'&&substr($0,'"$p1"',1)==substr($0,'"$p2"',1)) print $0 }' \
> $Work05

This works but the combination of single quotes and double quotes detracts from readability. Is there a cleaner way?

Daniel B. Martin

danielbmartin · 02-06-2012, 02:52 PM

Quote:

Originally Posted by firstfire

For learning purposes here is another solution

Code:

$ awk -F '' 'NF==5 && $2==$5' infile.txt 
onion
peace

-F '' -- set field separator to empty string, in which case awk treats each character as a field.
NF -- number of fields; in this case -- length of words.
$2, $5 -- second and 5th character in the string.

Thank you, firstfire, for this remarkably concise solution. Setting the field separator character to the null string is clever!

I made the next step by parameterizing this line.

Code:

# Parameterize the word length and "must match" character positions.
WL=6
p1=2
p2=5
cat < $InFile \
|awk -F '' 'NF=='"$WL"' && $'"$p1"'==$'"$p2"' ' \
> $Work07

This works but the combination of single quotes and double quotes detracts from readability. Is there a cleaner way?

Daniel B. Martin

firstfire · 02-06-2012, 03:11 PM

Hi.

Quote:

Originally Posted by danielbmartin

This works but the combination of single quotes and double quotes detracts from readability. Is there a cleaner way?

Yes, definitely.

Code:

$ awk -v WL=5 -v p1=2 -v p2=5 '{ if (length==WL && substr($0,p1,1)==substr($0,p2,1)) print}' infile.txt 
onion
peace

As you can see, one may pass variables to an awk script using the `-v' option.

I'd like to dissuade you from writing awk programs in C (as well as C++ programs in C, etc.).

IMHO, each language has its own preferred thought patterns, and you should learn them, not only the syntax. In the case of awk, you should typically think of the input data as a sequence of records, each of which consists of fields. Using this simple paradigm you can move mountains!
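
The same -v technique should also tidy up the field-based one-liner. A minimal sketch, assuming an awk such as gawk that treats an empty FS as "one character per field" (POSIX leaves this unspecified). The variable names WL, p1, and p2 are the ones used in the posts above, and the sample words are piped in only for illustration.

```shell
# $p1 means "the field whose number is stored in p1", so no shell
# quoting tricks are needed inside the awk program.
printf '%s\n' apple orange onion banana war peace surrender |
awk -v WL=5 -v p1=2 -v p2=5 'BEGIN { FS = "" } NF == WL && $p1 == $p2'
```

Because the values arrive as awk variables, the program text stays inside one pair of single quotes.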

s4sandeep · 03-14-2012, 12:31 PM

Hi,

I have just started with Unix, and today I started learning awk. This is my first post on this forum and I do not know where to start a new thread, so I am posting in this thread.

Problem details:
I tried to run the following command:
ls -l | grep -v total | awk '{ print size is $5 bytes for $2 }'
and it showed me this error:

syntax error The source line is 1.
The error context is
{ print size is $5 bytes >>> for <<< $2 }
awk: The statement cannot be correctly parsed.
The source line is 1.

Whereas if I run this command:
ls -l | grep -v total | awk '{ print size is $5 }'
it runs successfully, with this output:
164
17146

For reference, the output of ls -l is:

-rw-r--r-- 1 goyalank users 164 Mar 7 09:20 email
-rw-r--r-- 1 goyalank users 17146 Mar 7 13:49 task

Any help is appreciated.

firstfire · 03-14-2012, 02:01 PM

Hi.

You should quote literal strings, like this:

Code:

ls -l | grep -v total | awk '{ print "size is "$5" bytes for "$2 }'
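
The reason for the error: inside an awk program, an unquoted word such as size or is is read as a variable name (empty if never assigned), and for is a reserved keyword, hence the parse error at that point. As an alternative sketch, printf keeps all the literal text in one format string. The sample line is copied from the listing above and fed in with printf instead of ls -l, so the example is reproducible.

```shell
# %s placeholders are filled from the fields listed after the format string.
printf '%s\n' '-rw-r--r-- 1 goyalank users 164 Mar 7 09:20 email' |
awk '{ printf "size is %s bytes for %s\n", $5, $2 }'
```

Note that $2 is the link count in this listing. If the file name was what was wanted, $NF (the last field) would give it.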

s4sandeep · 03-14-2012, 03:42 PM

Thanks a ton! That worked.
Can you also tell me the steps to create a new thread for a particular problem like the one I faced? Thanks again in advance.

firstfire · 03-14-2012, 10:56 PM

Quote:

Originally Posted by s4sandeep

Thanks a ton! That worked.
Can you also tell me the steps to create a new thread for a particular problem like the one I faced? Thanks again in advance.

Click on "Forum Tools" (next to "Search This Forum") -> "Post a New Thread".

linuxslayer69 · 12-02-2014, 09:40 PM

Quote:

Originally Posted by firstfire

Click on "Forum Tools" (next to "Search This Forum") -> "Post a New Thread".

Hello everyone. I am currently in college taking a Linux course, and it's been rough. I was fine for the first 10 weeks, but now it's getting really hard.
I have to write two scripts: one that tells me when my IP changes, and one that pulls words out of a dictionary file from my professor. For the second one, I have to extract the words that are at least 3 and at most 6 characters long and put them in an empty file.
I decided to use awk for the dictionary one, but I can't figure out the right syntax. I've spent hours upon hours over the past two weeks trying to figure these two out.
It would be great if someone could help me, please.
Here's what I started out with:
awk 'length 6=< && >=3 {printf "%d. %s\n"}' test > test1
cat test1
thanks guys

danielbmartin · 12-03-2014, 07:41 AM

Quote:

Originally Posted by linuxslayer69

... i cant figure out the right syntax ...

With this InFile ...

Code:

Honda
MG
Ford
Chevrolet
VW
Cadillac
Buick
MG
Chrysler
Kia
Lincoln
Volvo
Renault
Mazda
Fiat

... this awk ...

Code:

awk '{if (3<length($0) && length($0)<7) print}' $InFile >$OutFile

... produced this OutFile ...

Code:

Honda
Ford
Buick
Volvo
Mazda
Fiat

Daniel B. Martin
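
Note that 3<length($0) is a strict comparison, so three-letter words such as Kia are left out. If the bounds are meant to be inclusive (at least 3 and at most 6 characters, as the question states), a sketch could look like this. Here min and max are hypothetical variable names chosen for illustration, and the word list is a shortened copy of the InFile above.

```shell
# >= and <= make both ends of the range inclusive.
printf '%s\n' Honda MG Ford Chevrolet Kia Lincoln Fiat |
awk -v min=3 -v max=6 'length($0) >= min && length($0) <= max'
```

With this input, Kia (3 characters) is kept, while MG (2) and Lincoln (7) are dropped.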

pan64 · 12-03-2014, 07:53 AM

Do not use:
cat < file | awk 'something' > newfile
but rather:
awk 'something' file > newfile

Another comment: the usual syntax of awk is:
awk ' condition { action } ', so:

awk '3<length($0) && length($0)<7' $InFile > $OutFile
will probably work (the default action is print, so you can omit it).
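
The condition { action } form with an explicit action might look like the following sketch. It numbers the matching words, in the spirit of the printf attempt earlier in the thread, and uses the inclusive bounds from the original question. The counter n and the temporary file are assumptions made for the demonstration.

```shell
# Hypothetical sample file, created only for this demonstration.
InFile=$(mktemp)
printf '%s\n' Honda MG Ford Chevrolet Kia Lincoln > "$InFile"

# condition { action }: awk opens the file itself, so cat is not needed.
# n is an awk variable that starts out empty (0 in numeric context).
awk 'length($0) >= 3 && length($0) <= 6 { printf "%d. %s\n", ++n, $0 }' "$InFile"

rm -f "$InFile"
```

Redirecting the output with > to a new file works the same way as in the commands above.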

danielbmartin · 12-03-2014, 09:27 AM

I like this for readability (previously posted) ...

Code:

awk '{if (3<length($0) && length($0)<7) print}' $InFile >$OutFile

... but there are other ways to achieve the same objective, including ...

Code:

awk '{if (length($0)~/^[3456]$/) print}' $InFile >$OutFile

awk -F "" '{if ($3!="" && $7=="") print}' $InFile >$OutFile

Daniel B. Martin

pan64 · 12-03-2014, 10:44 AM

I agree with you, but the usual syntax of awk is what I wrote:
awk ' condition { action } '
and using the usual syntax will not hurt readability. Your original post:
awk '{length($0)=5 && substr($0,2,1)==substr($0,5,1)}{print}'
should work too: you only need to remove the { } around the condition and use == instead of = in the length comparison.
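
Applied to the one-liner from the first post, that could look like the sketch below. The condition sits outside the braces as a pattern, and the comparison uses == (a single = is assignment in awk). The sample words from the first post are piped in for illustration.

```shell
# pattern { action }: print runs only for records where the pattern is true.
printf '%s\n' apple orange onion banana war peace surrender |
awk 'length($0) == 5 && substr($0, 2, 1) == substr($0, 5, 1) { print }'
```

Since print is the default action, the trailing { print } could also be dropped.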