Does a "wait" function exist in (g)awk?

lqd9o · 10-14-2012, 11:04 PM

It is a general question.

Using an awk or gawk script, let's say we have to process a large input file in two steps.

STEP 1: The first step processes every input record and adds "1" in a new field at the end if the record matches a regex, or "0" if not.

STEP 2: The second step processes each record differently depending on the last field we created just before:
If $NF == "1", it applies operation A to the record,
If $NF == "0", it applies a different operation B to the record.

(This is a simple example of course)

Instead of taking input records one by one and applying STEP 1 and STEP 2 successively, is there a way to tell awk to apply STEP 1 to all the input records, wait until they are all processed, and then apply STEP 2?

Thanks for your help!

sag47 · 10-15-2012, 12:46 AM

As far as I know, that would be using awk incorrectly. You can treat awk like the stream editor (sed) in that it only sees one line (record) at a time. That means if you have output from a command:

Code:

somecommand | awk 'TEST1 { RUN }; TEST2 { RUN }'

If the same line matches both TEST1 and TEST2, both actions run on that line, in order, as the line is read. If you would like more serialized execution, you may have to consider either using a different language or running two awk commands, like the following.

Code:

somecommand | awk 'TEST1 { RUN }' | awk 'TEST2 { RUN }'

Note that my second example has not actually been tested, so it depends on how awk behaves. If it behaves like sed, then it will likely not work either. However, if it behaves more like the sort command (which processes the entire stream and then outputs at the end), then it will work. Try a couple of ways, and maybe someone else can provide a better answer.
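For anyone who wants to try the two-command form, here is a minimal sketch on made-up data. The regex /foo/, the "A:" and "B:" prefixes, and the function name two_step_pipeline are placeholders, not anything from the original question. As far as I can tell, the two awk processes in a pipeline run concurrently, so the second one does not wait for the whole file; each record has simply passed through STEP 1 before it reaches STEP 2.

```shell
# Hypothetical helper: reads records on stdin, flags them, then branches on the flag.
two_step_pipeline() {
    # STEP 1: append 1 if the record matches /foo/, otherwise 0.
    awk '{ print $0, (/foo/ ? 1 : 0) }' |
    # STEP 2: choose operation A or B based on the last field.
    awk '$NF == 1 { print "A:", $0; next } { print "B:", $0 }'
}

# Made-up sample input.
printf '%s\n' 'foo one' 'bar two' 'foo three' | two_step_pipeline
```

For per-record operations like these, the result is the same whether or not STEP 1 has finished the whole file first.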

SAM

Thad E Ginataom · 10-15-2012, 02:04 AM

Quote:

Instead of taking input records one by one and applying STEP 1 and STEP 2 successively, is there a way to tell awk to apply STEP 1 to all the input records, wait until they are all processed, and then apply STEP 2?

I think this is one of those obvious-when-you-think-about-it answers: you write the output of your first awk command to a temporary file, and run your second awk on the temporary file.

One has to ask: why would you do this? I assume that the 1 or 0 is going to be used by something else. If it is just for this process, then it is an unnecessary stage.
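A rough sketch of the temporary-file approach, assuming mktemp is available on your system. The function name two_step_tmpfile, the regex /foo/, and the "A:" and "B:" prefixes are placeholders for illustration only.

```shell
# two_step_tmpfile FILE: hypothetical helper, not a standard command.
two_step_tmpfile() {
    tmp=$(mktemp) || return 1
    # STEP 1 runs to completion and writes every flagged record to the temporary file.
    awk '{ print $0, (/foo/ ? 1 : 0) }' "$1" > "$tmp"
    # STEP 2 starts only after STEP 1 has exited.
    awk '$NF == 1 { print "A:", $0; next } { print "B:", $0 }' "$tmp"
    rm -f "$tmp"
}
```

You would call it as two_step_tmpfile yourfile. Here the "wait" really does happen, because the shell does not start the second awk until the first one has finished.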
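If the flag is only needed inside this one process, a single pass can test the regex and branch directly, something like the sketch below. Again, /foo/, the prefixes, and the name single_pass are made up for the example.

```shell
# Hypothetical helper: no intermediate flag field, the regex test picks the operation.
single_pass() {
    awk '/foo/ { print "A:", $0; next }   # operation A for matching records
               { print "B:", $0 }         # operation B for everything else
    ' "$@"
}

printf '%s\n' 'foo one' 'bar two' 'foo three' | single_pass
```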

If you must do this in two steps, within a single iteration of awk, I guess you could read the whole file into an array, and then process the array, possibly in an END section. But "keep it simple" should be the whole philosophy of shell and awk work.

(awk has two special patterns: BEGIN and END. Whilst it works on a line-by-line basis, it executes the BEGIN section before it starts to read its input, and the END section after the line-by-line processing is finished. You can use a BEGIN pattern in awk to write code that has no input: think awk "Hello World"!)
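A minimal sketch of the array idea, under the same made-up assumptions (regex /foo/, "A:" and "B:" as stand-ins for the operations, two_step_array as a hypothetical name). The main rule only stores records and flags; nothing is processed further until END.

```shell
two_step_array() {
    awk '
        # STEP 1: store each record and its flag; nothing is printed yet.
        { rec[++n] = $0; flag[n] = (/foo/ ? 1 : 0) }

        # STEP 2: runs only after all input has been read.
        END {
            for (i = 1; i <= n; i++) {
                tag = flag[i] ? "A:" : "B:"
                print tag, rec[i], flag[i]
            }
        }
    ' "$@"
}

printf '%s\n' 'foo one' 'bar two' 'foo three' | two_step_array
```

Bear in mind that this holds the whole input in memory, which may matter for a large file.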
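To illustrate BEGIN and END, here is a small sketch. The function names hello and count_records are made up for the example.

```shell
# BEGIN-only program: awk does not read any input here.
hello() { awk 'BEGIN { print "Hello World" }'; }

# BEGIN runs before the first record, END after the last one.
count_records() {
    awk 'BEGIN { print "start" } { n++ } END { print "records:", n + 0 }'
}

hello
printf '%s\n' a b c | count_records
```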

By the way: NF is an awk built-in variable. If you try to use it for something else, I think confusion will ensue.
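For reference, a small sketch of how NF and $NF relate: NF holds the number of fields in the current record, while $NF is the field at that position, i.e. the last one. Appending a flag as in STEP 1 can be done by assigning to $(NF + 1). The helper name show_nf and the sample record are made up for illustration.

```shell
show_nf() {
    awk '{
        print NF, $NF        # number of fields, then the last field
        $(NF + 1) = 1        # append a new last field; NF grows by one
        print NF, $NF
        print $0             # the record is rebuilt with the new field
    }'
}

printf '%s\n' 'a b c' | show_nf
```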

Trd300 · 10-15-2012, 04:18 AM

Thank you guys for your answers!

As I was thinking more about doing this from a single awk script, Thad E Ginataom's method might work better: reading the whole file into an array, or maybe doing the second step in the END section.