
wilelucho 02-28-2013 01:42 PM

[Solved] awk in a loop
Hi everyone,

I'm trying to generate separate files from a single source file.
The source file has 112 rows and more than 1000 columns.
Each output file should contain the first, third and second columns, plus one more column: the first file gets the 4th column, the next file gets the 5th column, and so on.
So I tried the following code:

for i in {4..1000}; do
    cat arquivototalt.txt | awk -F " " '{print $1,$3,$2,$i}' > ./arquivo_${i}.4c
done

colucix 02-28-2013 04:39 PM

Your loop is rather inefficient, since it parses the source file 997 times! Instead, let awk do the work in a single pass, e.g.:

awk '{for (i = 4; i <= NF; i++) print $1, $3, $2, $i > sprintf("arquivo_%04d.4c",i)}' source_file
Hope this helps.
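A quick way to try the single-pass version on a tiny input. The file name sample.txt and its contents are made up for the demo, not the OP's data:

```shell
# Work in a throwaway directory so the generated files don't clutter anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Made-up sample: 2 rows x 6 columns, so columns 4..6 should each get a file.
printf '%s\n' 'r1c1 r1c2 r1c3 r1c4 r1c5 r1c6' \
              'r2c1 r2c2 r2c3 r2c4 r2c5 r2c6' > sample.txt

# One pass over the input: for every column i from 4 to NF, write columns
# 1, 3, 2 and i to a zero-padded file named after i.
awk '{for (i = 4; i <= NF; i++) print $1, $3, $2, $i > sprintf("arquivo_%04d.4c", i)}' sample.txt

ls arquivo_*.4c       # arquivo_0004.4c, arquivo_0005.4c, arquivo_0006.4c
cat arquivo_0005.4c   # one line per input row, ending in column 5
```

Note that this keeps one output file open per column at the same time; depending on the awk implementation and the system's open-file limit, that may matter with 1000+ columns.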

wilelucho 03-01-2013 08:11 AM

Thanks colucix, it worked like a charm


colucix 03-01-2013 08:26 AM

You're welcome! :)

grail 03-02-2013 06:36 AM

Please use the thread tools to mark your thread as solved.

David the H. 03-02-2013 12:21 PM

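For the record, the main problem with the OP's loop is that the shell variable $i is not expanded inside the awk program. Because the program is in single quotes, the shell passes $i through literally, and awk ends up using its own variable i, which was never set. An unset awk variable is 0 in numeric context, so $i becomes $0, and every output file would get the whole input line appended instead of a single column.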
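A small demonstration of that pitfall on a made-up five-column line:

```shell
# The shell variable i is set here, but awk never sees it: the program is in
# single quotes, so "$i" reaches awk literally and refers to awk's own,
# never-assigned variable i (0 in numeric context), i.e. $0.
i=4
broken=$(echo 'a b c d e' | awk '{print $1, $3, $2, $i}')
echo "$broken"   # a c b a b c d e  (whole line appended, not column 4)
```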

In cases like this, you generally have to import shell variables into awk variables with the -v option.
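For example, with the same made-up line, -v copies the shell value into an awk variable (the name col is just a choice for illustration):

```shell
# -v creates the awk variable col before the program runs, copying the value
# of the shell variable i into it; $col then selects that field.
i=4
fixed=$(echo 'a b c d e' | awk -v "col=$i" '{print $1, $3, $2, $col}')
echo "$fixed"   # a c b d
```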

The -F setting is also unnecessary. By default awk splits fields on runs of whitespace (spaces and tabs), and -F " " is special-cased to mean exactly that default, so it changes nothing. If the file contains tabs or other whitespace that must not be treated as delimiters, you need a different separator, for example -F '[ ]' to split on single space characters only.
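A quick check of the difference, using a made-up line with one tab between b and c:

```shell
# "a b<TAB>c d": spaces plus one tab.
line=$(printf 'a b\tc d')

# -F ' ' is awk's default splitting: runs of spaces/tabs separate fields.
default_nf=$(printf '%s\n' "$line" | awk -F ' ' '{print NF}')

# A bracket expression makes only a literal single space a separator, so the
# tab stays inside a field ("b<TAB>c"). Consecutive spaces would give empty fields.
space_only_nf=$(printf '%s\n' "$line" | awk -F '[ ]' '{print NF}')

echo "default: $default_nf, space only: $space_only_nf"   # default: 4, space only: 3
```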

BTW, that's a Useless Use Of Cat: awk can read the file directly, so there is no need to pipe it through cat.
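To illustrate on a throwaway file (created with mktemp purely for the demo), both forms give the same result; the second just skips the extra cat process:

```shell
# Made-up one-line sample file.
tmp=$(mktemp)
printf 'x1 x2 x3 x4\n' > "$tmp"

# With cat: an extra process whose only job is to copy the file into the pipe.
with_cat=$(cat "$tmp" | awk '{print $1, $3, $2, $4}')
# Without cat: awk opens the file itself and produces the same output.
without_cat=$(awk '{print $1, $3, $2, $4}' "$tmp")

echo "$without_cat"   # x1 x3 x2 x4
rm -f "$tmp"
```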

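Finally, I highly recommend zero-padding the numbers in your filenames. The shell sorts glob results lexically, so without padding arquivo_10.4c sorts before arquivo_4.4c; with padding the files come out in numeric order, which makes later processing much easier.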
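A small illustration of the ordering problem, using byte-wise C-locale sorting as a stand-in for glob ordering (the shell's actual collation depends on the locale):

```shell
# Sort three unpadded and three padded names the same way and join each
# result onto one line for easy comparison.
unpadded=$(printf '%s\n' arquivo_4.4c arquivo_10.4c arquivo_100.4c | LC_ALL=C sort | paste -s -d ' ' -)
padded=$(printf '%s\n' arquivo_0004.4c arquivo_0010.4c arquivo_0100.4c | LC_ALL=C sort | paste -s -d ' ' -)

echo "unpadded: $unpadded"   # arquivo_10.4c arquivo_100.4c arquivo_4.4c
echo "padded:   $padded"     # arquivo_0004.4c arquivo_0010.4c arquivo_0100.4c
```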

Fortunately awk isn't bothered by leading zeros in a field reference (with col=0004, $col is field 4), so you can do the zero padding in the brace expansion itself (bash v4+).


for i in {0004..1000}; do
    awk -v "col=$i" '{print $1,$3,$2,$col}' arquivototalt.txt > "./arquivo_$i.4c"
done
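To check the leading-zero behaviour in isolation, on a made-up line:

```shell
# awk converts the string "0004" to the number 4 when it is used as a field
# index, so a zero-padded loop counter can be passed straight to -v.
field=$(echo 'a b c d e' | awk -v col=0004 '{print $col}')
echo "$field"   # d

# In bash 4 or later, brace expansion keeps the padding:
#   echo {0004..0006}   ->   0004 0005 0006
```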

If you're using an older version of bash, or another shell, you can pad the numbers with printf instead.


for i in {4..1000}; do
    awk -v "col=$i" '{print $1,$3,$2,$col}' arquivototalt.txt > "./arquivo_$(printf '%04d' "$i").4c"
done
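Brace expansion itself is not available in every shell (dash, for instance). A sketch of the same loop in plain POSIX sh, assuming every row has as many columns as the first one; the small arquivototalt.txt created here is made-up sample data so the snippet runs on its own:

```shell
# Made-up 2-row, 6-column sample standing in for the real arquivototalt.txt.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf '%s\n' 'a1 a2 a3 a4 a5 a6' 'b1 b2 b3 b4 b5 b6' > arquivototalt.txt

src=arquivototalt.txt
ncols=$(awk '{print NF; exit}' "$src")   # column count taken from the first row

i=4
while [ "$i" -le "$ncols" ]; do
    # Column number passed in with -v; file name zero-padded with printf.
    awk -v "col=$i" '{print $1, $3, $2, $col}' "$src" > "./arquivo_$(printf '%04d' "$i").4c"
    i=$((i + 1))
done
```

This still reads the file once per column, so the single-pass awk version remains the faster option.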

colucix 03-02-2013 02:25 PM

David, your explanations are thorough and well phrased, as always. However, I would point out that running a single awk command, instead of 997 separate invocations that each re-read the file, is far more efficient.

David the H. 03-03-2013 12:56 AM

Oh, absolutely. I was taking that as a given since you had already addressed it. awk is definitely the proper choice here.

I just wanted to take the opportunity to address the shell code version as well, as a lesson in proper scripting technique.

Sorry if I didn't make that clear enough.

wilelucho 03-04-2013 06:02 AM

Thanks David the H. and colucix, this thread unexpectedly turned into a useful lesson about awk.


All times are GMT -5.