LinuxQuestions.org


wilelucho 02-28-2013 12:42 PM

[Solved] awk in a loop
 
Hi everyone,

I'm trying to generate separate files from a single source file.
The source file has 112 rows and more than 1000 columns.
Each output file should contain the first, third and second columns, in that order, plus one additional column: the first file gets the 4th column, the next file gets the 5th column instead, and so on.
So I tried the following code:
Code:

for i in {4..1000}
do
cat arquivototalt.txt | awk -F " " '{print $1,$3,$2,$i}' > ./arquivo_${i}.4c
done


colucix 02-28-2013 03:39 PM

Your loop is a little bit redundant, since the source file is parsed 997 times! Instead, let awk do the work in a single pass, e.g.
Code:

awk '{for (i = 4; i <= NF; i++) print $1, $3, $2, $i > sprintf("arquivo_%04d.4c",i)}' source_file
Hope this helps.
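
One caveat, offered as a hedged aside: with more than 1000 columns the one-liner keeps roughly a thousand output files open at the same time. gawk copes with that, but some other awk implementations may stop with a "too many open files" error. Below is a minimal sketch of a workaround that appends to each file and closes it immediately. It assumes the output files do not exist yet (because it appends, stale files would have to be removed first), and it uses a tiny made-up sample.txt in place of the real source file.

```shell
# Hypothetical 2-row, 6-column sample standing in for the real source file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'a b c d e f' 'g h i j k l' > sample.txt

# Append to each output file and close it straight away, so that only one
# output file is open at any time. Slower than keeping them all open.
awk '{
    for (i = 4; i <= NF; i++) {
        out = sprintf("arquivo_%04d.4c", i)
        print $1, $3, $2, $i >> out
        close(out)
    }
}' sample.txt

cat arquivo_0004.4c
```

On the sample this creates arquivo_0004.4c through arquivo_0006.4c, each holding columns 1, 3, 2 plus its own extra column. If your awk handles the original one-liner without complaint, that version is the faster choice.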

wilelucho 03-01-2013 07:11 AM

Solved
 
Thanks colucix, it worked like a charm

Luis

colucix 03-01-2013 07:26 AM

You're welcome! :)

grail 03-02-2013 05:36 AM

Please use the tools to mark your thread solved.

David the H. 03-02-2013 11:21 AM

For the record, the main problem with the OP's loop is that the shell variable $i is never expanded inside the awk program. The program is enclosed in single quotes, so the shell passes $i through literally, and awk ends up using its own, unset variable 'i'.
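
To see what this means in practice: an unset awk variable counts as 0 when used as a number, so $i quietly becomes $0, the whole input line, and no error is raised. A small check on a made-up line:

```shell
# Inside single quotes the shell leaves $i alone; awk sees its own, unset
# variable i, which counts as 0, so $i means $0 (the entire line).
i=4
result=$(echo 'a b c d e' | awk '{print $1, $3, $2, $i}')
echo "$result"
```

This prints "a c b" followed by the complete line "a b c d e", not the fourth field, which is why every file from the original loop would contain the same content.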

In cases like this, you generally have to import shell variables into awk variables with the -v option.


I also imagine that the -F setting is unnecessary, since awk splits columns on runs of whitespace by default anyway. In fact, -F " " is the same as the default, so tabs are still treated as delimiters. If the file contains tabs or other whitespace that must not count as a delimiter, you would need something like -F '[ ]' to split on single literal spaces only.
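
A quick way to compare the field-splitting modes, using a made-up line that has a tab between a and b and a single space between b and c. The variable names are only for illustration:

```shell
# A made-up line: a<TAB>b<SPACE>c
line=$(printf 'a\tb c')

# Default splitting: runs of spaces and tabs separate fields.
default_nf=$(printf '%s\n' "$line" | awk '{print NF}')
# A single space as FS is the default behaviour spelled out explicitly.
space_nf=$(printf '%s\n' "$line" | awk -F " " '{print NF}')
# A bracket expression makes awk split on literal single spaces only.
literal_nf=$(printf '%s\n' "$line" | awk -F '[ ]' '{print NF}')

echo "$default_nf $space_nf $literal_nf"
```

The first two counts are both 3, because the tab still separates fields. The third is 2, because only the literal space does.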


BTW, that's a Useless Use Of Cat: awk can read the file directly when it is given as an argument.


Finally, I highly recommend zero-padding the numbers in your filenames. It makes later processing much easier when the shell can sort them automatically.

Fortunately awk doesn't appear to be bothered by leading zeros in column expansion, so you can do the zero padding in the brace expansion (bash v4+).
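
If you want to confirm this on your own awk, here is a small check with a made-up ten-field line. The concern would be a padded value such as 0010 being read as octal (8) instead of decimal 10. The common awk implementations I am aware of read values passed with -v as decimal, but treat that as an assumption to verify rather than a guarantee:

```shell
# Ten made-up fields; field 10 is "j" and field 8 is "h".
field=$(echo 'a b c d e f g h i j' | awk -v "col=0010" '{print $col}')
echo "$field"
```

If this prints j, the padded value was read as decimal 10 and the zero-padded loop is safe. An h would mean the value was read as octal, in which case padding only the filename with printf is the safer route.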

Code:

for i in {0004..1000}; do
    awk -v "col=$i" '{print $1,$3,$2,$col}' arquivototalt.txt > "./arquivo_$i.4c"
done

If you're using an older version of bash, or another shell, you can pad the numbers with printf instead.

Code:

for i in {4..1000}; do
    awk -v "col=$i" '{print $1,$3,$2,$col}' arquivototalt.txt > "./arquivo_$(printf '%04d' "$i").4c"
done


colucix 03-02-2013 01:25 PM

David, your explanations are comprehensive and well phrased, as always. However, I would point out that running a single awk command instead of 997 separate ones is far more efficient.

David the H. 03-02-2013 11:56 PM

Oh, absolutely. I was taking that as a given since you had already addressed it. awk is definitely the proper choice here.

I just wanted to take the opportunity to address the shell code version as well, as a lesson in proper scripting technique.

Sorry if I didn't make that clear enough.

wilelucho 03-04-2013 05:02 AM

thanks
 
Thanks David the H. and colucix, this thread unexpectedly turned into a useful class about awk.

Regards


All times are GMT -5.