[SOLVED] awk command to direct the output to multiple files
Hi all,
I have an input file listing the names of the files in which certain columns need to be processed. My input file looks like this
I want to split each file on the basis of the 3rd column: if it is "1", I need to write only the 6th column (the score) to a separate file with the extension .control; if it is "2", I need to write the 6th column to a file with the extension .case. Afterwards I need to run an R function on the case and control files. My code is
Code:
IFS=$'\n'
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt);
do awk '{ if ($3 == 1) {print $6 >> $FileName.control} else {print $6 >> $FileName.case}}' $FileName;
done;
However, if I run the command only for the .control files, i.e. rows with the value "1" in the 3rd column, and write the respective score to a separate file, it works.
Code:
IFS=$'\n'
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt);
do awk '{ if ($3 == 1) { print $6 } }' $FileName > $FileName.control; done;
But I am unable to embed the else condition in the code. Can anyone let me know what's going wrong? Thanks.
What you mixed up is this: $FileName is evaluated by the shell, not by awk. Inside single quotes the shell does not expand it, so awk receives the text $FileName literally. The awk script itself is '{ if ($3 == 1) { print $6 } }', nothing more.
You cannot mix the two languages: the awk script cannot use $FileName as a variable that was defined in bash. If you want that, either export it in the shell and read the environment variable from awk, or pass the variable to awk.
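Both routes mentioned above can be sketched as follows. This is only an illustration: the sample rows, the temporary directory and the awk variable name fn are made up, and the six-column layout is assumed from the description in the first post.

```shell
#!/bin/bash
# Sketch only: the data and file names below are invented for illustration.
workdir=$(mktemp -d)
FileName="$workdir/sample_scores.txt"

# Assumed layout: 3rd column is the group (1 or 2), 6th column is the score.
printf '%s\n' \
  'f1 i1 1 10 2 0.11' \
  'f2 i2 2 10 3 0.22' \
  'f3 i3 1 10 4 0.33' > "$FileName"

# Route 1: pass the shell variable to awk with -v.
# The parentheses make the string concatenation explicit.
awk -v fn="$FileName" '
  $3 == 1 { print $6 > (fn ".control") }
  $3 == 2 { print $6 > (fn ".case") }
' "$FileName"

# Route 2: export the variable and read it through awk's ENVIRON array.
export FileName
awk '$3 == 1 { n++ } END { print n " control rows in " ENVIRON["FileName"] }' "$FileName"
```

One caveat with -v: awk processes backslash escapes in the assigned value, so ENVIRON may be the safer choice for file names that can contain backslashes.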
#!/bin/bash
myfile="Artery_Aorta-ListOfScoreFilesForScript"
while read -a aline; do
if (( ${aline[2]} == 1 )); then
echo "${aline[5]}" >> "$myfile".control
elif (( ${aline[2]} == 2 )); then
echo "${aline[5]}" >> "$myfile".case
fi
done < "$myfile".txt
If you are using R, why not just read the entire file, make PHENO a factor and then subset as necessary?
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt); do cat "$FileName" | awk -v fn="$FileName" '{if ($3 == 1) {print $6 >>fn".control"} else {print $6 >>fn".case"}}'; done;
Useless use of cat:
cat file | awk 'script' can be replaced by awk 'script' file
Also, combining for and cat is not recommended; a while read loop should be used instead. awk knows the name of the file it is processing (the built-in variable FILENAME), so there is no need to set a variable.
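A minimal sketch of those suggestions, assuming a list file with one file name per line. The names a.txt, b.txt and list.txt and their contents are hypothetical; FILENAME is awk's built-in variable holding the name of the file currently being read.

```shell
#!/bin/bash
# Sketch only: two invented score files plus a list file naming them.
workdir=$(mktemp -d)
printf '%s\n' 'f1 i1 1 10 2 0.11' 'f2 i2 2 10 3 0.22' > "$workdir/a.txt"
printf '%s\n' 'f3 i3 2 10 2 0.33' 'f4 i4 1 10 3 0.44' > "$workdir/b.txt"
printf '%s\n' "$workdir/a.txt" "$workdir/b.txt" > "$workdir/list.txt"

# Read the list line by line: no cat, no for, no need to change IFS globally.
while IFS= read -r FileName; do
  # awk takes the file as an argument and names the outputs via FILENAME.
  awk '
    $3 == 1 { print $6 > (FILENAME ".control") }
    $3 == 2 { print $6 > (FILENAME ".case") }
  ' "$FileName"
done < "$workdir/list.txt"
```

Since ">" is used, each run starts the .control and .case files afresh instead of appending to leftovers from an earlier run.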
You mean "it works technically but the else part grabs the header line in files"?
If so, OP should prepend a sed or grep to his/her awk or add a condition in his/her else part...
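One way to handle the header inside awk itself, without a separate sed or grep, might look like this. It assumes exactly one header line per file; the header text and the sample rows are made up.

```shell
#!/bin/bash
# Sketch only: invented file with a single (assumed) header line.
workdir=$(mktemp -d)
scores="$workdir/with_header.txt"
printf '%s\n' \
  'FID IID PHENO CNT CNT2 SCORE' \
  'f1 i1 1 10 2 0.11' \
  'f2 i2 2 10 3 0.22' > "$scores"

awk '
  FNR == 1 { next }                                   # skip the header line
  $3 == 1  { print $6 > (FILENAME ".control"); next } # controls
  $3 == 2  { print $6 > (FILENAME ".case") }          # cases, tested explicitly
' "$scores"
```

Testing $3 == 2 explicitly, rather than using a bare else, also keeps any other unexpected value in the 3rd column out of the .case file.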
Thank you all for the replies. The code works. I have added an else-if condition to the code, which handles the header line as well. But I am encountering another problem. I want to run some statistical tests using R on the case and control files. Now the variable $FileName is not being read by the R command. I think I am again mixing two languages, i.e. bash and R. For the R t-test, shall I loop again to grab the control and case files?
In the bash loop above, each >> is a separate open/append/close.
This is very I/O intensive; on NFS it would stress the NFS server.
The following opens and closes each file only once, and even gives you the choice between > and >> (overwrite or append to an existing file):
Code:
#!/bin/bash
myfile="Artery_Aorta-ListOfScoreFilesForScript"
while read -a aline; do
if (( ${aline[2]} == 1 )); then
echo "${aline[5]}"
elif (( ${aline[2]} == 2 )); then
echo "${aline[5]}" >&3
fi
done < "$myfile".txt > "$myfile".control 3> "$myfile".case
Note that print in awk works like this, too:
the > or >> only decides how the file is opened at the first write. Subsequent writes go to the already open stream, i.e. they append.
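A small demonstration of that behaviour, using a made-up output file; the awk variable name out is chosen here only for illustration.

```shell
#!/bin/bash
# Sketch only: shows how awk treats ">" and ">>" within a single run.
workdir=$(mktemp -d)
out="$workdir/out.txt"
printf 'stale line\n' > "$out"

# ">" truncates the file at the first print only; the later prints in the
# same awk run go to the already open stream, so all three lines survive.
printf '%s\n' a b c | awk -v out="$out" '{ print $1 > out }'

# ">>" opens the file in append mode, so the existing content is kept.
printf '%s\n' d | awk -v out="$out" '{ print $1 >> out }'

cat "$out"   # prints a, b, c, d on separate lines; "stale line" is gone
```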