LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 11-21-2018, 03:55 AM   #1
Aemm
LQ Newbie
 
Registered: Oct 2018
Posts: 19

Rep: Reputation: Disabled
awk command to direct the output to multiple files


Hi all,
I have a list file containing the names of the files that need processing on certain columns. Each of those files looks like this:

Code:
FID       IID  PHENO    CNT   CNT2    SCORE
  00010   0001002      2     28      9 -0.00843036
  00017   0001702      1     28      9 0.00710286
  00028   0002801      2     28      9 -0.00125893
I want to split each file based on the 3rd column (PHENO): if it is "1", I need to write only the 6th column (SCORE) to a separate file with the extension .control. If the 3rd column is "2", I need to write the 6th column to a file with the extension .case. Afterwards I need to run an R function on the case and control files. My code is:

Code:
IFS=$'\n'
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt);
do awk '{ if ($3 == 1) {print $6 >> $FileName.control} else {print $6 >> $FileName.case}}' $FileName; 
done;
The above command gives this error:

Quote:
awk: cmd. line:1: { if ($3 == 1) {print $6 >> $FileName.control} else {print $6 >> $FileName.case}}
awk: cmd. line:1: ^ syntax error
However, if I only write the .control files (rows with "1" in the 3rd column, with the corresponding score going to a separate file), it works:

Code:
IFS=$'\n'
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt);  
do awk '{ if ($3 == 1) { print $6 } }' $FileName > $FileName.control;  done;
But I am unable to add the else branch to the code. Can anyone tell me what's going wrong? Thanks.
 
Old 11-21-2018, 04:15 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,849

Rep: Reputation: 7309
What you mixed up is this: $FileName is evaluated by the shell, not by awk. In your working version the awk script itself is just '{ if ($3 == 1) { print $6 } }', nothing more.

You cannot mix the two languages: the awk script cannot use $FileName, a variable defined in bash. If you want to do that, you need to export it in the shell and read that environment variable from awk (through the ENVIRON array), or pass the variable to awk (with -v).
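A minimal sketch of both options (not from the thread; the function names and scores.txt are made up for illustration):

```shell
#!/bin/bash
# Two ways to hand a bash variable to an awk script.
# Function names and "scores.txt" are made up for illustration.

# 1) -v creates the awk variable fn before the script starts.
name_with_v() {
  local FileName=$1
  awk -v fn="$FileName" 'BEGIN { print fn ".control" }'
}

# 2) Put the value in awk's environment and read it from ENVIRON.
#    The VAR=value prefix exports FileName for this one awk command only.
name_with_environ() {
  FileName=$1 awk 'BEGIN { print ENVIRON["FileName"] ".case" }'
}

name_with_v scores.txt          # prints scores.txt.control
name_with_environ scores.txt    # prints scores.txt.case
```

One difference worth knowing: -v processes backslash escapes in the value (a \t becomes a tab), while ENVIRON passes the string through untouched. That only matters for file names containing backslashes.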
 
1 member found this post helpful.
Old 11-21-2018, 04:39 AM   #3
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290Reputation: 290Reputation: 290
Indeed, replace your code with the following:
Code:
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt); do cat "$FileName" | awk -v fn="$FileName" '{if ($3 == 1) {print $6 >>fn".control"} else {print $6 >>fn".case"}}'; done;

Last edited by l0f4r0; 11-21-2018 at 05:30 AM.
 
1 member found this post helpful.
Old 11-21-2018, 07:10 AM   #4
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371

Rep: Reputation: 2750
This can also be done without invoking 'awk'.
Code:
#!/bin/bash

myfile="Artery_Aorta-ListOfScoreFilesForScript"

while read -a aline; do	
  if ((	${aline[2]} == 1 )); then 
    echo "${aline[5]}" >> "$myfile".control
  elif (( ${aline[2]} == 2 )); then 
    echo "${aline[5]}" >> "$myfile".case
  fi 
done < "$myfile".txt
If you are using R, why not just read the entire file, make PHENO a factor and then subset as necessary?
 
2 members found this post helpful.
Old 11-21-2018, 07:54 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,849

Rep: Reputation: 7309
Quote:
Originally Posted by l0f4r0 View Post
Indeed, replace your code with the following:
Code:
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt); do cat "$FileName" | awk -v fn="$FileName" '{if ($3 == 1) {print $6 >>fn".control"} else {print $6 >>fn".case"}}'; done;
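Useless use of cat:
cat file | awk 'script' can be replaced by awk 'script' file
Also, combining for with $(cat ...) is not recommended (the unquoted $(cat ...) splits names on whitespace and expands wildcards); use a while read loop instead. awk knows which file it is processing (the FILENAME variable), so you do not need to set a variable yourself: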
Code:
while read -r line; do
    awk '{if ($3 == 1) {print $6 >>FILENAME".control"} else {print $6 >>FILENAME".case"}}' $line
done < Artery_Aorta-ListOfScoreFilesForScript.txt
(not tested)
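A sketch of the loop above with two hardening changes that are not in the post: IFS= read -r, so file names keep leading blanks and backslashes, and "$line" quoted, so a name with spaces stays one argument. The concatenated name after >> is also wrapped in parentheses, so the script does not depend on how a particular awk parses an unparenthesized concatenation after a redirection. The bare else is kept as posted, so on files with a header line the header's SCORE still goes to .case. split_listed is a hypothetical name.

```shell
#!/bin/bash
# Sketch of the while/FILENAME loop, with quoting hardened.
# split_listed is a hypothetical name; $1 is the list of score files.
split_listed() {
  local line
  while IFS= read -r line; do
    # awk opens "$line" itself, so FILENAME holds that name.
    awk '{ if ($3 == 1) { print $6 >> (FILENAME ".control") } else { print $6 >> (FILENAME ".case") } }' "$line"
  done < "$1"
}
```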
 
2 members found this post helpful.
Old 11-21-2018, 08:02 AM   #6
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371

Rep: Reputation: 2750
Quote:
awk '{if ($3 == 1) {print $6 >>FILENAME".control"} else {print $6 >>FILENAME".case"}}' $line
That fails on the header line in the input file.
 
1 member found this post helpful.
Old 11-21-2018, 08:21 AM   #7
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290Reputation: 290Reputation: 290
Quote:
Originally Posted by allend View Post
That fails on the header line in the input file.
You mean "it works technically but the else part grabs the header line in files"?
If so, OP should prepend a sed or grep to his/her awk or add a condition in his/her else part...
 
1 member found this post helpful.
Old 11-21-2018, 08:53 AM   #8
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371

Rep: Reputation: 2750
Quote:
If so, OP should prepend a sed or grep to his/her awk ...
Perhaps just 'tail -n+2'?
 
1 member found this post helpful.
Old 11-21-2018, 09:09 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,849

Rep: Reputation: 7309
awk can handle it without any external tool (for example, test $3 == 2 explicitly instead of using a bare else, as the elif in post #4 does). The variable NR can also be used:
Code:
NR == 1 { next }
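NR == 1 is enough when awk gets one file per call, as in the loops above. If you instead hand every file to a single awk run, NR keeps counting across files, so the per-file counter FNR is the one to test. A sketch of that variant, assuming one file name per line in the list and a one-line header in every score file (split_all is a hypothetical name):

```shell
#!/bin/bash
# Sketch: one awk run over all files named in the list file ($1).
# FNR restarts at 1 for every input file; NR does not.
split_all() {
  local list=$1
  local -a files
  mapfile -t files < "$list"            # bash 4+: one array element per line
  (( ${#files[@]} > 0 )) || return 1    # with no file arguments awk would read stdin
  awk '
    FNR == 1 {                          # header line of each file
      if (ctl != "") { close(ctl); close(cas) }   # done with the previous file
      ctl = FILENAME ".control"
      cas = FILENAME ".case"
      next
    }
    $3 == 1 { print $6 > ctl }
    $3 == 2 { print $6 > cas }
  ' "${files[@]}"
}
```

Closing each pair of outputs when the next file starts keeps only two output files open at a time, which avoids hitting the per-process open-file limit when the list is long.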
 
1 member found this post helpful.
Old 11-21-2018, 08:59 PM   #10
Aemm
LQ Newbie
 
Registered: Oct 2018
Posts: 19

Original Poster
Rep: Reputation: Disabled
Thank you all for the replies. The code works. I added the else-if condition to the code, which handles the header line as well. But I have run into another problem: I want to run a statistical test in R on the case and control files, and the variable $FileName is not being read by the R command. I think I am again mixing two languages, bash and R. For the R t-test, should I loop again to pick up the control and case files?
 
Old 11-21-2018, 10:26 PM   #11
Aemm
LQ Newbie
 
Registered: Oct 2018
Posts: 19

Original Poster
Rep: Reputation: Disabled
The code I am trying to run (on an HPC cluster) is:

Code:
cd $PBS_O_WORKDIR
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt); 
do 
cat "$FileName" | awk -v fn="$FileName" '{if ($3 == 1) {print $6 >>fn".control"} else if ($3 == 2) {print $6 >>fn".case"}}'; 
Rscript t.test($FileName.control, $FileName.case, alternative = "two.sided", paired = FALSE, var.equal = FALSE)
done;

Last edited by Aemm; 11-21-2018 at 10:44 PM.
 
Old 11-22-2018, 12:27 AM   #12
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,793

Rep: Reputation: 1201
Quote:
Originally Posted by allend View Post
This can also be done without invoking 'awk'.
Code:
#!/bin/bash

myfile="Artery_Aorta-ListOfScoreFilesForScript"

while read -a aline; do	
  if ((	${aline[2]} == 1 )); then 
    echo "${aline[5]}" >> "$myfile".control
  elif (( ${aline[2]} == 2 )); then 
    echo "${aline[5]}" >> "$myfile".case
  fi 
done < "$myfile".txt
If you are using R, why not just read the entire file, make PHENO a factor and then subset as necessary?
In bash, each >> is an open/append/close.
This is very I/O intensive; on NFS it would stress the NFS server.
The following opens and closes each output file once, and even gives you the choice between > and >> (overwrite or append to an existing file):
Code:
#!/bin/bash

myfile="Artery_Aorta-ListOfScoreFilesForScript"

while read -a aline; do	
  if ((	${aline[2]} == 1 )); then 
    echo "${aline[5]}"
  elif (( ${aline[2]} == 2 )); then 
    echo "${aline[5]}" >&3
  fi
done < "$myfile".txt > "$myfile".control 3> "$myfile".case
Note that print in awk works like this, too:
the >> or > only decides how the file is opened at the first write. Subsequent writes go to the already open stream, i.e. they append.
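A small demonstration of that awk behaviour (the demo function and file names are made up): the same two-line input is written once with > and once with >> over a file that already contains the line "old".

```shell
#!/bin/bash
# Demo: in awk, > or >> only matters when the file is first opened;
# awk keeps it open, so later prints to the same name go to the same stream.
awk_redirect_demo() {
  local out=$1 op=$2            # op is ">" or ">>"
  printf 'old\n' > "$out"       # pre-existing content
  printf 'a\nb\n' | awk -v f="$out" -v op="$op" '
    { if (op == ">") print > f; else print >> f }'
}
# With ">": the file ends up as a, b  (truncated once, then both lines written)
# With ">>": the file ends up as old, a, b
```

If awk has to start a file over again during the same run, call close(f) first; the next print > f then truncates it again.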
 
1 member found this post helpful.
Old 11-22-2018, 03:23 AM   #13
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,793

Rep: Reputation: 1201
Quote:
Originally Posted by Aemm View Post
The code which I am trying to run (on HPC) is

Code:
cd $PBS_O_WORKDIR
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt); 
do 
cat "$FileName" | awk -v fn="$FileName" '{if ($3 == 1) {print $6 >>fn".control"} else if ($3 == 2) {print $6 >>fn".case"}}'; 
Rscript t.test($FileName.control, $FileName.case, alternative = "two.sided", paired = FALSE, var.equal = FALSE)
done;
It looks okay. I would avoid building the file names twice. And no UUOC (useless use of cat), of course!
Code:
cd $PBS_O_WORKDIR || exit
for FileName in $(cat Artery_Aorta-ListOfScoreFilesForScript.txt)
do
  fn1=$FileName.control fn2=$FileName.case
  awk -v fn1="$fn1" -v fn2="$fn2" '{if ($3 == 1) {print $6 >fn1} else if ($3 == 2) {print $6 >fn2}}' $FileName
  Rscript t.test("$fn1", "$fn2", alternative = "two.sided", paired = FALSE, var.equal = FALSE)
done
As I said before, in awk's print you can use > or >>.
The difference is only how an existing file is opened at the first write.

Unquoted ( ) are special to the shell, so the Rscript line is still a shell syntax error as written. But I don't know yet what Rscript is.
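Rscript is R's command-line front end: it runs an R script file, or R expressions given with -e. A hedged sketch, assuming Rscript is on the job's PATH and the score files look like the sample in the first post (header line, PHENO in column 3, SCORE in column 6); split_and_test is a hypothetical name and the R part is untested here:

```shell
#!/bin/bash
# Sketch: split each listed score file, then t-test the two groups in R.
# Assumptions: one file name per line in the list; each score file has a
# one-line header, PHENO in column 3 and SCORE in column 6; Rscript is on PATH.
split_and_test() {
  local list=$1 FileName fn1 fn2
  while IFS= read -r FileName; do
    fn1=$FileName.control fn2=$FileName.case
    # '>' rather than '>>', so rerunning the job overwrites old output.
    awk -v fn1="$fn1" -v fn2="$fn2" '
      FNR == 1 { next }                 # skip the header line
      $3 == 1  { print $6 > fn1; next }
      $3 == 2  { print $6 > fn2 }
    ' "$FileName"
    # Each -e argument is quoted R code, so the shell never parses its ( ).
    # The trailing file names reach R through commandArgs(trailingOnly = TRUE).
    if command -v Rscript >/dev/null 2>&1; then
      Rscript -e 'f <- commandArgs(trailingOnly = TRUE)' \
              -e 'x <- scan(f[1], quiet = TRUE); y <- scan(f[2], quiet = TRUE)' \
              -e 'print(t.test(x, y, alternative = "two.sided", paired = FALSE, var.equal = FALSE))' \
              "$fn1" "$fn2"
    else
      echo "Rscript not found; skipping t-test for $FileName" >&2
    fi
  done < "$list"
}
```

In the PBS job this would be called as split_and_test Artery_Aorta-ListOfScoreFilesForScript.txt after cd "$PBS_O_WORKDIR" || exit. If a file has no rows for one group, that group's file is never written and scan() fails, so check the outputs before trusting the results. allend's suggestion in post #4 avoids the intermediate files entirely: R can read a whole score file with read.table(..., header = TRUE) and run t.test(SCORE ~ PHENO, data = d) (also untested here).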

Last edited by MadeInGermany; 11-22-2018 at 03:31 AM.
 
  


Tags
awk regex, bash scripting, linux



All times are GMT -5.
