Help with directing awk output to variable
Hi everyone,
New guy here with a problem that will hopefully have an easy solution, but I just can't seem to manage it. I have a large list of files that I need to process with the same command-line program, and I'm trying to write a small shell script to automate this. I wrote something that reads the input file names from a text file and repeats the command for each of those files. So far so good.

My problem is with naming the output. Each file is named in the general format "lane_number_bla_bla_bla", and they are processed in pairs. So there will be a "lane_1_bla_bla_bla_001" and a "lane_1_bla_bla_bla_002" that need to be combined into a single output file. For this, I'm trying to use awk to read the sample number from the .txt list of input files and insert it into the output file name. Here's the code I came up with (note that the echo statement before the command is there just for testing and is removed when running the actual program; also, this is not the actual command, which is rather more complicated, but the principle still applies):

Code:
echo "Which input1 should I use?"
read text
input1=$text # Defines text file that includes filenames of mate 1
echo "Which input2 should I use?"
read text
input2=$text # Defines text file that includes filenames of mate 2
echo "How many lines?"
read text
n=$text # Defines how many lines should be read from filename text files
for i in $(seq 1 $n)
do
    awkinput1=$(awk NR==$i $input1) # Defines text at line "i" in filename text file 1 as variable to replace in command line for mate 1
    awkinput2=$(awk NR==$i $input2) # Defines text at line "i" in filename text file 2 as variable to replace in command line for mate 2
    num=$(awk 'NR==$i{print $2}' FS="_" $input1) # Defines sample number from "i" in filename text file 1 as variable to replace in concatenated file names
    lane=$(awk 'NR==$i{print $1}' FS="_" $input1) # Defines lane number from "i" in filename text file 1 as variable to replace in concatenated file names
    echo "command $awkinput1.in > $awkinput1.out && command $awkinput2.in > $awkinput2.out && command cat $awkinput1.out $awkinput2.in > $num-$lane-CAT.out &" # Command line of interest
    if (( $i % 10 == 0 )); then wait; fi # Limit to 10 concurrent subshells.
done

When I run this, both $awkinput fields get replaced properly in the command line by the appropriate filename, but not the $num and $lane fields, which print nothing. So, what am I doing wrong? I'm sure it's pretty simple, but I tried quite a lot of different ways to format the relevant awk command, and nothing seems to work. I'm doing this on a remote Linux server over SSH, if it makes a difference. Thanks a lot!
Code:
awkinput1=$(awk NR==$i $input1)
That's the way it seems to work in examples I've looked up.
Code:
num=$(awk 'NR==$i{print $2}' FS="_" $input1)
Instead of having awk read the whole input file again and again to fetch a single line each time, I would suggest reading the files sequentially from the shell:
Code:
line=0
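A minimal sketch of what reading the two lists in step from the shell could look like. The code in the post above is cut off after its first line, so this is not the poster's script: the sample filenames, the temporary files and the `results` variable are assumptions made purely for illustration, and the field splitting uses plain parameter expansion instead of awk.

```shell
# Hypothetical list files, one filename per line (mate 1 and mate 2).
input1=$(mktemp)
input2=$(mktemp)
printf '%s\n' 3_7_bla_001 4_8_bla_001 > "$input1"
printf '%s\n' 3_7_bla_002 4_8_bla_002 > "$input2"

results=""
# Read line N of both lists together: fd 3 is list 1, fd 4 is list 2.
# Each file is opened once and read once, top to bottom.
while IFS= read -r f1 <&3 && IFS= read -r f2 <&4; do
    lane=${f1%%_*}    # text before the first underscore (awk's $1)
    rest=${f1#*_}     # drop the first field and its underscore
    num=${rest%%_*}   # second underscore-separated field (awk's $2)
    line="$f1 + $f2 -> $num-$lane-CAT.out"
    echo "$line"
    results="$results$line;"
done 3< "$input1" 4< "$input2"

rm -f "$input1" "$input2"
```

The loop stops as soon as either list runs out of lines, so no separate "how many lines" prompt is needed in this variant.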
Thank you both for the quick replies. ntubski got what the problem was: the first single quote was placed so that it enclosed the NR-defining expression. When I took that out of the quotes, it worked just fine. It had to be something silly like that, but the devil is always in the details. I was quite interested in the script you suggested as well. My reckoning was that awk was more flexible in taking specific fields from composite names like the ones in my files, but maybe reading them from the shell will be faster and more efficient. I'll definitely give it a go.
Once again, thanks a lot for your help, you guys just saved me an awful lot of time and helped me understand shell scripting a bit better! :)
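For anyone landing here with the same symptom, a small sketch of why the quoted version prints nothing. Inside single quotes the shell never expands `$i`, so awk receives the literal text `NR==$i`; awk's own variable `i` is unset, which makes `$i` mean `$0` (the whole line), and the line number never equals the line. Besides moving the expression outside the quotes, another common fix is passing the shell value in with `-v`. The sample filenames and the awk variable name `n` below are assumptions for illustration only.

```shell
# Hypothetical sample list standing in for the filename text file.
list=$(mktemp)
printf '%s\n' 3_7_bla_bla_bla_001 4_8_bla_bla_bla_001 > "$list"

i=2

# Broken: awk sees NR==$i literally; with awk's i unset this is NR==$0,
# which does not match any line of this sample, so nothing is printed.
broken=$(awk 'NR==$i{print $2}' FS="_" "$list")

# Fixed: hand the shell value to awk as an awk variable via -v.
lane=$(awk -F_ -v n="$i" 'NR==n{print $1}' "$list")
num=$(awk -F_ -v n="$i" 'NR==n{print $2}' "$list")

echo "broken='$broken' lane='$lane' num='$num'"
rm -f "$list"
```

Keeping the whole awk program in single quotes and using `-v` also avoids surprises if the shell variable ever contains characters that awk would interpret as code.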
Please use ***[code][/code]*** tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.
You can avoid the external file if you use an array instead.
Code:
files=( '' lane_1_bla_bla_bla_* )
For reference, a text-file solution could be easier if the file contained two names per line. Assuming that none of the filenames contains whitespace:
Code:
i=1
Code:
while IFS=':' read -r fname1 fname2 ; do
http://mywiki.wooledge.org/BashFAQ/001
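Since the loop above is cut off after its first line, here is a rough sketch of how a two-names-per-line list could be consumed. It is only a guess at the general shape, not the poster's code: the colon-separated sample file, the temporary file and the `count` and `last` variables are assumptions for illustration.

```shell
# Hypothetical pair list: "mate1:mate2" on each line.
pairs=$(mktemp)
printf '%s\n' 3_7_bla_001:3_7_bla_002 4_8_bla_001:4_8_bla_002 > "$pairs"

count=0
# IFS=':' applies only to this read, so each line is split at the colon
# into the two filenames; -r keeps backslashes literal.
while IFS=':' read -r fname1 fname2; do
    count=$((count + 1))
    echo "pair $count: $fname1 and $fname2"
    last="$fname1|$fname2"
done < "$pairs"

rm -f "$pairs"
```

With both mates on one line there is no way for the two lists to drift out of step, which is the main appeal of this layout.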