LinuxQuestions.org
Old 02-07-2013, 05:27 AM   #1
Rcucullatus
LQ Newbie
 
Registered: Feb 2013
Posts: 2

Rep: Reputation: Disabled
Help with directing awk output to variable


Hi everyone,

New guy here with a problem that will hopefully have an easy solution, but I just can't seem to manage.

So, I have a large list of files that I need to process using the same command line program, and I'm trying to write a small shell script to automate this. I wrote something that will read the input file name from a text file, and repeat the command for each of those files. So far so good. My problem though is with naming the output. Each file is named in the general format "lane_number_bla_bla_bla", and they are processed in pairs. So, there will be a "lane_1_bla_bla_bla_001" and "lane_1_bla_bla_bla_002" that need to combine into a single output file. For this, I'm trying to use awk to read the sample number from the .txt list of input files and parse it into the output file number. Here's the code I came up with (note that the echo statement before the command is there just for testing; it's removed when it comes to run the actual program; also this is not the actual command which is rather more complicated, but the principle still applies):


echo "Which input1 should I use?"
read text
input1=$text # Defines text file that includes filenames of mate 1

echo "Which input2 should I use?"
read text
input2=$text # Defines text file that includes filenames of mate 2


echo "How many lines?"
read text
n=$text # Defines how many lines should be read from filename text files

for i in $(seq 1 $n)
do
awkinput1=$(awk NR==$i $input1) # Defines text at line "i" in filename text file 1 as variable to replace in command line for mate 1
awkinput2=$(awk NR==$i $input2) # Defines text at line "i" in filename text file 2 as variable to replace in command line for mate 2
num=$(awk 'NR==$i{print $2}' FS="_" $input1) # Defines sample number from "i" in filename text file 1 as variable to replace in concatenated file names
lane=$(awk 'NR==$i{print $1}' FS="_" $input1) # Defines lane number from "i" in filename text file 1 as variable to replace in concatenated file names

echo "command $awkinput1.in > $awkinput1.out && command $awkinput2.in > $awkinput2.out && command cat $awkinput1.out $awkinput2.in > $num-$lane-CAT.out &" # Command line of interest

if (( $i % 10 == 0 )); then wait; fi # Limit to 10 concurrent subshells.
done


When I run this, both $awkinput fields get replaced properly in the command line by the appropriate filename, but not the $num and $lane fields, which print nothing.

So, what am I doing wrong? I'm sure it's pretty simple, but I tried quite a lot of different ways to format the relevant awk command, and nothing seems to work. I'm doing this on a remote linux server using SSH protocol, if it makes a difference.

Thanks a lot!

Last edited by Rcucullatus; 02-07-2013 at 05:29 AM.
 
Old 02-07-2013, 10:21 AM   #2
bigrigdriver
LQ Addict
 
Registered: Jul 2002
Location: East Centra Illinois, USA
Distribution: Debian Squeeze
Posts: 5,739

Rep: Reputation: 298
Code:
awkinput1=$(awk NR==$i $input1)
awkinput2=$(awk NR==$i $input2)
num=$(awk 'NR==$i{print $2}' FS="_" $input1)
lane=$(awk 'NR==$i{print $1}' FS="_" $input1)
I may be off the mark here, but doesn't the == represent equality and the = represent value assignment? If I am correct, then replacing == with = to assign value to a variable should fix the problem.

That's the way it seems to work in examples I've looked up.
 
Old 02-07-2013, 10:22 AM   #3
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,386

Rep: Reputation: 808
Code:
num=$(awk 'NR==$i{print $2}' FS="_" $input1)
You are trying to reference the shell variable i from within awk, but awk doesn't see shell variables. Note that the previous awk commands didn't have quotes around the awk program so the value of i was substituted by the shell into the awk program first.
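
One common way to hand a shell value to awk is the -v option, which assigns an awk variable before the program runs. Inside single quotes awk receives the literal text $i; since awk's own variable i is unset, that means $0 (the whole line), which never equals the line number here. The following is only a minimal sketch: the list file, its contents, and the awk variable name "line" are made up for illustration.

```shell
# Hypothetical sample list; the real file names will differ.
tmpdir=$(mktemp -d)
list="$tmpdir/list.txt"
printf '%s\n' lane_1_bla_bla_bla_001 lane_2_bla_bla_bla_001 > "$list"

i=2   # line number to extract, as in the loop from the question

# -v copies the shell value into the awk variable "line", so the program
# itself can stay inside single quotes; -F_ splits fields on underscores.
lane=$(awk -F_ -v line="$i" 'NR == line { print $1 }' "$list")
num=$(awk -F_ -v line="$i" 'NR == line { print $2 }' "$list")

echo "$num-$lane-CAT.out"
rm -r "$tmpdir"
```

With names of the form lane_1_..., the first field is the literal word "lane" and the second is the number, which matches the field choice in the original commands.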

Instead of having awk read the whole input file multiple times to get a single line each time, I would suggest reading the files sequentially from the shell:
Code:
line=0
while true ; do
    # read next line from $input1 and $input2 (-r keeps backslashes literal)
    read -r in1 <&3 || break
    read -r in2 <&4 || break
    ((line++))

    # get lane and sample number from input1
    IFS=_ read -r lane num restofline <<<"$in1"

    echo "command $in1.in > $in1.out && command $in2.in > $in2.out && command cat $in1.out $in2.out > $num-$lane-CAT.out &" # Command line of interest
    # or maybe just?
    echo "command $in1.in > $num-$lane-CAT.out && command $in2.in >> $num-$lane-CAT.out &" 

    # Limit to 10 concurrent subshells.
    if (( $line % 10 == 0 )) ; then wait ; fi

    # finished $n lines
    if (( $line >= $n )) ; then break ; fi
done 3< "$input1" 4< "$input2"
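
The "wait" line in the loop above throttles in batches: wait with no arguments blocks until every background job has exited, so a new batch starts only after the slowest job of the previous one finishes. A small sketch of the same pattern with real background jobs, assuming a batch size of 3 and placeholder jobs that just create marker files (both are illustrative choices, not from the thread):

```shell
tmpdir=$(mktemp -d)
batch=3          # assumed batch size for the demo; the thread uses 10
count=0

for name in a b c d e f g; do
    # Stand-in for the real command: each job writes one marker file.
    ( : > "$tmpdir/$name.done" ) &
    count=$((count + 1))
    # After every full batch, block until all background jobs have exited.
    if [ $((count % batch)) -eq 0 ]; then wait; fi
done
wait             # collect the final, partial batch

finished=$(ls "$tmpdir" | wc -l)
echo "finished jobs: $finished"
rm -r "$tmpdir"
```

The trailing wait matters whenever the number of jobs is not a multiple of the batch size; without it the script could exit while the last few jobs are still running.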
 
Old 02-07-2013, 05:57 PM   #4
Rcucullatus
LQ Newbie
 
Registered: Feb 2013
Posts: 2

Original Poster
Rep: Reputation: Disabled
Thank you both for the quick replies. ntubski got what the problem was: the first single quote was placed so that it included the NR expression. When I took that out of the quotes, it worked just fine. It had to be something silly like that, but the devil is always in the details. I was quite interested in the script you suggested as well. My reckoning was that awk was more flexible in taking specific fields from composite names like the ones in my files, but maybe reading them from the shell will be faster and more efficient. I'll definitely give it a go.

Once again, thanks a lot for your help, you guys just saved me an awful lot of time and helped me understand shell scripting a bit better!
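
The fix described in this post presumably looks something like the sketch below, where the NR pattern sits outside the single quotes so the shell expands $i before awk runs. This is a reconstruction rather than the poster's actual line, and the list file and its contents are hypothetical.

```shell
tmpdir=$(mktemp -d)
input1="$tmpdir/mate1.txt"   # hypothetical list file
printf '%s\n' lane_1_bla_bla_bla_001 lane_2_bla_bla_bla_001 > "$input1"

i=1
# The shell expands $i first, so awk receives the program: NR==1{print $2}
num=$(awk NR=="$i"'{print $2}' FS="_" "$input1")
lane=$(awk NR=="$i"'{print $1}' FS="_" "$input1")

echo "num=$num lane=$lane"
rm -r "$tmpdir"
```

Only the action in braces needs single quotes, because that is where the shell would otherwise try to expand $1 and $2 itself.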
 
Old 02-09-2013, 10:24 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1946
Please use [code][/code] tags around your code and data, to preserve the original formatting and to improve readability. Do not use quote tags, bolding, colors, "start/end" lines, or other creative techniques.

You could avoid the external file by using an array instead.

Code:
files=( '' lane_1_bla_bla_bla_* )

# the blank entry counts toward ${#files[@]}, so stop before the array length
for (( i=1 ; i<${#files[@]} ; i+=2 )); do

    printf -v outfile 'lane_1_bla_bla_bla_output_%03d.txt' "$i"   # zero-pads the output number
    cat "${files[i]}" "${files[i+1]}" > "$outfile"

done
Note the use of a blank array entry at the beginning, so that the initial index 0 is ignored, allowing the rest to match up. You can also add sanity checks to ensure that you're matching up the correct files.
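
One possible shape for such a sanity check is sketched below. It assumes the naming scheme from the first post, where the two files of a pair differ only in a trailing _001 or _002; the function name is_pair is made up for this example.

```shell
# Returns success only if the two names are identical apart from the
# trailing _001 / _002 suffix (the naming scheme described in the thread).
is_pair() {
    [ "${1%_001}" != "$1" ] &&      # first name really ends in _001
    [ "${2%_002}" != "$2" ] &&      # second name really ends in _002
    [ "${1%_001}" = "${2%_002}" ]   # and the prefixes agree
}

if is_pair lane_1_bla_bla_bla_001 lane_1_bla_bla_bla_002; then
    echo "pair ok"
fi
if ! is_pair lane_1_bla_bla_bla_001 lane_2_bla_bla_bla_002; then
    echo "mismatch, skipping" >&2
fi
```

Inside the array loop, a call such as is_pair "${files[i]}" "${files[i+1]}" || continue would skip any pair that does not line up.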


For reference, a textfile solution could be easier if the file contained two names per line.

Assuming that none of the filenames contains whitespace:

Code:
i=1
while read -r fname1 fname2; do

    printf -v newfile 'outputfile%03d.txt' "$(( i++ ))"
    cat "$fname1" "$fname2" > "$newfile"

done <inputfile.txt
If the names can contain spaces, then you'd have to use a different delimiter. To read a colon-delimited list, for example, just change the while line to this:

Code:
while IFS=':' read -r fname1 fname2 ; do
How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
http://mywiki.wooledge.org/BashFAQ/001
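
As a rough end-to-end illustration of the colon-delimited variant, the sketch below builds two throwaway input files whose names contain spaces and concatenates them. The file names and contents are invented for the demo, and it uses a command substitution in place of printf -v so that it also runs in shells without that bash extension.

```shell
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical inputs whose names contain spaces.
printf 'A\n' > 'mate 1.txt'
printf 'B\n' > 'mate 2.txt'
printf '%s\n' 'mate 1.txt:mate 2.txt' > inputfile.txt

i=1
while IFS=':' read -r fname1 fname2; do
    newfile=$(printf 'outputfile%03d.txt' "$i")
    i=$((i + 1))
    cat "$fname1" "$fname2" > "$newfile"
done < inputfile.txt

result=$(cat outputfile001.txt)   # contents of both inputs, in order
echo "$result"

cd / && rm -r "$tmpdir"
```

Because IFS is set only for the read command, the colon splitting does not leak into the rest of the script.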
 
  

