LinuxQuestions.org
Old 09-21-2010, 01:38 AM   #1
zeratul111
LQ Newbie
 
Registered: Sep 2010
Posts: 19

Rep: Reputation: Disabled
A question about shell scripting: getting multiple input files for an application


Hi everyone,

I'm new to Linux and shell scripting and am having trouble figuring out how to script the following. Any help would be much appreciated.

I am running an application called QuantiSNP (http://groups.google.co.uk/group/qua...uantisnp-usage). The binary is "quantisnp2", which is called by the "run_quantisnp2.sh" script supplied by the authors. I can only run the application in single-file mode (i.e. 1 input file per run; batch processing is not an option because I don't have the BeadStudio report files it requires, which use a different input format).

The difficulty is that I have 300 samples (300 unique sample IDs) and 3 input files for each sample, for a total of 900 runs of this application. How can I automate this with a shell script, instead of manually changing the sample ID and input file every time a run completes? The options that need to change for each run are "--sampleid" and "--input-files" in the single-file command below. "--sampleid" sets the name given to the processed output files for the sample of interest, and there are 3 input files for each sample.

/home2/jason/QuantiSNP/quantisnp/linux64/run_quantisnp2.sh /home2/jason/QuantiSNP/MCR/v79/ --config /home2/jason/QuantiSNP/quantisnp/config/params.dat --levels /home2/jason/QuantiSNP/quantisnp/config/levels-affy.dat --outdir /home2/jason/QuantiSNP/quantisnp_out/ --sampleid sample1 --gender female --input-files /home2/jason/files/sample1_input.txt


-------------

Note that each sample has 3 input files, for a total of 3 runs of "quantisnp2" for each sample.

e.g.
SAMPLEID INFILE
sample1 /home2/jason/files/sample1_input.txt
sample1 /home2/jason/files/sample1_input2.txt
sample1 /home2/jason/files/sample1_input3.txt
sample2 /home2/jason/files/sample2_input.txt
sample2 /home2/jason/files/sample2_input2.txt
sample2 /home2/jason/files/sample2_input3.txt
...etc.

-------------

Thanks again! Please feel free to let me know if anything I wrote above needs clarification.
 
Old 09-21-2010, 01:48 AM   #2
quanta
Member
 
Registered: Aug 2007
Location: Vietnam
Distribution: RedHat based, Debian based, Slackware, Gentoo
Posts: 724

Rep: Reputation: 101
Code:
# Read one "SAMPLEID INFILE" pair per line; "..." stands for the options that do not change
while read -r id infile; do run_quantisnp2.sh ... --sampleid "$id" --input-files "$infile"; done < SAMPLEID.INFILE

Last edited by quanta; 09-21-2010 at 01:57 AM.
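A slightly fuller sketch of the same idea, assuming the list is a whitespace-separated text file that starts with a "SAMPLEID INFILE" header row, as in the first post. The function name print_runs is made up for this example, and "..." again stands for the options that stay the same. It only prints the commands, so nothing runs until the output has been checked.

```shell
#!/bin/bash

# print_runs LISTFILE  (hypothetical helper, not part of QuantiSNP)
# Prints one command line for every "SAMPLEID INFILE" row in LISTFILE.
print_runs() {
    local list id infile
    list=$1
    # "|| [ -n "$id" ]" keeps a final row that lacks a trailing newline.
    while read -r id infile || [ -n "$id" ]; do
        [ -z "$id" ] && continue             # skip blank lines
        [ "$id" = "SAMPLEID" ] && continue   # skip the header row
        printf 'run_quantisnp2.sh ... --sampleid %s --input-files %s\n' "$id" "$infile"
    done < "$list"
}
```

Calling print_runs with the path of the list file prints one line per run, which can be compared with the command that already works by hand.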
 
Old 09-21-2010, 01:50 AM   #3
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Well, the actual iteration can be done with a simple loop. The big question is making sure you're using the right files in the loop at the right time.

Could you break it down in just a bit more detail? What is the exact sequence of files that need to be processed? Is there any variability in the filenames or locations? Do the names always correspond to each other?

Finally, please use [code][/code] tags around the contents of scripts and text files, to preserve formatting and improve readability.
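One way to check whether the names correspond is to count the files found for each sample ID and look for any ID that does not have exactly 3. This is only a sketch: it assumes every input file is named <sampleid>_<something> and that IDs contain no spaces, and count_per_sample is a name made up for this example.

```shell
#!/bin/bash

# count_per_sample DIR  (hypothetical helper)
# Prints "<count> <id>" for each sample ID in DIR, where the ID is assumed
# to be the file name up to its last underscore.
count_per_sample() {
    local dir path name
    dir=$1
    for path in "$dir"/*; do
        [ -f "$path" ] || continue     # skip subdirectories and an unmatched glob
        name=${path##*/}               # drop the directory part
        printf '%s\n' "${name%_*}"     # drop the last "_..." suffix
    done | sort | uniq -c | awk '{ print $1, $2 }'
}
```

Running count_per_sample on the input directory and scanning for counts other than 3 would flag samples with missing or extra files.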
 
Old 09-21-2010, 01:56 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Assuming all the files are in the same directory (i.e. /home2/jason/files/):
Code:
#!/bin/bash

for file in /home2/jason/files/*
do
    id_name=${file%_*}
    /home2/jason/QuantiSNP/quantisnp/linux64/run_quantisnp2.sh /home2/jason/QuantiSNP/MCR/v79/ --config /home2/jason/QuantiSNP/quantisnp/config/params.dat --levels \
    /home2/jason/QuantiSNP/quantisnp/config/levels-affy.dat --outdir /home2/jason/QuantiSNP/quantisnp_out/ --sampleid $id_name --gender female --input-files $file
done
This is of course untested, so I would copy 3 of your associated sample files into a temp directory for testing.
 
Old 09-21-2010, 02:05 AM   #5
zeratul111
LQ Newbie
 
Registered: Sep 2010
Posts: 19

Original Poster
Rep: Reputation: Disabled
Thanks quanta, David, and grail for your replies.

Code:
/home2/jason/QuantiSNP/quantisnp/linux64/run_quantisnp2.sh /home2/jason/QuantiSNP/MCR/v79/ --config /home2/jason/QuantiSNP/quantisnp/config/params.dat --levels /home2/jason/QuantiSNP/quantisnp/config/levels-affy.dat --outdir /home2/jason/QuantiSNP/quantisnp_out/ --sampleid sample1 --gender female --input-files /home2/jason/files/sample1_input.txt
For the above, the only things that vary from run to run are the --sampleid and --input-files options. --sampleid is simply a name, of which there are 300. All the input text files are in the same location. Their names vary, but each contains the sample ID.

Because the text file names do not correspond exactly to the sample IDs, would it be better to create a text file that lists the sample IDs and their corresponding input files, and work this into the shell script somehow? (I hope this isn't too confusing.)

Thanks! I will look at the code you provided in more detail right now.
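A list file like that could be drafted automatically and then corrected by hand wherever the file name and the sample ID differ. This is only a sketch: make_list is a hypothetical helper, and it guesses the ID as the file name up to its last underscore.

```shell
#!/bin/bash

# make_list DIR  (hypothetical helper)
# Prints a "SAMPLEID INFILE" header followed by one row per file in DIR.
make_list() {
    local dir path name
    dir=$1
    printf 'SAMPLEID INFILE\n'
    for path in "$dir"/*; do
        [ -f "$path" ] || continue                # skip subdirectories and an unmatched glob
        name=${path##*/}                          # file name without the directory
        printf '%s %s\n' "${name%_*}" "$path"     # guessed ID, then the full path
    done
}
```

Redirecting the output to a file (for example make_list /home2/jason/files > samples.txt, where samples.txt is just an example name) gives a starting point that can be edited before it is fed to a loop.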



 
Old 09-21-2010, 05:42 PM   #6
zeratul111
LQ Newbie
 
Registered: Sep 2010
Posts: 19

Original Poster
Rep: Reputation: Disabled
Hello again,

I ran grail's code above. For some reason it is not writing the output files correctly (i.e. nothing appears in the folder given to --outdir).

From the output log for a processed file:
Quote:
QuantiSNP: Single-file mode input found.
QuantiSNP: Processing file: /home2/jason/QuantiSNP/testinput/gw6.P4A10_SNP6_R2
QuantiSNP. Chr23 is the X chromosome
QuantiSNP. Reading data for chromosome: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
QuantiSNP. Using EM for parameter estimation. Chromosome: 1.
QuantiSNP. Using EM for parameter estimation. Chromosome: 21.
QuantiSNP. Using EM for parameter estimation. Chromosome: 22.
QuantiSNP. Using EM for parameter estimation. Chromosome: 23.
QuantiSNP. CNV Calling: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
QuantiSNP. Writing QC file: /home2/jason/QuantiSNP/testoutput///home2/jason/QuantiSNP/testinput/gw6.P4A10_SNP6.qc

However, when I run a single file with the original command, the output is the following:
Quote:
QuantiSNP: Single-file mode input found.
QuantiSNP: Processing file: /home2/jason/QuantiSNP/gw6.P4A11_SNP6
QuantiSNP. Chr23 is the X chromosome
QuantiSNP. Reading data for chromosome: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
QuantiSNP. Using EM for parameter estimation. Chromosome: 1.
QuantiSNP. Using EM for parameter estimation. Chromosome: 2.
QuantiSNP. Using EM for parameter estimation. Chromosome: 3.
QuantiSNP. Using EM for parameter estimation. Chromosome: 4.
QuantiSNP. Using EM for parameter estimation. Chromosome: 5.
QuantiSNP. Using EM for parameter estimation. Chromosome: 6.
QuantiSNP. Using EM for parameter estimation. Chromosome: 7.
QuantiSNP. Using EM for parameter estimation. Chromosome: 8.
QuantiSNP. Using EM for parameter estimation. Chromosome: 9.
QuantiSNP. Using EM for parameter estimation. Chromosome: 10.
QuantiSNP. Using EM for parameter estimation. Chromosome: 11.
QuantiSNP. Using EM for parameter estimation. Chromosome: 12.
QuantiSNP. Using EM for parameter estimation. Chromosome: 13.
QuantiSNP. Using EM for parameter estimation. Chromosome: 14.
QuantiSNP. Using EM for parameter estimation. Chromosome: 15.
QuantiSNP. Using EM for parameter estimation. Chromosome: 16.
QuantiSNP. Using EM for parameter estimation. Chromosome: 17.
QuantiSNP. Using EM for parameter estimation. Chromosome: 18.
QuantiSNP. Using EM for parameter estimation. Chromosome: 19.
QuantiSNP. Using EM for parameter estimation. Chromosome: 20.
QuantiSNP. Using EM for parameter estimation. Chromosome: 21.
QuantiSNP. Using EM for parameter estimation. Chromosome: 22.
QuantiSNP. Using EM for parameter estimation. Chromosome: 23.
QuantiSNP. CNV Calling: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
QuantiSNP. Writing QC file: /home2/jason/QuantiSNP/quantisnp_out//P4A11_SNP6.qc
QuantiSNP. Writing output to file: /home2/jason/QuantiSNP/quantisnp_out//P4A11_SNP6.cnv
QuantiSNP. Writing genotypes to file: /home2/jason/QuantiSNP/quantisnp_out//P4A11_SNP6.gn
QuantiSNP. Done in 0.52 mins.
So, in summary, each processed file should produce 3 output files and finish quickly. However, for the run above (using grail's code) I used 3 input files. Although the output log suggests that all three input files were processed, no output files were written, and the run had still not finished after more than 30 minutes (compared with under 1 minute for the original single-file run).

Help will be much appreciated. Thank you very much!

Last edited by zeratul111; 09-21-2010 at 06:01 PM.
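The QC path in the first log (the --outdir value followed by the full input path) is consistent with the sample ID still carrying its directory. A small sketch of the two parameter expansions involved, using the input file name from that log; the function names are made up for illustration.

```shell
#!/bin/bash

# strip_suffix_only PATH: removes only the last "_..." suffix, so the
# directory stays attached (hypothetical helper).
strip_suffix_only() {
    printf '%s\n' "${1%_*}"
}

# sample_id PATH: removes the directory first, then the last "_..." suffix
# (hypothetical helper).
sample_id() {
    local file
    file=${1##*/}
    printf '%s\n' "${file%_*}"
}

p=/home2/jason/QuantiSNP/testinput/gw6.P4A10_SNP6_R2
strip_suffix_only "$p"   # prints /home2/jason/QuantiSNP/testinput/gw6.P4A10_SNP6
sample_id "$p"           # prints gw6.P4A10_SNP6
```

Note that sample1_input.txt, sample1_input2.txt and sample1_input3.txt all yield the ID sample1 under this rule. Since the log shows output files named after --sampleid, it may be worth checking on a small test set whether the three runs overwrite one another's output.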
 
Old 09-21-2010, 07:13 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Yeah, my bad there.
I forgot that the path would still be in front of the filename, so it ended up inside the sample ID.
Give this a whirl:
Code:
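#!/bin/bash

# Dry run: print the command for every input file so it can be checked first.
for path_file in /home2/jason/files/*
do
    file=${path_file##*/}    # bare file name, directory removed
    id_name=${file%_*}       # sample ID: file name up to the last underscore
    # --input-files gets the full path so the script works from any directory.
    echo "/home2/jason/QuantiSNP/quantisnp/linux64/run_quantisnp2.sh /home2/jason/QuantiSNP/MCR/v79/ --config /home2/jason/QuantiSNP/quantisnp/config/params.dat --levels \
    /home2/jason/QuantiSNP/quantisnp/config/levels-affy.dat --outdir /home2/jason/QuantiSNP/quantisnp_out/ --sampleid $id_name --gender female --input-files $path_file"
done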
This will only echo each command, which you should check against the one you are issuing from the command line.
If it looks correct, remove the echo and the surrounding quotes.
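The echo-first approach can also be kept as a switch, so the script does not need editing between the check and the real run. A minimal sketch, assuming a hypothetical helper run_or_print and an environment variable DRY_RUN (neither is part of QuantiSNP):

```shell
#!/bin/bash

# run_or_print CMD [ARGS...]  (hypothetical helper)
# With DRY_RUN unset or set to 1 the command is only printed;
# with DRY_RUN=0 it is actually executed.
run_or_print() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"    # show the command instead of running it
    else
        "$@"                  # run it, keeping each argument as one word
    fi
}
```

In the loop, the echo line would become run_or_print followed by the same command and options written as separate words, with the variables quoted ("$id_name", "$path_file"). A first pass with DRY_RUN left unset prints every command; DRY_RUN=0 then runs them.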
 
Old 09-21-2010, 07:59 PM   #8
zeratul111
LQ Newbie
 
Registered: Sep 2010
Posts: 19

Original Poster
Rep: Reputation: Disabled
Hi grail, thanks for the code! It works perfectly now.
 
  


All times are GMT -5. The time now is 04:22 AM.
