LinuxQuestions.org
Old 05-27-2018, 11:28 PM   #1
ctambu
LQ Newbie
 
Registered: May 2018
Posts: 4

Rep: Reputation: Disabled
Forloop problem


Hello! I'm trying to write a for loop over my genome data. The problem goes like this: I have 10 samples with 8 files each, named as follows:

AE001_S01_L001_R1_001.fastq.gz
AE001_S01_L002_R1_001.fastq.gz
AE001_S01_L003_R1_001.fastq.gz
AE001_S01_L004_R1_001.fastq.gz
AE001_S01_L001_R2_001.fastq.gz
AE001_S01_L002_R2_001.fastq.gz
AE001_S01_L003_R2_001.fastq.gz
AE001_S01_L004_R2_001.fastq.gz

AE134_S03_L001_R1_001.fastq.gz
AE134_S03_L002_R1_001.fastq.gz
AE134_S03_L003_R1_001.fastq.gz
AE134_S03_L004_R1_001.fastq.gz
AE134_S03_L001_R2_001.fastq.gz
AE134_S03_L002_R2_001.fastq.gz
AE134_S03_L003_R2_001.fastq.gz
AE134_S03_L004_R2_001.fastq.gz

and so on...

I need to "cat" the R1 files and the R2 files for each sample, so I created a file named "samples.txt" where each line is the name of one sample,
e.g.

AE001
AE134
.
.
.
(for my 10 samples)
and I tried to run the following script:

for sample in `cat samples.txt`;
do
cat $sample*R1*fastq.gz > $sample_L001-4_R1_001.fastq.gz | mv $sample_L001-4_R1_001.fastq.gz ../1.FASTQC_1/
cat $sample*R2*fastq.gz > $sample_L001-4_R2_001.fastq.gz | mv $sample_L001-4_R2_001.fastq.gz ../1.FASTQC_1/;
done

I would expect to get 2 files (*_L001-4_R1_001.fastq.gz and *_L001-4_R2_001.fastq.gz) for each sample, but it only generates 2 files in total for the 10 samples, named as follows:

-4_R1_001.fastq.gz and -4_R2_001.fastq.gz

I would appreciate your help, as I am new to this and it's too early to go crazy already.

Thank you!!
 
Old 05-28-2018, 01:37 AM   #2
rigor
Member
 
Registered: Sep 2003
Location: 19th moon ................. ................Planet Covid ................Another Galaxy;............. ................Not Yours
Posts: 705

Rep: Reputation: Disabled
I'm not sure I entirely understand how you've structured your data, but I suspect (if you are using BASH) that, as far as the script is concerned, you may want something more like this:


Code:
#!/bin/bash

for sample in `cat samples.txt`
    do 
        cat  ${sample}*R1*.fastq.gz  >  ${sample}_L001-4_R1_001.fastq.gz  |  mv  ${sample}_L001-4_R1_001.fastq.gz  ../1.FASTQC_1/
        cat  ${sample}*R2*.fastq.gz  >  ${sample}_L001-4_R2_001.fastq.gz  |  mv  ${sample}_L001-4_R2_001.fastq.gz  ../1.FASTQC_1/
    done
In which case I'm guessing an entry in samples.txt would look something like:
Code:
AE001_S01
But the main point is to delimit the variable name within the script, to avoid confusion.

Last edited by rigor; 05-28-2018 at 01:42 AM.
 
2 members found this post helpful.
Old 05-28-2018, 03:26 AM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,804

Rep: Reputation: 7306
Yes, you need to use {} around variable names: ${sample}
Also, you need to use ; instead of |

Something like this:
Code:
cat ${sample}*R1*fastq.gz > ${sample}_L001-4_R1_001.fastq.gz ; mv ${sample}_L001-4_R1_001.fastq.gz ../1.FASTQC_1/
You ought to try www.shellcheck.net to check your script.
You may also insert set -xv at the beginning of your script to see what's happening.
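
To illustrate the tracing suggestion, here is a minimal sketch. The function name trace_expansion is made up for this example, and the sample name AE001 is taken from the first post. It turns tracing on inside a subshell and redirects the trace, which normally goes to stderr, to stdout so it is easy to read or capture.

```shell
#!/bin/bash
# trace_expansion is a hypothetical helper used only for this illustration.
# It runs two echo commands in a subshell with tracing switched on and
# sends the trace (normally written to stderr) to stdout.
trace_expansion() {
    (
        set -x                                      # print each command after expansion
        sample="AE001"
        echo "$sample_L001-4_R1_001.fastq.gz"       # braces missing
        echo "${sample}_L001-4_R1_001.fastq.gz"     # braces present
    ) 2>&1
}

trace_expansion
```

In the trace, lines starting with "+" show each command as the shell actually ran it, so the first echo appears with the sample name already missing from its argument.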
 
2 members found this post helpful.
Old 05-28-2018, 06:22 PM   #4
ctambu
LQ Newbie
 
Registered: May 2018
Posts: 4

Original Poster
Rep: Reputation: Disabled
Thank you very much @rigor and @pan64. I think that works. I didn't understand the "instead of |" part and why I need to use {}. I have written other loops in bash and the {} were not necessary.
 
Old 05-28-2018, 08:22 PM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,356

Rep: Reputation: 2751
1. Re ';' instead of '|'

In your case, you are creating a new file and then want to move it, hence you separate the commands with ';'.
'|' is for 'piping' data from the output of one command to the input of another.

2. Re {}
There are some slightly arcane rules involved in how bash 'finds' embedded variable names. The safest approach to avoid confusing it is to always surround the variable name with {} if it is embedded in (including at the start of) another string.
For consistency, and to make it stand out for yourself later and for other devs, I usually also use {} even if the variable name appears at the end of a string.


HTH

Last edited by chrism01; 05-28-2018 at 08:23 PM.
 
1 members found this post helpful.
Old 05-28-2018, 08:49 PM   #6
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,776

Rep: Reputation: 2212
Quote:
Originally Posted by ctambu View Post
I didn`t understand ... why i need to use {}.
The problem is that the underscore character is a legal character in a name, and the shell will take all characters up to the first "impossible" character as part of the name. So, "$sample_L001-4_R1_001.fastq.gz" is parsed as "${sample_L001}-4_R1_001.fastq.gz", and you don't have a variable named "sample_L001". If you don't want the "_L001" to be treated as part of the variable name, then you need to write that as "${sample}_L001-4_R1_001.fastq.gz".
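
A minimal sketch of that parsing rule, using the sample name AE001 from the first post (the variable names without_braces and with_braces are only for this illustration):

```shell
#!/bin/bash
# Illustration only: "sample" holds one sample name, as in samples.txt.
sample="AE001"

# Without braces the shell reads the longest valid name, sample_L001.
# That variable is unset, so it expands to an empty string.
without_braces="$sample_L001-4_R1_001.fastq.gz"

# With braces the variable name ends at the closing brace.
with_braces="${sample}_L001-4_R1_001.fastq.gz"

echo "without braces: $without_braces"   # -4_R1_001.fastq.gz
echo "with braces:    $with_braces"      # AE001_L001-4_R1_001.fastq.gz
```

The first result is exactly the file name reported in the first post, which is why all ten samples ended up in the same two files.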
 
1 members found this post helpful.
Old 05-28-2018, 09:48 PM   #7
ctambu
LQ Newbie
 
Registered: May 2018
Posts: 4

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by rknichols View Post
The problem is that the underscore character is a legal character in a name, and the shell will take all characters up to the first "impossible" character as part of the name. So, "$sample_L001-4_R1_001.fastq.gz" is parsed as "${sample_L001}-4_R1_001.fastq.gz", and you don't have a variable named "sample_L001". If you don't want the "_L001" to be treated as part of the variable name, then you need to write that as "${sample}_L001-4_R1_001.fastq.gz".
That's exactly what was happening to me: "_L001" was being treated as part of the variable name. Thank you for the tips, they are very useful!

Best regards
 
Old 05-28-2018, 09:51 PM   #8
ctambu
LQ Newbie
 
Registered: May 2018
Posts: 4

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by chrism01 View Post
1. Re ';' instead of '|'

In your case, you are creating a new file and then want to move it, hence you separate the commands with ';'.
'|' is for 'piping' data from the output of one command to the input of another.

2. Re {}
There are some slightly arcane rules involved in how bash 'finds' embedded variable names. The safest approach to avoid confusing it is to always surround the variable name with {} if it is embedded in (including at the start of) another string.
For consistency, and to make it stand out for yourself later and for other devs, I usually also use {} even if the variable name appears at the end of a string.


HTH
Ohh, OK. Thank you, I understand. Thank you very much for the help, it was really useful!

Best regards. I really appreciate the help. See you at the next problem!

Last edited by ctambu; 05-28-2018 at 10:12 PM.
 
Old 05-28-2018, 09:59 PM   #9
rigor
Member
 
Registered: Sep 2003
Location: 19th moon ................. ................Planet Covid ................Another Galaxy;............. ................Not Yours
Posts: 705

Rep: Reputation: Disabled
Hi ctambu,

Whenever someone is asking a question in the "newbie" forum, I am torn between correcting every little detail that might be wrong and focusing only on what needs to change to get something working.

It might be fair to say there are about three "major" forms of "connecting commands" ( in BASH ) that come to mind, those being the following:
Code:
command_one  |  command_two
Code:
command_one  ||  command_two
Code:
command_one  &&  command_two
As has effectively already been mentioned, the first form takes the so-called "standard" output of the first command, "command_one", and delivers it as the so-called "standard" input of "command_two". The other two forms execute the second command, "command_two", conditionally, based on the success or failure of "command_one". I thought perhaps you had read about the form involving the "||" operator and merely neglected to include the second "|". Since I believe it just happens not to cause a problem in this case, I didn't mention the distinction. However, for all the single "|" accomplishes in this particular case, you might as well replace the "|" character with the ";" character.
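
As a rough sketch of how these forms differ, with ";" added for comparison: the function name demo_connectors is invented for this example, while true, false, echo and tr are standard commands. Square brackets mark what the second command printed, so empty brackets mean it did not run.

```shell
#!/bin/bash
# demo_connectors is a hypothetical name used only for this illustration.
# "true" always succeeds and "false" always fails.
demo_connectors() {
    # |  : stdout of the left command becomes stdin of the right command
    echo "pipe: [$(echo hello | tr 'a-z' 'A-Z')]"

    # && : the right command runs only if the left command succeeded
    echo "and after success: [$(true && echo ran)]"
    echo "and after failure: [$(false && echo ran)]"

    # || : the right command runs only if the left command failed
    echo "or after success: [$(true || echo ran)]"
    echo "or after failure: [$(false || echo ran)]"

    # ;  : the right command runs afterwards, whatever the left command did
    echo "semicolon after failure: [$(false; echo ran)]"
}

demo_connectors
```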

Last edited by rigor; 05-28-2018 at 10:01 PM.
 
1 members found this post helpful.
Old 05-29-2018, 12:35 AM   #10
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,804

Rep: Reputation: 7306
almost...
| is a piping char, that means stdin redirection - as you wrote, but ; is just a simple command delimiter, nothing will be redirected and also no condition will be evaluated.
 
1 members found this post helpful.
Old 05-29-2018, 08:46 PM   #11
rigor
Member
 
Registered: Sep 2003
Location: 19th moon ................. ................Planet Covid ................Another Galaxy;............. ................Not Yours
Posts: 705

Rep: Reputation: Disabled
Quote:
Originally Posted by pan64 View Post
almost...
| is a piping char, that means stdin redirection - as you wrote, but ; is just a simple command delimiter, nothing will be redirected and also no condition will be evaluated.
pan64,

Perhaps you could enlighten me as to exactly what I wrote, which made you feel that I asserted anything different, from the distinction which you made. I did not suggest that a pipe character is a conditional operator. I merely indicated that I thought using a pipe when there was nothing to pipe, would not cause any problems, in a situation such as this. I actually tested that to be sure that my thought was accurate.

Or as Monty Python could have said, Ecce nullum data, ergo pipe. :-)

In any event, since I now see the original problem has been marked as solved, let's not confuse the issue.
 
Old 05-30-2018, 02:04 AM   #12
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,804

Rep: Reputation: 7306
This statement is not valid.
Quote:
However, for all the single "|" accomplishes in this particular case, you might as well replace the "|" character with the ";" character.
The best I can say is: using a pipe in this case may cause unpredictable results, and if it occasionally does what you want, that is only a side effect of this case not being handled properly.
Code:
cat file1 > file2 
# will not and cannot send anything to stdout or pipe
# the command mv cannot accept anything from stdin (or pipe)
cat file1 > file2 | mv file2 dir
# you ought to use instead:
cat file1 > dir/file2
# or even better:
cp file1 dir/file2
In other words: instead of "you might replace", it would be better to write "you must replace".
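
Applied to the loop from the first post, writing straight into the destination directory might look like the sketch below. The function name merge_lanes is hypothetical, and the file-name pattern is assumed from the listing in the first post. Reading samples.txt with a while/read loop is swapped in here for the backtick `cat`; it is not something the earlier posts used.

```shell
#!/bin/bash
# merge_lanes SAMPLES_FILE DEST_DIR   (hypothetical helper name)
# For every sample name listed in SAMPLES_FILE, concatenate its lane files
# for R1 and for R2 directly into DEST_DIR, so no separate mv is needed.
# DEST_DIR should be a different directory from the one holding the lane
# files; otherwise a rerun would find the merged file matching the glob too.
merge_lanes() {
    local samples_file=$1 dest_dir=$2 sample read_dir
    mkdir -p "$dest_dir"
    while IFS= read -r sample; do
        [ -n "$sample" ] || continue      # skip blank lines
        for read_dir in R1 R2; do
            # The glob expands in sorted order: L001, L002, L003, L004.
            cat "${sample}"_*_"${read_dir}"_*.fastq.gz \
                > "${dest_dir}/${sample}_L001-4_${read_dir}_001.fastq.gz"
        done
    done < "$samples_file"
}

# Example call, run from the directory that holds the lane files:
# merge_lanes samples.txt ../1.FASTQC_1
```

Concatenating .gz files with cat gives a valid multi-member gzip file, so the inputs do not need to be decompressed first.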
 
Old 05-30-2018, 11:33 AM   #13
rigor
Member
 
Registered: Sep 2003
Location: 19th moon ................. ................Planet Covid ................Another Galaxy;............. ................Not Yours
Posts: 705

Rep: Reputation: Disabled
I would suggest that since the particular problem for which this thread was started is considered solved by the OP, the issue that pan64 raises should be discussed in a separate thread which I started here: Does BASH handle piping properly when there is no data to pipe?
 
  



All times are GMT -5. The time now is 08:33 PM.
