LinuxQuestions.org
03-13-2014, 08:27 AM   #1
jahndavik
LQ Newbie
 
Registered: Mar 2014
Posts: 2

Rep: Reputation: Disabled
Parsing folders picking files and concatenating them


Hi there,
I am writing a shell script that walks through folders and picks out the files whose names contain a specific signature (i.e. a particular part of the file name). The code I have written so far is as follows:

Code:
for i in $(ls); do                              
  #echo item: $i                                
  if [ -d $i ]; then                            
     cd $i   
     echo folder:$i                                   
     for z in *.fastq; do                          
       #echo item: $z	
       n=0
       if echo "$z" | grep -q "R1_00.";then	
          echo $z
          # TODO: here I want to append the current file to the previous file
          # with the same signature (= R1_00.)
       fi
     done                                       
    cd ..                                        
  fi                                             
done
When I run this code, one of the folders comes up with the following print:

folder:SonNot24
31_GCCAAT_L005_R1_001.fastq
31_GCCAAT_L006_R1_001.fastq
33_ACAGTG_L003_R1_001.fastq
33_ACAGTG_L004_R1_001.fastq
35_TGACCA_L001_R1_001.fastq
35_TGACCA_L002_R1_001.fastq


Ultimately I would like to cat all these files into a single file named after the folder (SonNot24).

Any help is appreciated.

Thanks.
jahn

Last edited by jahndavik; 03-21-2014 at 05:23 AM.
 
03-13-2014, 09:50 AM   #2
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,165

Rep: Reputation: 306
Quote:
Originally Posted by jahndavik
... I am writing a .sh in order to parse folders picking files with file names containing specific signatures (=part of the file name).
Maybe you need nothing more than the cat command. Just off the top of my head, I entered this on the command line:
Code:
 cat /home/daniel/Desktop/LQfiles/*m1*.bin >/home/daniel/Desktop/LQfiles/hugefile.bin
It took all files in the folder /home/daniel/Desktop/LQfiles/ whose names met certain criteria and concatenated them into one new file called /home/daniel/Desktop/LQfiles/hugefile.bin. The selection criteria were these: the name contained the character string m1, and the file extension was .bin.
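
Applied to the original question, the same idea might be sketched as one cat per folder. This is only a sketch under assumptions: the function name concat_by_signature, the choice to write the result next to each folder as <folder>.fastq, and the example signature _R1_00 are illustrative and do not come from the thread.

```bash
#!/usr/bin/env bash
# Sketch: for every sub-folder of a root directory, concatenate the .fastq
# files whose names contain a signature into one file named after the folder.
# concat_by_signature is a hypothetical helper; writing the result next to
# the folder as <folder>.fastq is an assumption.
concat_by_signature() {
    local root=$1
    local signature=$2
    local dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue        # the glob stays literal if nothing matched
        dir=${dir%/}                      # drop the trailing slash
        # Load the matching file names into the function's positional parameters.
        set -- "$dir"/*"$signature"*.fastq
        [ -e "$1" ] || continue           # no matching files in this folder
        cat "$@" > "$dir.fastq"           # glob results are sorted by name
    done
}
```

For example, concat_by_signature . _R1_00 run from the parent directory would be expected to produce SonNot24.fastq from the R1 files listed in the first post.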

Daniel B. Martin

Last edited by danielbmartin; 03-13-2014 at 10:34 AM. Reason: Cosmetic improvement
 
2 members found this post helpful.
03-13-2014, 09:51 PM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,693

Rep: Reputation: 1988
Daniel's suggestion is valid, so I will just offer a little advice on your question and on general coding:

1. Please use [code][/code] tags around code and data to maintain formatting

2. Do not use 'ls' to feed a for loop (or generally any type of loop), see here for more details

3. Although the script is short and may not be around for long, meaningful variable names still help with readability

4. Get in the practice of quoting all variables

5. grep is overkill in this scenario ... Check here and search for regex on the page

6. In regexes (regular expressions), '.' matches any single character; hence "R1_00." in your code looks for the string
"R1_00" followed by any one character. If you want the string followed by a literal period (.), you need to escape it
using either "\." or "[.]"
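
Taken together, points 2 to 6 might look like the following sketch. It assumes bash (for [[ ... =~ ... ]]); the function name list_signature_files and the variable names are illustrative only, and the cat step is left out so that only the listing logic is shown.

```bash
#!/usr/bin/env bash
# Sketch applying the advice above; list_signature_files is a hypothetical name.
list_signature_files() {
    local root=$1
    # '.' matches any one character, '[.]' matches a literal period.
    local regex='R1_00.[.]fastq$'
    local folder file
    for folder in "$root"/*/; do            # glob instead of $(ls)
        [[ -d "$folder" ]] || continue      # the glob stays literal if nothing matched
        echo "folder:$(basename "$folder")"
        for file in "$folder"*.fastq; do    # quoted variable, unquoted glob
            # Regex match inside [[ ]] instead of echo | grep.
            [[ "$file" =~ $regex ]] && basename "$file"
        done
    done
    return 0
}
```

Because the loops use globs and every expansion is quoted, folder and file names containing spaces should be handled without word splitting.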

Hope some of that helps
 
2 members found this post helpful.
03-21-2014, 09:12 AM   #4
jahndavik
LQ Newbie
 
Registered: Mar 2014
Posts: 2

Original Poster
Rep: Reputation: Disabled
I never got that regex to work. The grep -q approach works, though it may be overkill.

Re regex:
I look for the string 'R1.fastq.gz' in the file name and use:
Code:
if echo "$z" | grep -q "R1.fastq.gz";
I've tried
Code:
if [[ "$z" == R1"[.]"fastq"[.]"gz ]];
and
Code:
if [[ "$z" == R1[.]fastq[.]gz ]];
and
Code:
if [[ "$z" == R1"\."fastq"\."gz ]];
and
Code:
if [[ "$z" == R1\.fastq\.gz ]];
None of them works.

So, please, anyone :-)
Thanks.

jahn
 
03-21-2014, 11:29 AM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,693

Rep: Reputation: 1988
The trick I have found with regexes in bash is to assign the pattern to a variable using single quotes ('') and then use the variable unquoted (one of those times when quotes do not help).
Code:
regex='R1[.]fastq[.]gz'

[[ "$z" =~ $regex ]] && echo we have a match
Pointers on your tests:

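1. == - inside [[ ]] this tests whether the whole string matches the right-hand side (a literal string when quoted, a glob pattern when unquoted), which yours clearly does not: "$z" holds a complete file name rather than just R1.fastq.gz, and the quoted "[.]" and "\." are compared as literal characters, not as an escaped period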

2. If you do want to compare one entire string against another, you can keep your == test and simply use the plain string:
Code:
[[ "$z" == "R1.fastq.gz" ]]
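
To look for R1.fastq.gz at the end of a longer file name rather than as the whole name, two options in bash are a glob pattern with == or a regex held in a variable as shown above. A small sketch follows; the function names matches_glob and matches_regex are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers comparing glob and regex matching inside [[ ]].

# Glob: the unquoted * matches any prefix; the quoted part is taken literally.
matches_glob() {
    [[ "$1" == *"R1.fastq.gz" ]]
}

# Regex: =~ matches anywhere in the string unless anchored;
# [.] is a literal period and $ anchors the match at the end.
matches_regex() {
    local regex='R1[.]fastq[.]gz$'
    [[ "$1" =~ $regex ]]
}
```

Either helper can be used directly in a condition, for instance matches_glob "$z" && echo "we have a match".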
 
  


All times are GMT -5.