LinuxQuestions.org
Old 08-24-2009, 01:51 AM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 122

Rep: Reputation: 15
Question Script to isolate files with 1 instance of each of 4 diff. words (grep)


Hi all,

I am trying to write a script that searches through a directory of files named Moll_10000.fasta to Moll_16000.fasta, counts the instances of the following strings in each file: LGIG, HROB, NVEC, and CAP, and copies each file containing exactly one instance of each string to another folder. I know grep is at least in part the tool for the job and that the {1} modifier should be involved (right?). I can't figure out what to do, though. Here's what I've got so far (some of this is probably unnecessary):

Code:
grep -c LGIG\| *.fasta | cat > LGIG_count.txt
grep -c NVEC\| *.fasta | cat > NVEC_count.txt
grep -c HROB\| *.fasta | cat > HROB_count.txt
grep -c CAP\| *.fasta | cat > CAP_count.txt
grep \:1 LGIG_count.txt >> LGIG_single_copy_count.txt
grep \:1 NVEC_count.txt >> NVEC_single_copy_count.txt
grep \:1 HROB_count.txt >> HROB_single_copy_count.txt
grep \:1 CAP_count.txt >> CAP_single_copy_count.txt
Obviously I'd have to do the sorting manually the way I have it so far. Can anyone guide me in the right direction to automate this entire process?

Thanks!
Kevin
 
Old 08-24-2009, 02:42 AM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
Put the following in a file and give the file execute permission (chmod 744 <file name>):
Code:
#!/bin/bash
# Count, per file, the lines containing each tag followed by a literal "|"
grep -c 'LGIG|' *.fasta > LGIG_count.txt
grep -c 'NVEC|' *.fasta > NVEC_count.txt
grep -c 'HROB|' *.fasta > HROB_count.txt
grep -c 'CAP|' *.fasta > CAP_count.txt
# Keep only the files with a count of exactly 1 (":1$" does not match :10, :11, ...)
grep ':1$' LGIG_count.txt > LGIG_single_copy_count.txt
grep ':1$' NVEC_count.txt > NVEC_single_copy_count.txt
grep ':1$' HROB_count.txt > HROB_single_copy_count.txt
grep ':1$' CAP_count.txt > CAP_single_copy_count.txt
Then run it with ./<file name>
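The manual sorting step could also be automated by intersecting the four count lists. This is only a rough sketch: it assumes the four *_count.txt files exist with lines of the form Moll_10000.fasta:1, that the file names contain no spaces or colons, and it uses single_copy as a made-up destination directory name.

```shell
#!/bin/bash
# Sketch: print the names of files that have a count of exactly 1
# in all four *_count.txt files (lines look like "Moll_10000.fasta:1").
single_copy_in_all() {
    # Keep lines ending in ":1", strip the count, keep names seen 4 times.
    grep -h ':1$' LGIG_count.txt NVEC_count.txt HROB_count.txt CAP_count.txt |
        cut -d: -f1 | sort | uniq -c | awk '$1 == 4 { print $2 }'
}

# Usage ("single_copy" is a hypothetical destination directory):
# mkdir -p single_copy
# single_copy_in_all | while IFS= read -r f; do cp -- "$f" single_copy/; done
```

A name is printed only if it shows up with a count of 1 in each of the four lists, so the copy loop in the usage comment would pick up just those files.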
 
Old 08-24-2009, 09:14 AM   #3
berbae
Member
 
Registered: Jul 2005
Location: France
Distribution: Arch Linux
Posts: 540

Rep: Reputation: Disabled
From "The Linux cookbook: tips and techniques for everyday use" by Michael Stutz
12.2.3 Listing Only the Unique Words in Text

Assuming there is no punctuation in your text files, the tr command could be:

tr -s '[:blank:]' '\n' <text |sort|uniq
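To see what that pipeline does, here is a small sketch on made-up input (the sample line and the function name unique_words are only for illustration): tr puts each blank-separated word on its own line, then sort and uniq drop the duplicates.

```shell
# Sketch: split text on blanks, one word per line, then remove duplicates.
unique_words() {
    tr -s '[:blank:]' '\n' | sort | uniq
}

# Made-up sample line with a repeated word and a tab as separator.
printf 'LGIG NVEC LGIG\tCAP\n' | unique_words
```

With this sample, the repeated LGIG is listed only once, so three words come out.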

So, to solve your specific problem, you can use:

Code:
for file in *.fasta; do
    # Count the distinct whitespace-separated words that contain any of the four strings
    nb=$(tr -s '[:blank:]' '\n' <"$file" |sort|uniq|grep -e LGIG -e HROB -e NVEC -e CAP|wc -l)
    if [ "$nb" -eq 4 ]; then
        cp "$file" <destination folder>
    fi
done
Replace <destination folder> with the destination folder name, without the '<' and '>' characters.

Maybe some adjustments are to be made, but something along that line may solve your problem.
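One possible adjustment, as a rough sketch only: counting distinct words does not by itself guarantee exactly one instance of each string, so the count can be checked per string with grep -c instead. Note that grep -c counts matching lines, not individual occurrences, so this assumes each instance sits on its own line. The function names and the destination name single_copy are made up for the example.

```shell
#!/bin/bash
# Sketch: copy every *.fasta file in which each of the four tags
# occurs on exactly one line.
has_one_of_each() {
    local file=$1 tag
    for tag in LGIG HROB NVEC CAP; do
        # grep -c prints the number of lines in the file that match the tag
        [ "$(grep -c -- "$tag" "$file")" -eq 1 ] || return 1
    done
}

copy_single_copy() {
    local dest=$1 file
    mkdir -p -- "$dest"
    for file in *.fasta; do
        [ -e "$file" ] || continue   # no *.fasta files in this directory
        if has_one_of_each "$file"; then
            cp -- "$file" "$dest"/
        fi
    done
}

# Usage ("single_copy" is a hypothetical destination directory):
# copy_single_copy single_copy
```

If a tag could also turn up inside the sequence data itself, the pattern would need to be narrowed, for example to the header lines only.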

Bye.

Last edited by berbae; 08-24-2009 at 09:21 AM.
 
Old 08-24-2009, 09:54 AM   #4
berbae
Member
 
Registered: Jul 2005
Location: France
Distribution: Arch Linux
Posts: 540

Rep: Reputation: Disabled
An afterthought: if the *.fasta files are very large, you can restrict the tr command to only the lines containing the searched strings:
Code:
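for file in *.fasta; do
    # Keep only the lines that contain one of the searched strings
    grep -e LGIG -e HROB -e NVEC -e CAP "$file" >ifile
    # Read from stdin so that wc prints the count without the file name
    zif=$(wc -l <ifile)
    if [ "$zif" -ne 0 ]; then
        nb=$(tr -s '[:blank:]' '\n' <ifile |sort|uniq|grep -e LGIG -e HROB -e NVEC -e CAP|wc -l)
        if [ "$nb" -eq 4 ]; then
            cp "$file" <destination folder>
        fi
    fi
done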
This seems better for large files.
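For large files, another option might be a single pass per file with no temporary file at all. This is a sketch under the assumption that awk is available and that each instance sits on its own line; the function name one_of_each_awk and the destination name single_copy are made up for the example.

```shell
#!/bin/bash
# Sketch: read the file once and count the lines matching each tag.
# Exit status 0 means every tag was seen on exactly one line.
one_of_each_awk() {
    awk '
        /LGIG/ { n["LGIG"]++ }
        /HROB/ { n["HROB"]++ }
        /NVEC/ { n["NVEC"]++ }
        /CAP/  { n["CAP"]++ }
        END {
            ok = (n["LGIG"] == 1 && n["HROB"] == 1 && n["NVEC"] == 1 && n["CAP"] == 1)
            exit !ok
        }
    ' "$1"
}

# Usage ("single_copy" is a hypothetical, already existing directory):
# for f in *.fasta; do one_of_each_awk "$f" && cp -- "$f" single_copy/; done
```

Unlike the word-counting approach, this tests the per-string line counts directly, so a file with two LGIG lines is rejected.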

Last edited by berbae; 08-24-2009 at 09:58 AM.
 
  

