LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

10-29-2009, 10:03 AM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 98

Rep: Reputation: 15
Script to pull certain characters from a filename and use in a variable?


Hi All,

I have a script that runs another script in batch mode for a folder of files. The script needs the input file name and a 4-letter abbreviation for that file (the -taxon= flag).
Code:
for myfile in *.fas
do
myscript.pl -sequence_file=$myfile -taxon=**needs to be a 4-letter abbreviation**
done
The files are all named as such (Genus_species):
Lottia_gigantea.fas
Aplysia_californica.fas
Mytilus_edulis.fas

I would like the taxon flag to be the first letter of the genus and the first three letters of the species in all caps like this:
myscript.pl -sequence_file=Lottia_gigantea.fas -taxon=LGIG
myscript.pl -sequence_file=Aplysia_californica.fas -taxon=ACAL
myscript.pl -sequence_file=Mytilus_edulis.fas -taxon=MEDU

Can anyone help me out? Do I need to use xargs for this?

Thanks!
Kevin
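For comparison, the naming rule described in the question can also be handled with POSIX shell parameter expansion alone, so xargs isn't needed: the `for` loop already hands each file name to the script. This is a minimal sketch, assuming plain ASCII names of the form Genus_species.fas with exactly one underscore. `make_taxon` is a hypothetical helper name. `printf` precision (`%.1s`, `%.3s`) does the truncation, and `tr` does the uppercasing.

```shell
# Hypothetical helper: Genus_species.fas -> first letter of genus + first 3 of species, uppercased
make_taxon() {
    name=${1%.fas}        # drop the .fas extension
    genus=${name%%_*}     # text before the first underscore
    species=${name#*_}    # text after the first underscore
    # %.1s / %.3s print at most 1 / 3 characters; tr converts them to upper case
    printf '%.1s%.3s\n' "$genus" "$species" | tr '[:lower:]' '[:upper:]'
}

for myfile in *.fas; do
    [ -e "$myfile" ] || continue    # no .fas files: skip the unexpanded "*.fas"
    myscript.pl -sequence_file="$myfile" -taxon="$(make_taxon "$myfile")"
done
```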
 
10-29-2009, 12:58 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1947
Assuming I'm understanding correctly, and that there's always only one underscore separating the filename into two parts, I think this should work.
Code:
for myfile in *.fas
do
  # \U uppercases the captured letters (GNU sed only)
  taxon="$(echo "$myfile" | sed 's/\(.\).*_\(...\).*/\U\1\2/')"
  myscript.pl -sequence_file="$myfile" -taxon="$taxon"
done
This assumes GNU sed; other versions may not support the \U (uppercase) escape. There's more info about that here.

Last edited by David the H.; 10-29-2009 at 01:02 PM.
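If GNU sed isn't available, one workaround is to let sed extract the four letters and hand the uppercasing to `tr`. This is a sketch, assuming the same one-underscore naming. `taxon_of` is a hypothetical name. It also uses `[^_]*_` instead of `.*_`, so the species letters are taken after the first underscore rather than the last. That only matters if a name ever contains two underscores.

```shell
# Hypothetical helper for sed versions without the GNU \U escape
taxon_of() {
    # \(.\)   = first character of the genus
    # [^_]*_  = rest of the genus up to and including the first underscore
    # \(...\) = first three characters of the species; the rest is discarded
    printf '%s\n' "$1" |
        sed 's/^\(.\)[^_]*_\(...\).*/\1\2/' |
        tr '[:lower:]' '[:upper:]'
}
```

In the loop above, the assignment would then become `taxon=$(taxon_of "$myfile")`.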
 
10-29-2009, 02:03 PM   #3
geek.ksa
Member
 
Registered: Jan 2009
Location: Dhahran, Saudi Arabia
Distribution: RHEL 5
Posts: 42

Rep: Reputation: 17
Hi,

Below is one way to do it, using awk. Here's my test code:

Code:
[root@linuxr LQ]# cat file
Lottia_gigantea.fas
Aplysia_californica.fas
Mytilus_edulis
[root@linuxr LQ]# cat file | awk '{split($0,arr,"_"); print toupper(substr(arr[1],1,1)) toupper(substr(arr[2],1,3))}'
LGIG
ACAL
MEDU
[root@linuxr LQ]#
Now, to use the awk in your script, add a variable before calling the script to hold the resulting taxon. I am assuming Korn shell:
Code:
TAXON=$(echo "$myfile" | awk '{split($0,arr,"_"); print toupper(substr(arr[1],1,1)) toupper(substr(arr[2],1,3))}')
And when calling the other script:

Code:
-taxon=$TAXON
assuming: GNU AWK, Korn shell, myfile=word1_word2
Hope this helps

Last edited by geek.ksa; 10-29-2009 at 02:04 PM. Reason: more clarification
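The same awk logic can be written a little more compactly. Let awk split on the underscore with `-F_` instead of calling `split()`, and wrap the whole concatenation in a single `toupper`. This is a sketch under the same word1_word2 assumption. `taxon_awk` is a hypothetical name. `substr` and `toupper` are standard POSIX awk functions, so this should not strictly require GNU awk.

```shell
# Hypothetical helper: awk splits "Genus_species.fas" on "_" into $1 and $2
taxon_awk() {
    printf '%s\n' "$1" |
        awk -F_ '{ print toupper(substr($1, 1, 1) substr($2, 1, 3)) }'
}

TAXON=$(taxon_awk "Aplysia_californica.fas")
echo "$TAXON"    # ACAL
```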
 
10-29-2009, 02:34 PM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Suggestion on style:
Code:
for myfile in *.fas; do
      myscript.pl -sequence_file=$myfile -taxon=**needs to be a 4-letter abbreviation**
done
The first line starts the loop, the last one ends it.

David's sed answer looks like the way to go, but be sure your rule covers every possible file name.

@David; Nice tip on the "uppercase flag"!!
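Following pixellany's point about covering all possible file names, one option is to check each name before deriving a taxon and skip anything that doesn't fit. This is a minimal sketch. `is_genus_species` is a hypothetical helper. The rule it enforces is an assumption about what counts as valid: a .fas extension, exactly one underscore (not at either end), and at least three species characters.

```shell
# Hypothetical check: succeed only for names shaped like Genus_species.fas
is_genus_species() {
    name=${1%.fas}
    [ "$name" != "$1" ] || return 1     # must end in .fas
    case $name in
        *_*_* | _* | *_) return 1 ;;    # two underscores, or one at either end
        *_*) ;;                         # exactly one underscore: OK so far
        *) return 1 ;;                  # no underscore at all
    esac
    species=${name#*_}
    [ "${#species}" -ge 3 ]             # need three species letters for the taxon
}

for myfile in *.fas; do
    [ -e "$myfile" ] || continue        # glob matched nothing
    if ! is_genus_species "$myfile"; then
        echo "skipping unexpected file name: $myfile" >&2
        continue
    fi
    echo "ok: $myfile"                  # derive the taxon and run myscript.pl here
done
```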
 
10-29-2009, 06:57 PM   #5
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,261

Rep: Reputation: 2028
Frankly, given that the Perl program already gets the filename to process, why not derive the taxon in there?
 
10-29-2009, 09:10 PM   #6
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 98

Original Poster
Rep: Reputation: 15
Thanks! Analysis running...
 
  


All times are GMT -5.