LinuxQuestions.org

kmkocot 10-29-2009 10:03 AM

Script to pull certain characters from a filename and use in a variable?
 
Hi All,

I have a script that runs another script in batch mode for a folder of files. The script needs the input file name and a 4-letter abbreviation for that file (the -taxon= flag).
Code:

for myfile in *.fas
do
myscript.pl -sequence_file=$myfile -taxon=**needs to be a 4-letter abbreviation**
done

The files are all named like this (Genus_species):
Lottia_gigantea.fas
Aplysia_californica.fas
Mytilus_edulis.fas

I would like the taxon flag to be the first letter of the genus and the first three letters of the species in all caps, like this:
myscript.pl -sequence_file=Lottia_gigantea.fas -taxon=LGIG
myscript.pl -sequence_file=Aplysia_californica.fas -taxon=ACAL
myscript.pl -sequence_file=Mytilus_edulis.fas -taxon=MEDU

Can anyone help me out? Do I need to use xargs for this?

Thanks!
Kevin

David the H. 10-29-2009 12:58 PM

Assuming I'm understanding correctly, and that there's always only one underscore separating the filename into two parts, I think this should work.
Code:

for myfile in *.fas
do
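  # Keep the first character and the three characters after the underscore,
  # then upper-case them (\U is a GNU sed extension)
  taxon="$(echo "$myfile" | sed 's/\(.\).*_\(...\).*/\U\1\2/')"
  myscript.pl -sequence_file="$myfile" -taxon="$taxon"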
done

This assumes GNU sed; other versions may not support the \U uppercase flag. There's more info about that here.
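If your sed lacks the \U flag, bash itself can build the abbreviation. Below is a minimal sketch, assuming bash 4.0 or later (for the ${var^^} upper-casing expansion) and names of the form Genus_species.fas. The function name taxon_from_name is hypothetical, and the leading echo makes the loop a dry run. Unlike the sed version, it splits at the first underscore rather than the last one.

```shell
#!/usr/bin/env bash
# Sketch: build the taxon with bash parameter expansion only (no sed).
# Assumes bash 4.0+ (for ${var^^}) and names like Genus_species.fas.
# taxon_from_name is a hypothetical helper name.

taxon_from_name() {
    local base=${1%.fas}        # strip the .fas extension
    local genus=${base%%_*}     # text before the first underscore
    local species=${base#*_}    # text after the first underscore
    local abbr="${genus:0:1}${species:0:3}"
    printf '%s\n' "${abbr^^}"   # upper-case the whole abbreviation
}

shopt -s nullglob               # skip the loop entirely if there are no .fas files
for myfile in *.fas; do
    taxon=$(taxon_from_name "$myfile")
    # Dry run: remove the leading "echo" once the printed commands look right.
    echo myscript.pl -sequence_file="$myfile" -taxon="$taxon"
done
```

For example, taxon_from_name Aplysia_californica.fas prints ACAL.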

geek.ksa 10-29-2009 02:03 PM

Hi,

Below is one way to do it, using awk. Here's my test code:

Code:

[root@linuxr LQ]# cat file
Lottia_gigantea.fas
Aplysia_californica.fas
Mytilus_edulis
[root@linuxr LQ]# cat file | awk '{split($0,arr,"_"); print toupper(substr(arr[1],1,1)) toupper(substr(arr[2],1,3))}'
LGIG
ACAL
MEDU
[root@linuxr LQ]#

To use the awk in your script, set a variable to hold the resulting taxon before calling the script (I'm assuming Korn shell):
Code:

TAXON=$(echo "$myfile" | awk '{split($0,arr,"_"); print toupper(substr(arr[1],1,1)) toupper(substr(arr[2],1,3))}')
and when calling the other script:

Code:

-taxon="$TAXON"

This assumes GNU awk, Korn shell, and myfile=word1_word2.
Hope this helps
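Putting the awk pieces together, a sketch of the whole loop might look like the following. It assumes names of the form Genus_species.fas; the function name taxon_from_name is hypothetical, and the echo turns it into a dry run. split, substr and toupper are standard POSIX awk functions, so this should not need GNU awk specifically.

```shell
#!/usr/bin/env bash
# Sketch: the awk one-liner above wrapped in a function and used in the loop.
# Assumes names like Genus_species.fas; taxon_from_name is a hypothetical name.

taxon_from_name() {
    printf '%s\n' "$1" | awk '{
        split($0, arr, "_")   # arr[1] = genus, arr[2] = species.fas
        print toupper(substr(arr[1], 1, 1)) toupper(substr(arr[2], 1, 3))
    }'
}

for myfile in *.fas; do
    [ -e "$myfile" ] || continue   # no matches: the glob stays literal, so skip it
    TAXON=$(taxon_from_name "$myfile")
    # Dry run: remove the leading "echo" to actually run the analysis.
    echo myscript.pl -sequence_file="$myfile" -taxon="$TAXON"
done
```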

pixellany 10-29-2009 02:34 PM

Suggestion on style:
Code:

for myfile in *.fas; do
      myscript.pl -sequence_file=$myfile -taxon=**needs to be a 4-letter abbreviation**
done

The first line starts the loop and the last one ends it.

David's sed answer looks like the way to go, but make sure the rule you use covers every file name you might actually have.
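One way to act on that advice is to check each name before deriving a taxon and skip anything unexpected. This is a minimal sketch: the accepted pattern (a letter, exactly one underscore, at least three letters, then .fas) is an assumption to adjust to your real file names, taxon_or_fail is a hypothetical name, and it swaps GNU sed's \U for tr so it also works with other seds.

```shell
#!/usr/bin/env bash
# Sketch: only derive a taxon when the name looks like Genus_species.fas.
# The accepted pattern below is an assumption; taxon_or_fail is hypothetical.

taxon_or_fail() {
    case $1 in
        *_*_*) return 1 ;;                                    # more than one underscore
        [[:alpha:]]*_[[:alpha:]][[:alpha:]][[:alpha:]]*.fas) ;; # looks OK
        *) return 1 ;;                                        # anything else
    esac
    # Same substitution as the sed answer, upper-cased with tr instead of \U
    printf '%s\n' "$1" | sed 's/\(.\).*_\(...\).*/\1\2/' | tr '[:lower:]' '[:upper:]'
}

for myfile in *.fas; do
    [ -e "$myfile" ] || continue
    if taxon=$(taxon_or_fail "$myfile"); then
        # Dry run: remove the leading "echo" to actually run the analysis.
        echo myscript.pl -sequence_file="$myfile" -taxon="$taxon"
    else
        echo "skipping unexpected file name: $myfile" >&2
    fi
done
```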

@David: Nice tip on the "uppercase flag"!!

chrism01 10-29-2009 06:57 PM

Frankly, since the Perl program already gets the filename to process, why not derive the taxon in there?

kmkocot 10-29-2009 09:10 PM

Thanks! Analysis running... :)

