10-29-2009, 10:03 AM · #1
Member · Registered: Dec 2007 · Posts: 85
Script to pull certain characters from a filename and use in a variable?
Hi All,
I have a script that runs another script in batch mode for a folder of files. The script needs the input file name and a 4-letter abbreviation for that file (the -taxon= flag).
Code:
for myfile in *.fas
do
myscript.pl -sequence_file=$myfile -taxon=**needs to be a 4-letter abbreviation**
done
The files are all named like this (Genus_species):
Lottia_gigantea.fas
Aplysia_californica.fas
Mytilus_edulis.fas
I would like the taxon flag to be the first letter of the genus and the first three letters of the species in all caps, like this:
myscript.pl -sequence_file=Lottia_gigantea.fas -taxon=LGIG
myscript.pl -sequence_file=Aplysia_californica.fas -taxon=ACAL
myscript.pl -sequence_file=Mytilus_edulis.fas -taxon=MEDU
Can anyone help me out? Do I need to use xargs for this?
Thanks!
Kevin
10-29-2009, 12:58 PM · #2
Bash Guru · Registered: Jun 2004 · Location: Osaka, Japan · Distribution: Debian sid + kde 3.5 & 4.4 · Posts: 6,568
Assuming I'm understanding correctly, and that there's always only one underscore separating the filename into two parts, I think this should work.
Code:
for myfile in *.fas
do
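# \1 = first letter of the genus, \2 = first three letters after the underscore; \U uppercases them (GNU sed)
taxon="$(echo "$myfile" | sed 's/\(.\).*_\(...\).*/\U\1\2/')"
myscript.pl -sequence_file="$myfile" -taxon="$taxon"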
done
This assumes GNU sed; other sed versions may not support the \U uppercase escape.
Last edited by David the H.; 10-29-2009 at 01:02 PM.
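Since the loop already runs in bash, the sed call is not strictly needed: bash 4.0 and later can slice and uppercase strings by themselves. Below is a minimal sketch under that assumption (bash 4+ for `${var^^}`, names of the form Genus_species.fas); `taxon_for` is a hypothetical helper name, not part of the original scripts. Note that the sed pattern above is greedy and splits at the last underscore, while this sketch splits at the first one; for names with a single underscore both give the same result.

```shell
#!/usr/bin/env bash
# Sketch: build the taxon code with bash parameter expansion instead of sed.
# Assumes bash 4.0+ (for ${var^^}) and file names of the form Genus_species.fas.
# taxon_for is a hypothetical helper name used only for illustration.
shopt -s nullglob   # if no *.fas files exist, skip the loop instead of using the literal pattern

taxon_for() {
    local name=${1%.fas}        # strip the .fas extension
    local genus=${name%%_*}     # text before the first underscore
    local species=${name#*_}    # text after the first underscore
    local code="${genus:0:1}${species:0:3}"
    printf '%s\n' "${code^^}"   # uppercase the whole code
}

for myfile in *.fas; do
    myscript.pl -sequence_file="$myfile" -taxon="$(taxon_for "$myfile")"
done
```

For example, `taxon_for Lottia_gigantea.fas` prints LGIG.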
10-29-2009, 02:03 PM · #3
Member · Registered: Jan 2009 · Location: Dhahran, Saudi Arabia · Distribution: RHEL 5 · Posts: 42
Hi,
Below is one way to do it, using awk. Here's my test code:
Code:
[root@linuxr LQ]# cat file
Lottia_gigantea.fas
Aplysia_californica.fas
Mytilus_edulis
[root@linuxr LQ]# cat file | awk '{split($0,arr,"_"); print toupper(substr(arr[1],1,1)) toupper(substr(arr[2],1,3))}'
LGIG
ACAL
MEDU
[root@linuxr LQ]#
To use the awk command in your script, store the resulting taxon in a variable before calling the script (I'm assuming the Korn shell):
Code:
TAXON=$(echo "$myfile" | awk '{split($0,arr,"_"); print toupper(substr(arr[1],1,1)) toupper(substr(arr[2],1,3))}')
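Then pass the variable when calling the other script:
Code:
myscript.pl -sequence_file="$myfile" -taxon="$TAXON"
This assumes GNU awk, the Korn shell, and that myfile has the form word1_word2.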
Hope this helps
Last edited by geek.ksa; 10-29-2009 at 02:04 PM.
Reason: more clarification
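For reference, the awk approach from this post can be folded into a complete loop. This is a sketch assuming file names like Genus_species.fas; `taxon_from_awk` is a hypothetical helper name. It uses `awk -F_` to split on the underscore directly instead of calling `split()`. `toupper()` and `substr()` are standard POSIX awk functions, so this form should not depend on GNU awk specifically.

```shell
#!/bin/sh
# Sketch: the awk approach wrapped in a complete loop.
# Assumes file names of the form Genus_species.fas; taxon_from_awk is a hypothetical name.

taxon_from_awk() {
    # -F_ splits on underscores; awk takes the first letter of field 1
    # and the first three letters of field 2, then uppercases the result.
    printf '%s\n' "$1" | awk -F_ '{ print toupper(substr($1, 1, 1) substr($2, 1, 3)) }'
}

for myfile in *.fas; do
    # With no matching files, the shell leaves the literal pattern "*.fas"; skip it.
    [ -e "$myfile" ] || continue
    TAXON=$(taxon_from_awk "$myfile")
    myscript.pl -sequence_file="$myfile" -taxon="$TAXON"
done
```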
10-29-2009, 02:34 PM · #4
LQ Veteran · Registered: Nov 2005 · Location: Annapolis, MD · Distribution: Arch/XFCE · Posts: 17,797
Suggestion on style:
Code:
for myfile in *.fas; do
myscript.pl -sequence_file=$myfile -taxon=**needs to be a 4-letter abbreviation**
done
The first line starts the loop, the last one ends it.
David's sed answer looks like the way to go, but make sure your rule covers every possible file name (for example, names with more than one underscore).
@David: Nice tip on the "uppercase flag"!
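One way to make sure the rule covers every file name is to check each name against the expected pattern before deriving a code, and skip anything that doesn't fit. A sketch, assuming bash 4+ and that valid names are letters-only Genus_species.fas with a species of at least three letters; that rule and the helper name `taxon_or_fail` are assumptions for illustration, not something stated in the thread.

```shell
#!/usr/bin/env bash
# Sketch: only derive a taxon when the file name fits the assumed pattern
# Genus_species.fas (letters only, exactly one underscore, species of 3+ letters).
# Anything else is reported and skipped instead of producing a wrong code.
# taxon_or_fail is a hypothetical helper name.
shopt -s nullglob   # skip the loop entirely if there are no *.fas files

taxon_or_fail() {
    local re='^([A-Za-z])[A-Za-z]*_([A-Za-z]{3})[A-Za-z]*\.fas$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] = first letter of the genus, [2] = first three of the species
        local code="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
        printf '%s\n' "${code^^}"
    else
        return 1
    fi
}

for myfile in *.fas; do
    if taxon=$(taxon_or_fail "$myfile"); then
        myscript.pl -sequence_file="$myfile" -taxon="$taxon"
    else
        printf 'Skipping %s: name does not match Genus_species.fas\n' "$myfile" >&2
    fi
done
```

With this check, a name such as Homo_sapiens_neanderthalensis.fas is skipped with a message instead of silently getting a code built from the wrong part of the name.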
10-29-2009, 06:57 PM · #5
Guru · Registered: Aug 2004 · Location: Brisbane · Distribution: Centos 6.4, Centos 5.9 · Posts: 14,961
Frankly, since the Perl program already receives the filename, why not derive the taxon in there?
10-29-2009, 09:10 PM · #6
Member · Registered: Dec 2007 · Posts: 85 · Original Poster
Thanks! Analysis running... 
All times are GMT -5.