LinuxQuestions.org
Old 08-21-2009, 07:38 PM   #1
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 112

Rep: Reputation: 15
Need help with script to replace certain text in file with part of the file's name


Hi all,

I have a directory with about 16,000 files with this format:

>LGIG|175428
MSIIIAQTPITYFGSDIQKSLGSLHGFRWAKYPGEKPLPGHNYTGPGISEDKLTALESKL
SDDSEIQKQIVAIQQQLINVVDKTQLQNLSSLISNLDDKITKQKKDLKQLIDNINPGISE
DKLQRELTKFTTELQKEIKNIDDSVIQQQITTINNEVLKQEKNIAALEKNLKEENKSYFN
LPFRNLRDENASISYNIDKSRESEYEKYGITANIIEFFRIQISISKPKAYLMVIVYHIYI
SYTGKIILHKDNIKEIKRSKVGKGTELLKKINIYTGRNCYIPTDGNCFIKCVNHVLNKDL
TNEFKNFIINFPKVNRKRVMTTARINEFNKKCETSFQIHTLKNRNLRPRDVKRELDWVLY
LHNSHFCLIRRNEKNLGIKEIEDNYEQVWKTCRDDNVVTQVSPLKLNVFSNMSDDT
>HROB|174996
MIVAHAPKTYFGSGDIQKSLGSLPGFPWAKYPGEKHLPGHNYTGRGTRLDLRLDENNKPK
PGEEPVNRVDAAALKHDILYRNKDIKFRHEADKQMIIELENIPNPTFKERMERALIIKLL
KAKMKLGTDCIDQMLQRLGKVDQKRLTLISHNGSGFDNWIALQNVKKLTQCPLVVDNKIL
SFPLSNPYTEERLQKKWKRQKEIMSNSNYLQNISFTCSFIHQSTSLAAWGNSSNLPMNLK
KITDVNIAKFTKETWESLRPE

In some of the files there are more or fewer sequences but the definition line always begins with a > symbol. The files are all named like "Moll_10000.fasta", "Moll_10001.fasta" and so on...

I am trying to write a script that reads the name of each file, strips out the number portion of the name ($NUMBER), and replaces all instances of ">" with ">$NUMBER|".

Here is what I tried, but it didn't work. Can anyone point me in the right direction? Thanks!!!

Code:
COUNTER=10000
FILES=*.fasta

for i in $FILES
do
sed 's/>/>|$COUNTER/g'
COUNTER=COUNTER+1
done
 
Old 08-21-2009, 07:52 PM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
The sed command is incomplete. It should be something like:
Code:
sed -i "s/>/>${COUNTER}|/g" "$i"
The -i option edits the file in place, which is very dangerous without testing. The file name is passed as the argument "$i", and the double quotes around the sed expression let the shell substitute the variable COUNTER with its actual value. Test it on copies of the original files before modifying them.
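To see the quoting difference without touching any real data, here is a minimal sketch that runs sed without -i on a scratch file. The scratch directory, the shortened sequences, and the `^` anchor (which restricts the match to a ">" at the start of a line) are my assumptions for illustration, not part of the original script.

```shell
#!/bin/bash
# Hypothetical scratch file; not one of the real data files.
tmpdir=$(mktemp -d)
sample="$tmpdir/Moll_10000.fasta"
printf '>LGIG|175428\nMSIIIAQTPITYFGSD\n>HROB|174996\nMIVAHAPKTYFGSG\n' > "$sample"

COUNTER=10000

# Single quotes: the shell passes the text $COUNTER to sed literally.
single=$(sed 's/^>/>$COUNTER|/' "$sample" | head -n 1)

# Double quotes: the shell expands ${COUNTER} before sed runs.
double=$(sed "s/^>/>${COUNTER}|/" "$sample" | head -n 1)

echo "$single"
echo "$double"
```

Because -i is not used, sed only prints the result and the scratch file itself stays unchanged, so this is a safe way to preview an expression first.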

Edit: a simpler version of your script could be:
Code:
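#!/bin/bash
for file in *.fasta
do
  #
  # extract the five-digit part of the file name
  # (the pattern is quoted so the shell passes it to grep unchanged)
  #
  counter=$(echo "$file" | grep -Eo '[0-9]{5}')
  #
  # edit the file in place, prefixing each ">" with the number
  #
  sed -i "s/>/>${counter}|/g" "$file"
done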
Again, test it before running it on the original files.
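One way to do that testing is sketched below, assuming every file really is named Moll_NUMBER.fasta as described in the first post. It pulls the number out with shell parameter expansion instead of grep and writes the result to a new file rather than editing in place. The scratch directory, the two shortened sample files, and the .new suffix are hypothetical choices for the example.

```shell
#!/bin/bash
# Work in a scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
printf '>LGIG|175428\nMSIIIAQ\n' > "$workdir/Moll_10000.fasta"
printf '>HROB|174996\nMIVAHAP\n' > "$workdir/Moll_10001.fasta"

for file in "$workdir"/*.fasta
do
  name=${file##*/}          # drop the directory part -> Moll_10000.fasta
  number=${name#Moll_}      # drop the prefix         -> 10000.fasta
  number=${number%.fasta}   # drop the extension      -> 10000
  # Write to a new file so the original is left untouched.
  sed "s/^>/>${number}|/" "$file" > "$file.new"
done

# Show the first line of each result for a quick visual check.
head -n 1 "$workdir"/*.new
```

Once the .new files look right, the same loop can be pointed at copies of the real directory before anything is run against the originals.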

Last edited by colucix; 08-21-2009 at 08:05 PM.
 
Old 08-23-2009, 05:06 PM   #3
kmkocot
Member
 
Registered: Dec 2007
Location: Queensland, Australia
Posts: 112

Original Poster
Rep: Reputation: 15
Thanks! Your alternative is much more versatile. I really appreciate the help!

Kevin
 
  


Tags
grep, sed

