Hi all,
I have a directory with about 16,000 files with this format:
>LGIG|175428
MSIIIAQTPITYFGSDIQKSLGSLHGFRWAKYPGEKPLPGHNYTGPGISEDKLTALESKL
SDDSEIQKQIVAIQQQLINVVDKTQLQNLSSLISNLDDKITKQKKDLKQLIDNINPGISE
DKLQRELTKFTTELQKEIKNIDDSVIQQQITTINNEVLKQEKNIAALEKNLKEENKSYFN
LPFRNLRDENASISYNIDKSRESEYEKYGITANIIEFFRIQISISKPKAYLMVIVYHIYI
SYTGKIILHKDNIKEIKRSKVGKGTELLKKINIYTGRNCYIPTDGNCFIKCVNHVLNKDL
TNEFKNFIINFPKVNRKRVMTTARINEFNKKCETSFQIHTLKNRNLRPRDVKRELDWVLY
LHNSHFCLIRRNEKNLGIKEIEDNYEQVWKTCRDDNVVTQVSPLKLNVFSNMSDDT
>HROB|174996
MIVAHAPKTYFGSGDIQKSLGSLPGFPWAKYPGEKHLPGHNYTGRGTRLDLRLDENNKPK
PGEEPVNRVDAAALKHDILYRNKDIKFRHEADKQMIIELENIPNPTFKERMERALIIKLL
KAKMKLGTDCIDQMLQRLGKVDQKRLTLISHNGSGFDNWIALQNVKKLTQCPLVVDNKIL
SFPLSNPYTEERLQKKWKRQKEIMSNSNYLQNISFTCSFIHQSTSLAAWGNSSNLPMNLK
KITDVNIAKFTKETWESLRPE
In some of the files there are more or fewer sequences but the definition line always begins with a > symbol. The files are all named like "Moll_10000.fasta", "Moll_10001.fasta" and so on...
I am trying to write a script that reads the name of each file, strips out the number portion of the name ($NUMBER), and replaces all instances of ">" with ">$NUMBER|".
Here is what I tried (but didn't work). Can anyone point me in the right direction? Thanks!!!
Code:
COUNTER=10000
FILES=*.fasta
for i in $FILES
do
sed 's/>/>|$COUNTER/g'
COUNTER=COUNTER+1
done