script to truncate lines containing a ">" character to 40 characters
I tried writing a simple shell script, but I can't work out how to pick out the lines beginning with ">".
Here's what I've got so far:
Code:
while read filename
do
    header=${^>} #This is the line I'm stuck on
    if [ ${#header} -gt 40 ]
    then
        nfile=$(echo $header | cut -c1-40)
        echo $nfile
    else
        echo $filename
    fi
done < all.fasta > all.fasta.truncated
sed'll do it; it's built for stuff like this. For my money, save anything but the most trivial bash scripts for after you know how to use sed and awk, maybe before perl/python/whatnot and C. sed in particular: so many programs and scripts and (on Windows) commercial tools get written because people don't know how much it can do, and how conveniently. Yes, it kinda raises the bar on "quirky".
Code:
sed 's/^\(>.......................................\).*/\1/' <in >out
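The run of dots is easier to count when written as an interval expression. A minimal sketch of the same idea, assuming the goal is 40 characters in total (the ">" plus 39 more); the function name truncate_headers_sed is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: truncate lines starting with ">" to 40 characters.
# \{39\} is a basic-regex interval: ">" plus 39 more characters makes 40.
# Lines that do not start with ">", or that are already short, pass through unchanged.
truncate_headers_sed() {
    sed 's/^\(>.\{39\}\).*/\1/'
}

# Usage: truncate_headers_sed < all.fasta > all.fasta.truncated
```

Changing the limit then means editing one number rather than recounting dots.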
Incidentally, bash also has a regex operator, '=~', but I can't get it to work with the '^' anchor character, i.e. so that the string must start with '>'. I keep getting a syntax error.
Anyone know if it can be done?
i.e. I wanted to say
In the old single-bracket test, > and < are treated as redirections, and you have to first backslash escape them to \> and \<, at which point they become greater-than/less-than operators. The newer double-bracket test does not treat them as redirectors, so the unescaped values are greater-than/less-than, and escaping or quoting them makes them literal.
Note that these are string, not integer comparisons.
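A small sketch of why the string/integer distinction matters; the function names here are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting the two kinds of comparison.

# String comparison: "10" sorts before "9" because "1" sorts before "9".
string_less() { [[ $1 < $2 ]]; }

# Integer comparison: use -lt (or an arithmetic test such as (( a < b ))).
integer_less() { [[ $1 -lt $2 ]]; }

string_less 10 9  && echo "as strings: 10 < 9"
integer_less 10 9 || echo "as integers: 10 is not less than 9"
```

So for numeric checks such as a length limit, stick with -gt/-lt and friends.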
When using the regex operator in "[[", it's often better to store the expression in a separate variable first. Then you don't have to worry about escaping anything inside the test construct itself.
Code:
re='^>'
if [[ $s1 =~ $re ]]
Be sure not to quote the regex variable here, or else it will be treated as a string of literal characters.
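To illustrate the quoting point, here is a short sketch, assuming bash 3.2 or later (where a quoted right-hand side of =~ is matched literally); the function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helpers showing the effect of quoting the regex variable.
re='^>'

# Unquoted: $re is a regex, so ^ anchors the match to the start of the string.
matches_regex()   { [[ $1 =~ $re ]]; }

# Quoted: "$re" is matched as the literal two-character string "^>".
matches_literal() { [[ $1 =~ "$re" ]]; }

matches_regex   '>seq1' && echo "regex: matches a leading >"
matches_literal '>seq1' || echo "quoted: no literal ^> in the string"
matches_literal 'a^>b'  && echo "quoted: found literal ^> mid-string"
```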
In this particular instance though, you can also use a simple glob, like so.
Code:
if [[ $s1 == '>'* ]]
Notice again how you can quote/escape the character(s) that need to be matched literally, but be careful to leave the actual globbing character(s) unescaped.
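Putting the glob test back into the original loop might look something like this. It is only a sketch: the function name truncate_headers is made up, and it assumes the aim is to cut lines starting with ">" to 40 characters and pass everything else through untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy stdin to stdout, cutting ">" header lines to 40 chars.
truncate_headers() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || test also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == '>'* ]]; then
            # ${line:0:40} is the first 40 characters (shorter lines are unchanged).
            printf '%s\n' "${line:0:40}"
        else
            printf '%s\n' "$line"
        fi
    done
}

# Usage: truncate_headers < all.fasta > all.fasta.truncated
```

The substring expansion does the job of cut -c1-40 without starting a subprocess for every line.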
Last edited by David the H.; 02-04-2012 at 06:05 AM.
Reason: fixet mestike