script to truncate lines containing a ">" character to 40 characters
Hi all,
I have a file that looks like this:
Code:
>APOM|Contig4256_149_40_404
Desired output:
Code:
>APOM|Contig4256_149_40_404
Here's what I've got so far:
Code:
while read filename
Thanks!
Kevin |
sed'll do it; it's built for stuff like this. For my money, save anything but the most trivial bash scripts for after you know how to use sed and awk, though maybe before perl/python/whatnot and C. With sed in particular, so many programs and scripts and (on Windows) commercial tools get written because people don't know how much it can do, and how conveniently. Yes, it kinda raises the bar on "quirky".
Code:
sed 's/^\(>.......................................\).*/\1/' <in >out |
bash string slicing
Code:
s1=">1234567890" |
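The string-slicing code above is cut off after its first line. As a rough sketch of how a pure-bash version could look (not a reconstruction of the original post), the loop below uses the `${var:offset:length}` substring expansion. The function name `truncate_headers` is made up for illustration, and it assumes that only lines beginning with ">" should be cut.

```shell
#!/usr/bin/env bash
# Truncate lines that start with ">" to 40 characters; pass others through.
# truncate_headers is a hypothetical helper name, not from the thread.
truncate_headers() {
    local line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the || test also handles a final line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ ${line:0:1} == '>' ]]; then
            # ${line:0:40} is the first 40 characters of the line
            printf '%s\n' "${line:0:40}"
        else
            printf '%s\n' "$line"
        fi
    done
}
```

It would be called as `truncate_headers <in >out`. On large files a shell read loop like this is usually much slower than sed or awk.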
jthill's command can be compressed a bit:
Code:
sed 's/^\(>.\{39\}\).*/\1/' <in >out
And similar to chrism01's solution, if you are partial to python:
Code:
#!/usr/bin/python |
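To check that the compressed sed command really does the same thing as the version with 39 literal dots, here is a small sketch that runs both on made-up sample input. The `sample` function and its header text are hypothetical test data, not taken from the original file.

```shell
#!/usr/bin/env bash
# Made-up sample input: one long header, one sequence line, one short header.
sample() {
    printf '%s\n' \
        '>EXAMPLE|hypothetical_header_that_runs_well_past_forty_characters' \
        'ACGTACGTACGT' \
        '>short'
}

# Explicit form: ">" followed by 39 literal dots (built here with printf
# and brace expansion so the count is easy to verify).
dots=$(printf '.%.0s' {1..39})
long_form=$(sample | sed "s/^\(>$dots\).*/\1/")

# Compressed form: the \{39\} interval repeats the preceding dot 39 times.
short_form=$(sample | sed 's/^\(>.\{39\}\).*/\1/')

if [[ $long_form == "$short_form" ]]; then
    echo "both forms produce the same output"
fi
printf '%s\n' "$short_form"
```

Headers shorter than 40 characters do not match the pattern at all, so sed leaves them unchanged.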
In Perl I'd replace
Code:
s2="${s1:0:1}"
with
Code:
if( $s1 =~ /^>/ )
Incidentally, bash also has a regex operator '=~', but I can't get it to accept the '^' anchor char, i.e. the string must start with '>'. I keep getting a syntax error. Anyone know if it can be done? I wanted to say:
Code:
if [[ $s1 =~ ^> ]] |
The syntax error is not from the caret--it's from the greater-than sign. I assume bash is trying to interpret it as output redirection.
I escaped it, and all was well:
Code:
if [[ $s1 =~ ^\> ]] |
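A minimal sketch of the escaped form in use, wrapped in a function so that it can be tried on a few strings. The name `starts_with_gt` and the test strings are made up for illustration.

```shell
#!/usr/bin/env bash
# starts_with_gt is a hypothetical helper name used only for illustration.
starts_with_gt() {
    # The backslash makes ">" a literal character in the regex;
    # "^" anchors the match to the start of the string.
    [[ $1 =~ ^\> ]]
}

for s in '>1234567890' 'ACGT>1234'; do
    if starts_with_gt "$s"; then
        echo "starts with >:  $s"
    else
        echo "does not:       $s"
    fi
done
```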
For sed you can use -r (extended regular expressions; -E also works in GNU sed and is the spelling BSD sed uses) to avoid the backslash escaping:
Code:
sed -r 's/^(>.{39}).*/\1/' file
Code:
awk '/>/{$0 = substr($0,1,40)}1' file
Code:
ruby -ne '$_ = $_.chomp[0,40] if />/; puts $_' file |
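One difference worth noting: the sed commands anchor on `^>`, so they only touch lines that begin with ">", while the awk pattern `/>/` matches a ">" anywhere in the line. The sketch below shows the two behaviours side by side on made-up input. The `sample` function and its contents are hypothetical, and the anchored awk variant is an adaptation rather than something posted in the thread.

```shell
#!/usr/bin/env bash
# Made-up sample: a long header, plus a long non-header line
# that happens to contain ">".
sample() {
    printf '%s\n' \
        '>EXAMPLE|hypothetical_header_that_runs_well_past_forty_characters' \
        'note: this line is not a header but it does mention a > character'
}

# Unanchored: every line containing ">" is cut to 40 characters.
any_gt=$(sample | awk '/>/{$0 = substr($0,1,40)}1')

# Anchored: only lines that begin with ">" are cut.
leading_gt=$(sample | awk '/^>/{$0 = substr($0,1,40)}1')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$any_gt" "$leading_gt"
```

Which one is right depends on the data. If ">" only ever appears at the start of header lines, the two give the same result.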
@Dark_Helmet: darn, could have sworn I tried that syntax ... anyway it works :)
I probably just escaped the caret instead ... |
Thanks all!
|
In the old single-bracket test, > and < are treated as redirections, and you have to first backslash escape them to \> and \<, at which point they become greater-than/less-than operators. The newer double-bracket test does not treat them as redirectors, so the unescaped values are greater-than/less-than, and escaping or quoting them makes them literal.
Note that these are string, not integer, comparisons.
http://mywiki.wooledge.org/BashFAQ/031
When using the regex operator in "[[", it's often better to store the expression in a separate variable first. Then you don't have to worry about escaping anything inside the test construct itself.
Code:
re='^>'
In this particular instance, though, you can also use a simple glob, like so:
Code:
if [[ $s1 == '>'* ]] |
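Putting the pieces of this post together, here is a short sketch of the regex-in-a-variable form, the glob form, and the unescaped string comparison inside "[[". The result variable names are made up for illustration, and `s1` reuses the sample string from earlier in the thread.

```shell
#!/usr/bin/env bash
s1='>1234567890'

# Regex stored in a variable: no escaping needed inside [[ ]].
# The variable must stay unquoted on the right of =~; quoting it
# would make bash match it as a literal string instead of a regex.
re='^>'
if [[ $s1 =~ $re ]]; then
    regex_result=match
else
    regex_result=nomatch
fi

# Glob form: the quoted '>' is literal, the unquoted * matches the rest.
if [[ $s1 == '>'* ]]; then
    glob_result=match
else
    glob_result=nomatch
fi

# Unescaped > inside [[ ]] is a string comparison, not a redirection,
# so no file named "a" is created here.
if [[ b > a ]]; then
    compare_result=greater
else
    compare_result=notgreater
fi

echo "regex: $regex_result, glob: $glob_result, compare: $compare_result"
```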