LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 02-18-2016, 07:23 PM   #1
pitosh
LQ Newbie
 
Registered: Feb 2016
Posts: 2

Rep: Reputation: Disabled
length of sequence within a file


I have the following file and I wish to find the length of each sequence within that file:

i.e., I want to find the length between >TCONS_00000066 and the next header, >TCONS_00006042;

the length between >TCONS_00006042 and >TCONS_00000065, and so on.

My file has almost 50000 sequences.

Code:
>TCONS_00000066
CCGCCGGCTGCTGCGCGCACCGACTTGTCACCACCCCAGCACGTCCTCCACGTATACAAG
CGCTACGGTCCACCGCGGCAGCGTCGACGTCCTTGTCCGCAAACATGGTGGTGGCAGCTT
CCTCATCGAGCAGCAGCAACTCATCCTCGAGGGGAAGGGCCCAGAGCTTCTAATCCTACA
>TCONS_00006042
GCCACTAGCCAGCCCAGCCAGGGGAAGGGGAGGAGCTGCAAGCCCAACCCCCTGCTCAAC
CCTAAATTGCTTCCGCCGATCGGTGAGAGCTCCGATGCCTTCTTCTTCTTCTTCTTCCTC
CCCCTCTACCTGTTCCTTCTCCGAGATAACTGCAACATTTTCAGCACTTTTTCTGGCCAT
CATTTGAAGACTCGCTCAGATTTGTCAAGAAAGTGAAGGCTTGTAACTACATGTTGTATT
>TCONS_00000065
TCTCAAGTCCCCAGCCCAGGGACTAGAGTGTTACTATGGCTAGAGCAAATGAGATGGTCA
GGGCAGACTCAAGGATGATGGTTGTCTTTAGTGCCCTGGCATCTAAATCAGGGCCACTGA
Is there a way I could print the length of each sequence, like this?
Code:
>TCONS_00000066
43
>TCONS_00006042
56
>TCONS_00000065
67
and so on.
 
Old 02-18-2016, 08:46 PM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,105
Blog Entries: 10

Rep: Reputation: 1524
This question probably should be in the Programming Forum.

You can click the report button to request a moderator move the thread to that forum.

Yes, there are always ways to solve this. What have you tried on your own up to this point?

The members of LQ are not here to provide solutions, but instead to help you to work out things like this with your own effort.

If it's a homework assignment, that is fine, but you should be honest and state that, or state why you wish to solve this particular problem. And in addition, post what you've tried to solve this. A big issue to consider is whether you are trying to solve this in a program or a script, and then what language you plan to use. Parsing a file and making determinations like what you've presented are exactly what programs and scripts are for.
 
Old 02-18-2016, 10:02 PM   #3
BW-userx
Senior Member
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Void Linux / Slackware 14.2
Posts: 2,141

Rep: Reputation: Disabled
one word: use patterns
 
Old 02-19-2016, 12:23 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,254

Rep: Reputation: 2686
If not sure where to start, you could try using awk.
 
1 members found this post helpful.
Old 02-19-2016, 02:27 AM   #5
sam@
Member
 
Registered: Sep 2013
Posts: 31

Rep: Reputation: Disabled
Hi Grail,

I tried
Quote:
awk '/^>/ {print; next; } { sequencelen = length($0); print sequencelen}'

but this gave me the length of each line instead of each sequence.

I essentially want to calculate the length between the first ">" and the next ">", and so on.
 
Old 02-19-2016, 02:36 AM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,842

Rep: Reputation: 1823
Two LQ userids. That'll make the moderators cranky.

Try += on the sequencelen. Note however it will need resetting. And the last one will need an END{} clause.
grail will come up with some esoteric (better) solution, but that'll fit in with your attempts so far.
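
A minimal sketch of that suggestion, assuming FASTA-style input in which every header line starts with `>` and the length wanted is the total number of characters on the sequence lines that follow it. The function name `seqlen` is made up for illustration; it is one way to apply the `+=`, reset, and `END{}` hints, not necessarily what anyone in the thread had in mind.

```shell
# Hypothetical helper: print each header followed by the total length
# of the sequence lines that belong to it.
seqlen() {
  awk '
    /^>/ {                        # header line: a new sequence starts here
      if (seen) print len         # flush the total of the previous sequence
      print                       # echo the header itself
      len = 0                     # reset the running total
      seen = 1
      next
    }
    { len += length($0) }         # sequence line: accumulate with +=
    END { if (seen) print len }   # the last sequence has no header after it
  ' "$@"
}
```

It could be called as `seqlen yourfile` (the file name is a placeholder) or fed from a pipe. Lines that appear before the first header are ignored.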

Last edited by syg00; 02-19-2016 at 02:38 AM. Reason: typo
 
Old 02-19-2016, 02:39 AM   #7
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 8,124

Rep: Reputation: 2271
You may set the line terminator (the record separator) to > and just print the line length. In awk it will be one simple line.
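
A rough sketch of the record-separator idea, assuming `>` occurs only at the start of header lines. With `RS` set to `>`, each record holds one header plus its sequence lines; splitting fields on newlines lets the header (`$1`) be left out of the count, which a plain `length($0)` would not do. The function name `seqlen_rs` is hypothetical.

```shell
# Hypothetical helper: treat ">" as the record separator so that one
# awk record is one header plus all of its sequence lines.
seqlen_rs() {
  awk '
    BEGIN { RS = ">"; FS = "\n" }   # one record per sequence, one field per line
    NR > 1 {                        # record 1 is whatever precedes the first ">"
      n = 0
      for (i = 2; i <= NF; i++)     # fields 2..NF are the sequence lines
        n += length($i)
      print ">" $1                  # field 1 is the header text without ">"
      print n
    }
  ' "$@"
}
```

Newlines are field separators here, so they are not counted in the total.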
 
Old 02-19-2016, 04:13 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,254

Rep: Reputation: 2686
I have a question before assisting further with the answer: your example output does not appear to be the length of any of the lines shown, or of their totals.

Maybe you need to also advise exactly what it is you are trying to add up?
Based on your presented example the current values are:

Lines starting with '>' have 15 characters
All other lines have 60 characters

As you can see from these values it is not possible to get values of 43, 56 or 67
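
A quick way to check figures like these is to tally how many lines of each length the file contains. This is only a sketch; the function name `linelen_tally` is invented for the example, and it assumes header lines are exactly those starting with `>`.

```shell
# Hypothetical helper: count header and sequence lines by their length.
# Output columns: line kind, line length, number of lines.
linelen_tally() {
  awk '
    {
      kind = ($0 ~ /^>/) ? "header" : "sequence"
      count[kind " " length($0)]++
    }
    END { for (k in count) print k, count[k] }
  ' "$@" | sort
}
```

If every sequence line really is 60 characters, each per-sequence total must be a multiple of 60. The three records as shown in post #1 have 3, 4 and 2 sequence lines, which would give 180, 240 and 120.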
 
Old 02-19-2016, 07:30 AM   #9
BW-userx
Senior Member
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Void Linux / Slackware 14.2
Posts: 2,141

Rep: Reputation: Disabled
patterns: use patterns..

Code:
>TCONS_00000066
>TCONS_00006042
>TCONS_00000065
Doesn't anyone see the pattern within this output that they can use to tell whichever program they are going to use where to count the length, then start over and count the next line?
 
Old 02-19-2016, 08:13 AM   #10
rtmistler
Moderator
 
Registered: Mar 2011
Location: Sutton, MA. USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu
Posts: 4,105
Blog Entries: 10

Rep: Reputation: 1524
Quote:
Originally Posted by BW-userx
patterns: use patterns..

Code:
>TCONS_00000066
>TCONS_00006042
>TCONS_00000065
Doesn't anyone see the pattern within this output that they can use to tell whichever program they are going to use where to count the length, then start over and count the next line?
Yes we see those. Personally I approach things at the character level and therefore would look for line terminator followed by the > and have a state set indicating whether I'm counting chars or seeking the end of a recent >TCONS_ term. OP needs to reply and offer what method they are using, they haven't said script, program, or something else.
 
Old 02-19-2016, 08:24 AM   #11
BW-userx
Senior Member
 
Registered: Sep 2013
Location: MID-SOUTH USA
Distribution: Void Linux / Slackware 14.2
Posts: 2,141

Rep: Reputation: Disabled
Quote:
Originally Posted by rtmistler
Yes we see those. Personally I approach things at the character level and therefore would look for line terminator followed by the > and have a state set indicating whether I'm counting chars or seeking the end of a recent >TCONS_ term.
The terminator char is a good one to use too: EOL (end of line).

That is the question, is it not? What is he really looking for: a char count or a line count? What does he really want? What method of attack does he want to use? Does he have a deadline for getting this done? Does he even know anything about programming in general?

Quote:
OP needs to reply and offer what method they are using, they haven't said script, program, or something else.
Perhaps he knows Java and is still out on a coffee break.

pizza everyone until he gets back!?!?!?

Last edited by BW-userx; 02-19-2016 at 08:26 AM.
 
Old 02-19-2016, 09:46 PM   #12
pitosh
LQ Newbie
 
Registered: Feb 2016
Posts: 2

Original Poster
Rep: Reputation: Disabled
I just want to use an awk or sed command to do this. I don't need a script or program for it.
 
Old 02-19-2016, 11:50 PM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,842

Rep: Reputation: 1823
And did you try my suggestion? It should have given you food for thought.
We are here to help, not write the complete solution for you.
 
  

