04-25-2012, 11:34 AM
#1
LQ Newbie
Registered: Mar 2012
Posts: 15
Perl for columns
Hi,
I am a beginner in Perl, and I want to work with columns.
I have a file like this:
Code:
>gi|261496997|ref|NZ_ACZX01000137.1| Mannheimia haemolytica serotype A2 str. OVINE contig00171,
orf00001 104 1081 +2 2.95
orf00003 1103 1297 +2 1.60
>gi|261496959|ref|NZ_ACZX01000135.1| Mannheimia haemolytica serotype A2 str. OVINE contig00169,
orf00001 481 47 -2 7.06
orf00002 1953 625 -1 5.61
orf00003 3505 1940 -2 7.72
>gi|261497008|ref|NZ_ACZX01000139.1| Mannheimia haemolytica serotype A2 str. OVINE contig00173,
orf00001 295 35 -2 5.99
orf00002 522 316 -1 2.34
...
I want to have a file with continuous "orf" numbers like this:
Code:
orf00001 gi|261496997|ref|NZ_ACZX01000137.1| 104 1081 +2 2.95
orf00002 gi|261496997|ref|NZ_ACZX01000137.1| 1103 1297 +2 1.60
orf00003 gi|261496959|ref|NZ_ACZX01000135.1| 481 47 -2 7.06
orf00004 gi|261496959|ref|NZ_ACZX01000135.1| 1953 625 -1 5.61
orf00005 gi|261496959|ref|NZ_ACZX01000135.1| 3505 1940 -2 7.72
orf00006 gi|261497008|ref|NZ_ACZX01000139.1| 295 35 -2 5.99
orf00007 gi|261497008|ref|NZ_ACZX01000139.1| 522 316 -1 2.34
...
I tried with this:
Code:
perl -ne '$p=$1 if /^>gi\|(.*)\|/; print $p." ".$_ if /^orf/' MY_FILE.txt
But it shows the columns in a different order, and the "orf" numbers are not continuous:
Code:
gi|261496997|ref|NZ_ACZX01000137.1| orf00001 104 1081 +2 2.95
gi|261496997|ref|NZ_ACZX01000137.1| orf00003 1103 1297 +2 1.60
gi|261496959|ref|NZ_ACZX01000135.1| orf00001 481 47 -2 7.06
gi|261496959|ref|NZ_ACZX01000135.1| orf00002 1953 625 -1 5.61
gi|261496959|ref|NZ_ACZX01000135.1| orf00003 3505 1940 -2 7.72
gi|261497008|ref|NZ_ACZX01000139.1| orf00001 295 35 -2 5.99
gi|261497008|ref|NZ_ACZX01000139.1| orf00002 522 316 -1 2.34
...
Can anybody help me?
Last edited by Trotel; 04-28-2012 at 12:30 PM.

04-25-2012, 01:27 PM
#2
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,305
Your information seems a little bit unusual. Are you saying that the original data is in the incorrect order? You have changed the orf* values but left the rest of the data on those lines unchanged, e.g.:
Code:
orf00002 gi|261496997|ref|NZ_ACZX01000137.1| 1103 1297 +2 1.60
This line is a conglomeration of:
Code:
orf00003 1103 1297 +2 1.60
orf00002 <from any line or none??>

04-25-2012, 06:18 PM
#3
LQ Newbie
Registered: Mar 2012
Posts: 15
Original Poster
The initial information in the file is in order and is correct. I only want to add the ">gi|...|" name to each line and renumber the "orf###" labels consecutively; the data are not taken from any other line. Each ">gi|...|" header has many "orf###" lines: first comes the ">gi|...|" line, then its "orf###" lines.

04-26-2012, 04:44 AM
#4
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,305
So the pseudocode would be:
Start your own orf counter
On a line starting with ">", save the gi|...| identifier
On a line starting with orf, print the counter, the saved identifier and the rest of the line
Increment the counter

04-26-2012, 10:31 AM
#5
LQ Newbie
Registered: Mar 2012
Posts: 15
Original Poster
Could you write the command? I am a beginner, and the Perl command that I used was sent to me by someone else.

04-26-2012, 01:00 PM
#6
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,305
Well, if Perl is not your strength, what else can you use? I am more familiar with ruby or awk, but the idea is for you to make some attempt, as so far you have only copied someone else.
Do you understand the snippet you copied?
Here is the breakdown, which might help you out:
$p=$1 if /^>gi\|(.*)\|/ - To a beginner this may seem a little backward, but the "if" is evaluated first, and when it is true $p (a scalar variable) is assigned what is in $1 (the text captured by the first pair of parentheses in the regex).
So the "if" says: when the start of the line (^) is followed by the string ">gi|" (the backslash (\) before the pipe (|) makes it a literal pipe rather than the regex alternation operator), save everything ((.*)) up until the last pipe (\|) [the saved text is put into $1]. When all that is true, do the assignment.
print $p." ".$_ if /^orf/ - Again the "if" is first. If the line starts (^) with the string "orf", then print the previously stored variable $p concatenated (.) with a space and concatenated with the current line ($_).
Hope that helps explain your current output.

04-28-2012, 12:28 PM
#7
LQ Newbie
Registered: Mar 2012
Posts: 15
Original Poster
Solved
OK, thanks, now I understand the perl command. I added the commands "cut", "pr", and "sed" to process its result, because awk doesn't align to the right. Thanks for your time.
Code:
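# Prefix each orf line with the identifier from the preceding ">" header
perl -ne '$p=$1 if /^>(.*)\| /; print $p." ".$_ if /^orf/' orf.predict > orf1
# Split the fixed-width columns into separate files
cut -c 1-35 orf1 | sed 's/.1 /.1\|/g' > name
cut -c 36-46 orf1 > orfcode
cut -c 47-55 orf1 > stat1
cut -c 56-67 orf1 > stat2-3
cut -c 67-74 orf1 > stat4
# Merge the columns again with the orf code first, then strip the tab separators
pr -tms orfcode name stat1 stat2-3 stat4 | sed 's/\t//g' > orf.final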

04-28-2012, 01:39 PM
#8
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,305
Right ... well, good to see you have a solution, but most of the Perl gurus will be rolling over in their graves. I know you are new to Perl, but it can in fact do all the tasks that cut and sed have done for you, and it also has a plethora of formatting abilities.
Oh, and the comment about awk is also incorrect; again, you just need to learn a bit more about it.
In case you are interested, here are some helpful links:
http://www.cpan.org/
http://www.gnu.org/software/gawk/man...ode/index.html