LinuxQuestions.org

Trotel 04-25-2012 11:34 AM

Perl for columns
 
Hi,

I am a beginner in Perl, and I want to work with columns.
I have a file like this:

Code:

>gi|261496997|ref|NZ_ACZX01000137.1| Mannheimia haemolytica serotype A2 str. OVINE contig00171,
orf00001      104    1081  +2    2.95
orf00003    1103    1297  +2    1.60
>gi|261496959|ref|NZ_ACZX01000135.1| Mannheimia haemolytica serotype A2 str. OVINE contig00169,
orf00001      481      47  -2    7.06
orf00002    1953      625  -1    5.61
orf00003    3505    1940  -2    7.72
>gi|261497008|ref|NZ_ACZX01000139.1| Mannheimia haemolytica serotype A2 str. OVINE contig00173,
orf00001      295      35  -2    5.99
orf00002      522      316  -1    2.34
...

I want to have a file with continuous "orf" numbers like this:

Code:

orf00001 gi|261496997|ref|NZ_ACZX01000137.1|      104    1081  +2    2.95
orf00002 gi|261496997|ref|NZ_ACZX01000137.1|    1103    1297  +2    1.60
orf00003 gi|261496959|ref|NZ_ACZX01000135.1|      481      47  -2    7.06
orf00004 gi|261496959|ref|NZ_ACZX01000135.1|    1953      625  -1    5.61
orf00005 gi|261496959|ref|NZ_ACZX01000135.1|    3505    1940  -2    7.72
orf00006 gi|261497008|ref|NZ_ACZX01000139.1|      295      35  -2    5.99
orf00007 gi|261497008|ref|NZ_ACZX01000139.1|      522      316  -1    2.34
...

I tried this:
Quote:

perl -ne '$p=$1 if /^>gi\|(.*)\|/; print $p." ".$_ if /^orf/' MY_FILE.txt
But it shows the columns in a different order, and the "orf" numbers are not continuous:
Code:

gi|261496997|ref|NZ_ACZX01000137.1| orf00001      104    1081  +2    2.95
gi|261496997|ref|NZ_ACZX01000137.1| orf00003    1103    1297  +2    1.60
gi|261496959|ref|NZ_ACZX01000135.1| orf00001      481      47  -2    7.06
gi|261496959|ref|NZ_ACZX01000135.1| orf00002    1953      625  -1    5.61
gi|261496959|ref|NZ_ACZX01000135.1| orf00003    3505    1940  -2    7.72
gi|261497008|ref|NZ_ACZX01000139.1| orf00001      295      35  -2    5.99
gi|261497008|ref|NZ_ACZX01000139.1| orf00002      522      316  -1    2.34
...

Can anybody help me?

grail 04-25-2012 01:27 PM

Your information seems a little unusual. Are you saying that the original data is in the incorrect order? You have changed the orf* values but left the rest of the data from the lines as it was, e.g.
Code:

orf00002 gi|261496997|ref|NZ_ACZX01000137.1|    1103    1297  +2    1.60
This line is a conglomeration of:
Code:

orf00003    1103    1297  +2    1.60
orf00002 <from any line or none??>


Trotel 04-25-2012 06:18 PM

The initial information in the file is in order and is correct. I only add the name ">gi|...|" and renumber the "orf###" values; the new number does not come from any line. Each ">gi|...|" line has many "orf###" lines: the ">gi|...|" line comes first, followed by its "orf###" lines.

grail 04-26-2012 04:44 AM

So the pseudo code would be:

Start own orf counter
Save gi.*|
On finding line starting with orf use counter, saved data and rest of line
increment counter

Trotel 04-26-2012 10:31 AM

Could you write the command? I am a beginner, and the Perl command that I used was sent to me by someone else.

grail 04-26-2012 01:00 PM

Well, if Perl is not your strength, what else can you use? I am more familiar with Ruby or awk, but the idea is for you to make some attempt, as so far you have only copied someone else's command.

Do you understand the snippet you copied?

Here is the breakdown, which might help you out:

$p=$1 if /^>gi\|(.*)\|/ - To a beginner this may seem a little backward, but the "if" is evaluated first, and when it is true, $p (a scalar variable) is assigned the contents of $1 (the text matched by the first capture group, i.e. the parentheses in the regex).

So the "if" says: when the start of the line (^) is followed by the string ">gi|" (the backslash (\) before the pipe (|) makes it a literal pipe rather than regex alternation), save everything ((.*)) up until the last pipe (\|) on the line; the saved text is put into $1. When all of that matches, do the assignment.

print $p." ".$_ if /^orf/ - Again the "if" comes first: if the line starts (^) with the string "orf", print the previously stored variable $p concatenated (.) with a space and then with the current line ($_).

Hope that helps explain your current output.

Trotel 04-28-2012 12:28 PM

Solved
 
OK, thanks. Now I understand the Perl command. I added the "cut", "pr", and "sed" commands to process its result, because awk doesn't align to the right. Thanks for your time.

Code:

perl -ne '$p=$1 if /^>(.*)\| /; print $p." ".$_ if /^orf/' orf.predict > orf1
cut -c 1-35 orf1 | sed 's/.1 /.1\|/g' > name
cut -c 36-46 orf1 > orfcode
cut -c 47-55 orf1 > stat1
cut -c 56-67 orf1 > stat2-3
cut -c 67-74 orf1 > stat4
pr -tms orfcode name stat1 stat2-3 stat4 | sed 's/\t//g' > orf.final


grail 04-28-2012 01:39 PM

Right ... well, good to see you have a solution, but most of the Perl gurus will be rolling over in their graves :( I know you are new to Perl, but it can in fact do all the tasks that cut and sed have done for you, and it also has a plethora of formatting abilities.

Oh, and the comment about awk is also incorrect, again you just need to learn a bit more about it :)

In case you are interested, here are some helpful links:

http://www.cpan.org/
http://www.gnu.org/software/gawk/man...ode/index.html


All times are GMT -5.