LinuxQuestions.org
Old 04-25-2012, 11:34 AM   #1
Trotel
LQ Newbie
 
Registered: Mar 2012
Posts: 16

Rep: Reputation: Disabled
Perl for columns


Hi,

I am a beginner in Perl and I want to work with columns.
I have a file like this:

Code:
>gi|261496997|ref|NZ_ACZX01000137.1| Mannheimia haemolytica serotype A2 str. OVINE contig00171, 
orf00001      104     1081  +2     2.95
orf00003     1103     1297  +2     1.60
>gi|261496959|ref|NZ_ACZX01000135.1| Mannheimia haemolytica serotype A2 str. OVINE contig00169,
orf00001      481       47  -2     7.06
orf00002     1953      625  -1     5.61
orf00003     3505     1940  -2     7.72
>gi|261497008|ref|NZ_ACZX01000139.1| Mannheimia haemolytica serotype A2 str. OVINE contig00173,
orf00001      295       35  -2     5.99
orf00002      522      316  -1     2.34
...
I want to have a file with continuous "orf" numbers like this:

Code:
orf00001 gi|261496997|ref|NZ_ACZX01000137.1|      104     1081  +2     2.95
orf00002 gi|261496997|ref|NZ_ACZX01000137.1|     1103     1297  +2     1.60
orf00003 gi|261496959|ref|NZ_ACZX01000135.1|      481       47  -2     7.06
orf00004 gi|261496959|ref|NZ_ACZX01000135.1|     1953      625  -1     5.61
orf00005 gi|261496959|ref|NZ_ACZX01000135.1|     3505     1940  -2     7.72
orf00006 gi|261497008|ref|NZ_ACZX01000139.1|      295       35  -2     5.99
orf00007 gi|261497008|ref|NZ_ACZX01000139.1|      522      316  -1     2.34
...
I tried with this:
Quote:
perl -ne '$p=$1 if /^>gi\|(.*)\|/; print $p." ".$_ if /^orf/' MY_FILE.txt
But it shows the columns in a different order, and the "orf" numbers are not continuous:
Code:
gi|261496997|ref|NZ_ACZX01000137.1| orf00001      104     1081  +2     2.95
gi|261496997|ref|NZ_ACZX01000137.1| orf00003     1103     1297  +2     1.60
gi|261496959|ref|NZ_ACZX01000135.1| orf00001      481       47  -2     7.06
gi|261496959|ref|NZ_ACZX01000135.1| orf00002     1953      625  -1     5.61
gi|261496959|ref|NZ_ACZX01000135.1| orf00003     3505     1940  -2     7.72
gi|261497008|ref|NZ_ACZX01000139.1| orf00001      295       35  -2     5.99
gi|261497008|ref|NZ_ACZX01000139.1| orf00002      522      316  -1     2.34
...
Can anybody help me?

Last edited by Trotel; 04-28-2012 at 12:30 PM.
 
Old 04-25-2012, 01:27 PM   #2
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,178

Rep: Reputation: 1779
Your information seems a little unusual. Are you saying that the original data is in the incorrect order? You have changed the orf* values but left the rest of the data from the
lines, e.g.
Code:
orf00002 gi|261496997|ref|NZ_ACZX01000137.1|     1103     1297  +2     1.60
This line is a conglomeration of:
Code:
orf00003     1103     1297  +2     1.60
orf00002 <from any line or none??>
 
Old 04-25-2012, 06:18 PM   #3
Trotel
LQ Newbie
 
Registered: Mar 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
The initial information in the file is in order and is correct. I only want to add the name ">gi|...|" to each line and renumber the "orf###" values; the new numbers do not come from any line. Each ">gi|...|" line has many "orf###" lines: first comes the ">gi|...|" line and then its "orf###" lines.

 
Old 04-26-2012, 04:44 AM   #4
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,178

Rep: Reputation: 1779
So the pseudo code would be:

Start own orf counter
Save gi.*|
On finding line starting with orf use counter, saved data and rest of line
increment counter
 
Old 04-26-2012, 10:31 AM   #5
Trotel
LQ Newbie
 
Registered: Mar 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Could you write the command? I am a beginner, and the Perl command that I used was sent to me by someone else.

 
Old 04-26-2012, 01:00 PM   #6
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,178

Rep: Reputation: 1779
Well if perl is not your strength, what else can you use? I am more familiar with ruby or awk, but the idea is for you to make some attempt as so far you have only copied someone else.

Do you understand the snippet you copied?

This is the break down, might help you out:

$p=$1 if /^>gi\|(.*)\|/ - To a beginner this may seem a little backward, but the if is evaluated first, and when it is true $p (a scalar variable) is assigned whatever is in $1 (the text captured by the first pair of parentheses in the regex).

So the "if" says: when the start of the line (^) is followed by the string ">gi|" (the backslash (\) before the pipe (|) makes it a literal pipe rather than regex alternation), save everything ((.*))
up until the last pipe (\|) on the line [the saved text is put into $1]. When all that is true, do the assignment.

print $p." ".$_ if /^orf/ - Again if is first. "if" the line starts (^) with the string "orf", then print the previously stored variable $p concatenated (.) with a space and concatenated with the current
line ($_)

Hope that helps explain your current output.
 
Old 04-28-2012, 12:28 PM   #7
Trotel
LQ Newbie
 
Registered: Mar 2012
Posts: 16

Original Poster
Rep: Reputation: Disabled
Solved

OK, thanks, now I understand the Perl command. I added the "cut", "pr", and "sed" commands to process its output, because awk doesn't align to the right. Thanks for your time.

Code:
perl -ne '$p=$1 if /^>(.*)\| /; print $p." ".$_ if /^orf/' orf.predict > orf1
cut -c 1-35 orf1 | sed 's/.1 /.1\|/g' > name
cut -c 36-46 orf1 > orfcode
cut -c 47-55 orf1 > stat1
cut -c 56-67 orf1 > stat2-3
cut -c 67-74 orf1 > stat4
pr -tms orfcode name stat1 stat2-3 stat4 | sed 's/\t//g' > orf.final
 
Old 04-28-2012, 01:39 PM   #8
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,178

Rep: Reputation: 1779
Right ... well, good to see you have a solution, but most of the Perl gurus will be rolling over in their graves. I know you are new to Perl, but it can in fact do all the tasks that cut
and sed have done for you, and it also has a plethora of formatting abilities.

Oh, and the comment about awk is also incorrect: awk can right-align output too, you just need to learn a bit more about it.

In case you are interested, here are some helpful links:

http://www.cpan.org/
http://www.gnu.org/software/gawk/man...ode/index.html
 
  


Tags
columns, perl, script


All times are GMT -5.