LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
10-31-2013, 10:01 PM   #1
captainentropy
Perl script that skips a header, sorts the rest, then operates on the data


I have these data files in this format:

Code:
##header
##to 
##be
##ignored	
chr1	numbers	more-numbers
chr10	numbers	more-numbers
chr2	numbers	more-numbers
As you can see, the first column (below the header) is in lexical order, not natural (alphanumeric) order: chr10 comes before chr2. I want to skip the header, sort the data correctly, and then process the data. I figured out how to sort the data properly (an unnecessarily complicated procedure in Perl, IMO), but only on a test file with no header.
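Perl's default sort compares whole strings with cmp, which is why chr10 lands before chr2. A minimal pure-Perl sketch of the alternative is a comparator that splits each name into digit and non-digit chunks and compares digit chunks numerically. The name vcmp is hypothetical, and it only approximates GNU sort -V (it is not guaranteed to match it on every input); for these names it puts chrX last:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical comparator: compare two strings chunk by chunk. Runs of
# digits are compared numerically, everything else as plain strings.
# This approximates GNU `sort -V` for names like chr1, chr2, chr10, chrX.
sub vcmp {
    my @x = split /(\d+)/, $_[0];
    my @y = split /(\d+)/, $_[1];
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $c = ($p =~ /^\d+$/ && $q =~ /^\d+$/) ? $p <=> $q : $p cmp $q;
        return $c if $c;
    }
    return @x <=> @y;    # if one name is a prefix of the other, the shorter sorts first
}

my @names   = qw(chr1 chr10 chr2 chrX);
my @lexical = sort @names;                     # default: string (lexical) order
my @natural = sort { vcmp($a, $b) } @names;    # digit runs compared as numbers

print "lexical: @lexical\n";    # chr1 chr10 chr2 chrX
print "natural: @natural\n";    # chr1 chr2 chr10 chrX
```

chrX sorts after chr10 here because the chunk "chr" compares as less than "chrX", so any name with a number after "chr" comes first.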

Currently I can do exactly what I want by calling my Perl script from a shell script like this:

Code:
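#!/bin/bash
# $1 = input file, $2 = threshold passed to the Perl script as -var=
# Delete the 18-line header, then version-sort the remaining lines.
sed '1,18d' "$1" | sort -V > "$1.sort"
perl-script.pl "$1.sort" -var="$2"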
The sed command deletes the header (the first 18 lines), sort -V sorts the remaining lines in version (natural) order, and the -var= value is passed on to perl-script.pl.

However, I want to integrate all of this into a single Perl script. There is a lot more I plan to do, and I need to learn more about Perl structure, syntax, etc.
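A minimal sketch of the whole pipeline in one Perl script, under several assumptions: header lines are recognized by a leading # rather than by a fixed count of 18; the hypothetical vcmp comparator stands in for sort -V; the hypothetical process helper applies the same column-9 filter and column-7 rounding as the script below; and the rounding formula is an assumed stand-in for Math::Round's nearest(1, ...). The demo rows are made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical comparator approximating `sort -V`: digit runs compare
# numerically, other text compares as strings (chr2 < chr10 < chrX).
sub vcmp {
    my @x = split /(\d+)/, $_[0];
    my @y = split /(\d+)/, $_[1];
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $c = ($p =~ /^\d+$/ && $q =~ /^\d+$/) ? $p <=> $q : $p cmp $q;
        return $c if $c;
    }
    return @x <=> @y;
}

# Hypothetical helper: skip '#' header lines, sort rows on column 1,
# keep rows whose column 9 is <= $max, and return the formatted output lines.
sub process {
    my ($fh, $max) = @_;
    my @rows;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*#/;        # replaces: sed '1,18d'
        chomp $line;
        push @rows, [split /\t/, $line];
    }
    my $count = 1;
    my @out;
    for my $f (sort { vcmp($a->[0], $b->[0]) } @rows) {   # replaces: sort -V
        next if ($f->[8] // 0) > $max;   # a missing column 9 counts as 0
        # Round column 7 half away from zero (assumed stand-in for nearest(1, ...)).
        my $score = int($f->[6] + ($f->[6] < 0 ? -0.5 : 0.5));
        push @out, join("\t", @{$f}[0 .. 2], 'Data_' . $count++, $score);
    }
    return @out;
}

# Made-up demo input; in real use, open the file named in @ARGV instead.
my $demo = join '', map { join("\t", @$_) . "\n" }
    ['##header'],
    ['chr10', 100, 200, '.', '.', '.', 7.6, '.', 0.01],
    ['chr2',  300, 400, '.', '.', '.', 2.4, '.', 0.50],
    ['chrX',  500, 600, '.', '.', '.', 9.5, '.', 0.02];
open(my $in, '<', \$demo) or die "Can't open demo buffer: $!";
my @result = process($in, 0.05);
close $in;
print "$_\n" for @result;
```

To run it on a real file, replace the demo section with open(my $in, '<', $ARGV[0]) and read the threshold with Getopt::Long as in the existing script. Reading everything into @rows first is what makes it possible to sort before any row is processed.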

Here's the Perl script I've written so far.

Code:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'uninitialized';
use Math::Round;    # provides nearest()
use Getopt::Long;

# Parse -var=NUM first: GetOptions removes it from @ARGV, leaving the filename.
my %args;
GetOptions(\%args, "var=f") or die "D'oh!";
die "Missing -var=[num]!\n" unless defined $args{var};

my $file = shift @ARGV or die "Usage: $0 FILE -var=NUM\n";
open(my $in, '<', $file) or die "Can't open $file: $!";

my $count = 1;
while (<$in>) {
	next if /#/;		# remove the header
	chomp;
	my @fields = split /\t/;
	if ($fields[8] <= $args{var}) {
		my $new_score = nearest(1, $fields[6]);    # round column 7 to an integer
		print "$fields[0]\t$fields[1]\t$fields[2]\tData_$count\t$new_score\n";
		$count++;
	}
}
close $in;
This is the other Perl script I wrote, which sorts naturally (alphanumerically):
Code:
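#!/usr/bin/perl
use strict;
use warnings;
use Sort::Naturally;    # provides ncmp()

my $file = shift @ARGV or die "Usage: $0 FILE\n";
open(my $in, '<', $file) or die "Can't open $file: $!";

# Schwartzian transform: pair each line with its columns, sort on
# column 1 (index 1, because index 0 holds the whole line), then keep the line.
my @sort = map  { $_->[0] }
	   sort { ncmp($a->[1], $b->[1]) }
	   map  { chomp; [$_, split /\t/] } <$in>;
print "$_\n" for @sort;
close $in;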
There's something I don't understand about applying an operation to an entire array (e.g. sort) and then processing the whole sorted array as if it were the file named in $ARGV[0]. Does that make sense? This is a sticking point in other programs I'm writing too, so any help here should help with those as well.
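A sorted array does not need to be written back to a file: you can loop over it directly, or, to reuse existing while (<...>) code unchanged, open the joined lines as an in-memory filehandle (core Perl: open on a reference to a scalar). A minimal sketch; the data is made up and the sort key is a simplification that assumes every name is "chr" followed by digits:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for lines already read from the file, header removed (made-up data).
my @lines = ("chr2\t5\t6", "chr10\t1\t2", "chr1\t3\t4");

# Step 1: sort the whole array once. The key is the number after "chr"
# (assumes names like chr<digits>; chrX would need a different comparator).
my @sorted = sort { ($a =~ /^chr(\d+)/)[0] <=> ($b =~ /^chr(\d+)/)[0] } @lines;

# Step 2a: loop over the sorted array just as you would loop over <IN>.
my @report;
foreach my $line (@sorted) {
    my @fields = split /\t/, $line;
    push @report, "$fields[0]: $fields[1]";
}

# Step 2b: or turn the sorted lines back into a filehandle, so existing
# while (<$fh>) { ... } code can run on them unchanged.
my $buffer = join '', map { $_ . "\n" } @sorted;
open(my $fh, '<', \$buffer) or die "Can't open in-memory buffer: $!";
my @reread;
while (my $line = <$fh>) {
    chomp $line;
    push @reread, $line;
}
close $fh;

print "$_\n" for @report;
```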
 
10-31-2013, 10:45 PM   #2
codeguy
Quote:
next if /#/; # remove the header
You might consider:
Code:
next if (/^\s*#/);
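Anchor the regex at the start of the line: it's usually faster, it allows optional leading whitespace, and it's more accurate, because the unanchored /#/ also skips any data line that happens to contain a # anywhere.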
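A small check of the accuracy point, using made-up lines: the unanchored pattern also throws away a data row that merely contains a #, while the anchored one drops only header and comment lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up lines: a header, an indented comment, and a data row whose
# last field happens to contain '#'.
my @lines = ("##header", "  # indented comment", "chr1\t10\tnote#1");

my @kept_unanchored = grep { !/#/ }     @lines;   # also drops the data row
my @kept_anchored   = grep { !/^\s*#/ } @lines;   # drops only header/comment lines

print scalar(@kept_unanchored), " line(s) kept vs ", scalar(@kept_anchored), "\n";
```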

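After skipping the header, you can add each remaining line to an array: push(@list, $_);

Then sort it. The elements of @list are plain strings rather than array references, so split out the first column inside the comparison:

my @newlist = sort { ncmp((split /\t/, $a)[0], (split /\t/, $b)[0]) } @list;

After sorting, you can iterate and split out the columns:
foreach my $x (@newlist)
{
    my @columns = split("\t", $x);
    # ... work with @columns here
}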



That's one way. You say you'll do a lot more with the numbers, so there might be a faster way, but it depends on whether the first column is unique. Will the first column (chr1, chr2, etc.) ever have duplicates?
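The post doesn't say what the faster way would be; one common option when the first column is unique is a hash keyed on it, which gives direct lookups by name and lets you sort only the keys. A minimal sketch under that assumption (made-up rows; the key sort assumes names like chr<digits>, so chrX would need a different comparator):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up rows; assumes column 1 never repeats.
my @lines = ("chr10\t7\t8", "chr2\t3\t4", "chr1\t1\t2");

my %by_chrom;
for my $line (@lines) {
    my ($chrom, @rest) = split /\t/, $line;
    die "Duplicate key $chrom\n" if exists $by_chrom{$chrom};   # uniqueness check
    $by_chrom{$chrom} = \@rest;                                   # remaining columns
}

# Direct lookup by name, no scanning of the whole list.
my $chr2_col3 = $by_chrom{chr2}[1];

# Ordered traversal: sort only the keys (assumes names like chr<digits>).
my @order = sort { ($a =~ /^chr(\d+)$/)[0] <=> ($b =~ /^chr(\d+)$/)[0] } keys %by_chrom;
print "$_\t@{ $by_chrom{$_} }\n" for @order;
```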


-Andy
 
11-01-2013, 10:35 PM   #3
captainentropy
Original Poster
Thanks for the reply, codeguy. With the code below I can skip the header and print out the data, but it isn't sorted the way sort -V sorts it. I'm not saying it's wrong, though. The column I'm sorting holds chromosome names, and in the raw file chr10 comes ahead of chr2. My output puts the numbers in the right order (chr10 now comes after chr2), but chrX comes before all of them, which is not what Linux sort -V does. I think I see why: sort -V is a version sort, whereas Sort::Naturally puts non-numeric values first: http://search.cpan.org/~bingos/Sort-...t/Naturally.pm

So I need to try the Sort::Versions module to get this right (chrX should come last).

I'll figure that out and come back. I have some other questions about the rest of the code, which wasn't working for me. Also, I commented out the push part and changed @list back to <IN> in the sort routine, because for some reason the push version simply returned the original array, sans header.

Code:
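#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;
use Sort::Naturally;    # provides ncmp()

my %args;
GetOptions(\%args, "FDR=f") or die "D'oh!";
die "Missing -FDR=[num]!\n" unless defined $args{FDR};

my $file = shift @ARGV or die "Usage: $0 FILE -FDR=NUM\n";
open(my $in, '<', $file) or die "Can't open $file: $!";

# Collect the data rows first. Calling <IN> in list context inside a
# while (<IN>) loop drops the data line that the loop has already read.
my @rows;
while (my $line = <$in>) {
	next if $line =~ /^\s*#/;                  # skip header lines
	chomp $line;
	push @rows, [$line, split /\t/, $line];    # [whole line, col 1, col 2, ...]
}
close $in;

# Sort once, after the loop, on column 1 (index 1 of each row).
my @newlist = map { $_->[0] } sort { ncmp($a->[1], $b->[1]) } @rows;
print "$_\n" for @newlist;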
 
11-02-2013, 09:50 AM   #4
codeguy
Um, here, how about this:

Code:
#!/usr/bin/perl

use strict;
use warnings;
use Sort::Naturally;


my @list;
my $file = shift;
print "Reading file: $file\n";
open(F, '<', $file) or die "Can't open $file: $!";
while (<F>)
{
	next if (/^\s*#/);    # skip header lines
	chomp;
	my @cols = split("\t");
	push(@list, \@cols);
}
close(F);
my @sorted = sort {ncmp($a->[0], $b->[0])} @list;

foreach my $row (@sorted)
{
	print "[", $row->[0], "]","[", $row->[1], "]","[", $row->[2], "]\n";
}

Last edited by codeguy; 11-02-2013 at 09:51 AM. Reason: removed "use Data::Dumper;", was for testing only
 
11-11-2013, 03:32 AM   #5
bigearsbilly
my go ;-)

Code:
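#!/usr/bin/perl
use strict;
use warnings;

# Placeholder for whatever you want to do with each line.
sub your_func {
    print ">@_<\n";
}

# Read all input (files named on the command line, or STDIN), dropping '#' lines.
chomp(my @L = grep { !/^#/ } <>);

# Sort numerically on the second whitespace-separated column, then process each line.
your_func($_) for sort { (split " ", $a)[1] <=> (split " ", $b)[1] } @L;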

All times are GMT -5.