LinuxQuestions.org
Old 10-06-2013, 11:48 PM   #1
genetist
LQ Newbie
 
Registered: Jul 2013
Posts: 8

Rep: Reputation: Disabled
Column and row comparison with Perl


I have data like this
1. First, I want to find how many columns (from AB1 to AB5) differ between P1 and P2.
EQ means that P1 and P2 contain the same letters (alleles); if either P1 or P2 contains Z/Z or -/-, I also count the column as EQ. At present I do this with the formula =IF(D2=D3,"EQ",IF(OR(D2="Z/Z",D2="-/-"),"EQ",IF(OR(D3="Z/Z",D3="-/-"),"EQ","NE")))

2. I will compare the column values of line 1 with P2 across all the columns (from AB1 to AB5) horizontally, and continue with the remaining lines 2 to 5. If they match I would like to give 1, else 0, and I would like to continue this until my program encounters the second set of P1 and P2. At present I do this with the formula =IF(D4=D$3,1,0).
3. I will sum lines 1 to 5 across columns AB1 to AB5, but the sum will include only the columns that differ between P1 and P2. At present I do this with a SUMIF formula.
4. I will calculate the percentage of matching for lines 1 to 5 against P2 by dividing the sum from SUMIF by the number of markers that differ between P1 and P2.
5. I want to repeat this for the remaining sets of P1 and P2.
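As a rough illustration of step 1 only, the EQ/NE rule could be sketched in Perl as below. The helper name alleles_equal and the genotype values are made up for the example; they are not taken from the attached file.

```perl
use strict;
use warnings;

# Hypothetical helper: two genotype calls count as equal when they are
# identical or when either one is a missing call (Z/Z or -/-).
sub alleles_equal {
    my ($first, $second) = @_;
    return 1 if grep { $_ eq 'Z/Z' or $_ eq '-/-' } $first, $second;
    return $first eq $second ? 1 : 0;
}

# Made-up genotypes for five marker columns (not the attached data).
my @p1 = qw(A/A A/A Z/Z G/G T/T);
my @p2 = qw(A/A C/C G/G -/- T/T);

# One EQ/NE label per marker column.
my @status = map { alleles_equal($p1[$_], $p2[$_]) ? 'EQ' : 'NE' } 0 .. $#p1;
print join(' ', 'P1', @status), "\n";
```

With these made-up values only the second column differs, so the printed row is P1 EQ NE EQ EQ EQ.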
I am expecting output like this:
LINES XY1 XY2 XY3 XY4 XY5
P1    EQ  NE  EQ  EQ  EQ  SUM %
P2                            1
1     0   0   1   0   0   0   0
2     0   1   0   0   1   1   100
3     0   0   0   0   1   0   0
4     1   0   0   0   0   0   0
5     1   1   0   0   0   1   100

I have data like this in more than 5000 rows. At present I am doing this in Excel 2010 with different formulas, but it takes a lot of my energy. I have a Perl script like this, and after running it I get an error like "Use of uninitialized value $_ in concatenation" at line 19.
Code:
use strict;
use warnings;
sub compare_alleles {
  return 1 if grep {$_ eq '-/-' or $_ eq 'Z/Z' } @_;
  return $_[0] eq $_[1] ? 1 : 0;
}
my $format;
my (@p1, @p2);
my @unequal;
while (<>) {
  unless (/^(P?\d)/) {
    my @widths;
    push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
    pop @widths;
    push @widths, $widths[-1], $widths[-1];
    my$format = join '', map{"%-${_}s", @widths; $format . = "\n";
    print;
    next;
  }
  my @fields = split;
  if ($fields[0] eq 'P1') {
    @p1 = @fields;
  }
  elsif ($fields[0] eq 'P2') {
    @p2 = @fields;
    printf $format, 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
    printf $format, 'P2', map('', 1..5), '', '1';
    @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
  }
  else {
    my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
    my $sum = 0;
    $sum += $_ for @columns[@unequal];
    my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
    printf $format, @columns, $sum, $percent;
  }
}
I have attached my sample file for better understanding. I request the Perl gurus to solve this error; any help in this regard is highly appreciated.
Thanking you,
Genetist
Attached Files
File Type: txt Marker.txt (546 Bytes, 11 views)

Last edited by genetist; 10-07-2013 at 01:56 AM. Reason: To putting code tags
 
Old 10-07-2013, 12:41 AM   #2
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep: Reputation: 2770
It would be nice to use [code]here comes your script[/code] tags to keep the formatting.
The code you posted does not work; please fix it.
It would also be much better to post the real error messages.
In general, you can use named variables instead of $_, which makes the code a bit more readable.
 
Old 10-07-2013, 01:58 AM   #3
genetist
LQ Newbie
 
Registered: Jul 2013
Posts: 8

Original Poster
Rep: Reputation: Disabled
Dear Pan64,
Thank you very much for your reply. I am sorry, I did not know how to tag code here because I am a newbie. I have made the change you suggested.

Thank you
 
Old 10-07-2013, 02:01 AM   #4
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep: Reputation: 2770
Your code still contains syntax errors; for example, this line is invalid:
my$format = join '', map{"%-${_}s", @widths; $format . = "\n";

I do not want to fix that script, so please post the code you actually tested (how did you get those error messages?).
 
Old 10-07-2013, 02:22 AM   #5
genetist
LQ Newbie
 
Registered: Jul 2013
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hi Pan64,
This is the code I ran after your suggestions, and I am getting an error like this. Please see the attached image for the error.
Code:
use strict;
use warnings;
sub compare_alleles {
  return 1 if grep {$_ eq '-/-' or $_ eq 'Z/Z' } @_;
  return $_[0] eq $_[1] ? 1 : 0;
}
my $format;
my (@p1, @p2);
my @unequal;
while (<>) {
  unless (/^(P?\d)/) {
    my @widths;
    push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
    pop @widths;
    push @widths, $widths[-1], $widths[-1];
       print;
    next;
  }
  my @fields = split;
  if ($fields[0] eq 'P1') {
    @p1 = @fields;
  }
  elsif ($fields[0] eq 'P2') {
    @p2 = @fields;
    printf $format, 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
    printf $format, 'P2', map('', 1..5), '', '1';
    @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
  }
  else {
    my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
    my $sum = 0;
    $sum += $_ for @columns[@unequal];
    my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
    printf $format, @columns, $sum, $percent;
  }
}
Regards,
Thank you
Attached Thumbnails
Error.JPG (22.8 KB)
 
Old 10-07-2013, 03:20 AM   #6
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep: Reputation: 2770
It would be nice to post the real error message:
Quote:
Use of uninitialized value $format in printf at t.pl line 25, <> line 3.
Use of uninitialized value $format in printf at t.pl line 26, <> line 3.
Use of uninitialized value $format in printf at t.pl line 34, <> line 4.
Use of uninitialized value $format in printf at t.pl line 34, <> line 5.
Use of uninitialized value $format in printf at t.pl line 34, <> line 6.
Use of uninitialized value $format in printf at t.pl line 34, <> line 7.
Use of uninitialized value $format in printf at t.pl line 34, <> line 8.
Use of uninitialized value $format in printf at t.pl line 25, <> line 10.
Use of uninitialized value $format in printf at t.pl line 26, <> line 10.
Use of uninitialized value $format in printf at t.pl line 34, <> line 11.
Use of uninitialized value $format in printf at t.pl line 34, <> line 12.
Use of uninitialized value $format in printf at t.pl line 34, <> line 13.
Use of uninitialized value $format in printf at t.pl line 34, <> line 14.
Use of uninitialized value $format in printf at t.pl line 34, <> line 15.
It is quite simple: the variable $format is not initialized. printf needs a format string as its first argument.
I think you need to specify different format strings for those statements (or simply use print, without formatting).
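For illustration, here is a minimal sketch of giving printf-style output an explicit format string. The column widths (6 for the label, 4 for each marker and for SUM) are assumptions chosen for the example, not values derived from the poster's data.

```perl
use strict;
use warnings;

# Assumed layout: a 6-character label column, five 4-character marker
# columns, a 4-character SUM column and an unpadded % column.
my $format = '%-6s' . ('%-4s' x 5) . "%-4s%s\n";

# sprintf fills the format the same way printf would; the result is kept
# in a variable here so it can be inspected before printing.
my $row = sprintf $format, 'P1', qw(eq nq eq eq eq), 'SUM', '%';
print $row;
```

Because the format is defined before it is used, no "uninitialized value $format" warning should appear for this statement.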
 
Old 10-09-2013, 12:05 AM   #7
genetist
LQ Newbie
 
Registered: Jul 2013
Posts: 8

Original Poster
Rep: Reputation: Disabled
Dear Pan64,
Thank you very much for your reply. I tried your suggestions to fix this script but was unsuccessful. Can you please fix it for me?
Thanks
 
Old 10-09-2013, 12:19 AM   #8
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep: Reputation: 2770
No, I cannot; I have no idea what format you really need. Also, please show us what you have tried; then we can probably go further together ...
 
Old 10-09-2013, 01:30 AM   #9
genetist
LQ Newbie
 
Registered: Jul 2013
Posts: 8

Original Poster
Rep: Reputation: Disabled
Dear Pan64,
Thank you very much for your reply and for agreeing to go ahead on this problem. I will post what I did from the start and what I want.
Thanks
 
Old 10-09-2013, 01:43 AM   #10
genetist
LQ Newbie
 
Registered: Jul 2013
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hi Pan64,
These are the steps I need to work out on my sample data (please see the attached file for better understanding).
1. I want to find which columns (from XY1 to XY5) are different for rows P1 and P2. Equal means that P1 and P2 contain the same letters (alleles) or either of them contains Z/Z or -/-.

2. I will compare the columns of lines 1 through 5 with P2 across columns XY1 to XY5. If they match, the output should contain 1, otherwise 0.

3. I will calculate a line total for lines 1 to 5 only for the columns that differed between P1 and P2.

4. I will calculate a line percentage for lines 1 to 5 by dividing the sum by the number of columns that differed between P1 and P2.
5. I would like to continue this program for the second set of P1 and P2 lines.
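Steps 2 to 4 can be sketched roughly as follows. This is only an illustration: the helper name score_line, the genotype values and the list of differing columns are all invented for the example, and the differing columns are assumed to have been found already (step 1).

```perl
use strict;
use warnings;

# Hypothetical helper: score one line against P2.
# $line and $p2 are array refs of genotype calls; $differing is an array
# ref of the column indexes where P1 and P2 were found to differ.
sub score_line {
    my ($line, $p2, $differing) = @_;
    # Step 2: 1 where the line matches P2, 0 otherwise.
    my @match = map { $line->[$_] eq $p2->[$_] ? 1 : 0 } 0 .. $#{$p2};
    # Step 3: sum only over the columns where P1 and P2 differ.
    my $sum = 0;
    $sum += $match[$_] for @$differing;
    # Step 4: percentage, guarding against P1 and P2 differing nowhere.
    my $percent = @$differing ? 100 * $sum / @$differing : 0;
    return (\@match, $sum, $percent);
}

# Made-up example: P1 and P2 differ only in the second column (index 1).
my @p2        = qw(A/A C/C G/G T/T A/A);
my @differing = (1);
my @line2     = qw(G/G C/C A/A C/C A/A);

my ($match, $sum, $percent) = score_line(\@line2, \@p2, \@differing);
print join(' ', 2, @$match, $sum, $percent), "\n";
```

For this made-up line the match flags are 0 1 0 0 1, the sum over the single differing column is 1, and the percentage is 100.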

This is my original code:
Code:
use strict;
use warnings;
sub compare_alleles {
  return 1 if grep {$_ eq '-/-' or $_ eq 'Z/Z' } @_;
  return $_[0] eq $_[1] ? 1 : 0;
}
my $format;
my (@p1, @p2);
my @unequal;
while (<>) {
  unless (/^(P?\d)/) {
    my @widths;
    push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
    pop @widths;
    push @widths, $widths[-1], $widths[-1];
    $format = join '', map("%-${_}s", @widths, ''), "\n";
    print;
    next;
  }
  my @fields = split;
  if ($fields[0] eq 'P1') {
    @p1 = @fields;
  }
  elsif ($fields[0] eq 'P2') {
    @p2 = @fields;
    printf $format, 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
    printf $format, 'P2', map('', 1..5), '', '1';
    @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
  }
  else {
    my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
    my $sum = 0;
    $sum += $_ for @columns[@unequal];
    my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
    printf $format, @columns, $sum, $percent;
  }
}
When I run this code I get an error like "Use of uninitialized value $_ in concatenation" at line 19; this is the same as you posted. I also removed $format and used only the printf function as per your suggestion, and when I run that code it prints only the file name and nothing more.
This is my changed code:
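One plausible cause of the "$_ in concatenation" warning, assuming the input contains a blank line (or any other line with fewer than two fields) that does not start with P1, P2 or a digit: such a line reaches the header branch with an empty width list, so undefined widths get interpolated into the format. A possible guard is sketched below; the helper name format_from_header is hypothetical, and the header text is taken from the expected output shown earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build a printf format from a header line, or
# return undef when the line has too few fields (for example a blank line).
sub format_from_header {
    my ($header) = @_;
    my @widths;
    # Width of each field including its trailing whitespace.
    push @widths, length $1 while $header =~ /(\S+\s*)/g;
    return undef unless @widths > 1;
    pop @widths;                               # the last field has no padding
    push @widths, $widths[-1], $widths[-1];    # reuse the width for last marker and SUM
    return join('', map("%-${_}s", @widths), "%s\n");
}

my $format;
for my $line ("LINES XY1 XY2 XY3 XY4 XY5\n", "\n") {
    my $candidate = format_from_header($line);
    $format = $candidate if defined $candidate;    # a blank line keeps the old format
}
printf $format, 'P1', qw(eq nq eq eq eq), 'SUM', '%';
```

Whether this is the actual cause depends on the attached data file, which is not shown in the thread.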
Code:
use strict;
use warnings;
sub compare_alleles {
  return 1 if grep {$_ eq '-/-' or $_ eq 'Z/Z' } @_;
  return $_[0] eq $_[1] ? 1 : 0;
}
my (@p1, @p2);
my @unequal;
while (<>) {
  unless (/^(P?\d)/) {
    my @widths;
    push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
    pop @widths;
    push @widths, $widths[-1], $widths[-1];
       print;
    next;
  }
  my @fields = split;
  if ($fields[0] eq 'P1') {
    @p1 = @fields;
  }
  elsif ($fields[0] eq 'P2') {
    @p2 = @fields;
    printf 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
    printf 'P2', map('', 1..5), '', '1';
    @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
  }
  else {
    my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
    my $sum = 0;
    $sum += $_ for @columns[@unequal];
    my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
    printf @columns, $sum, $percent;
  }
}
This is, in total, what I did and what I want for my problem.
Thanking you,
Regards,
Genetist
Attached Files
File Type: txt sampledata.txt (1.5 KB, 8 views)
 
Old 10-09-2013, 02:20 AM   #11
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep: Reputation: 2770
No, I suggested using print ... (instead of printf $format ...), so drop that f and leave only print.
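A small sketch of the difference, using made-up row values. printf always treats its first argument as the format string, so a call such as printf 'P1', ... prints only "P1"; print writes every item in its list.

```perl
use strict;
use warnings;

# Made-up row values for illustration.
my @row = ('P1', qw(eq nq eq eq eq), 'SUM', '%');

# printf @row would use 'P1' as the format string and print only "P1".
# print has no format string; join supplies the separators instead.
my $line = join(' ', @row) . "\n";
print $line;
```

The columns are separated by single spaces here rather than padded to fixed widths; padding would still need printf with a real format string.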
 
Old 10-09-2013, 02:59 AM   #12
genetist
LQ Newbie
 
Registered: Jul 2013
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hi Pan64,

Thanks for your reply. I included your suggestions in my script (I kept only print). Now my program prints the file path that I gave (my program asks me to provide the file name with its path), and it prints back the same thing I gave it, nothing more. Anyway, thanks for your reply.

Regards,
Genetist
 
  

