LinuxQuestions.org: column and row comparison with PERL

10-06-2013, 11:48 PM   #1
genetist
LQ Newbie

Registered: Jul 2013
Posts: 8

Rep:
column and row comparison with PERL

I have data like this (sample file attached).
1. First, I want to find how many columns (from AB1 to AB5) differ between P1 and P2.
EQ means that P1 and P2 contain the same letters (alleles); if either P1 or P2 contains Z/Z or -/-, I also treat the column as EQ. At present I do this with the formula =IF(D2=D3,"EQ",IF(OR(D2="Z/Z",D2="-/-"),"EQ",IF(OR(D3="Z/Z",D3="-/-"),"EQ","NE")))

2. I compare the column values of line 1 with P2 across all the columns (from AB1 to AB5), then do the same for the remaining lines 2 to 5. A match gets 1, otherwise 0, and this continues until the program reaches the second set of P1 and P2. At present I do this with the formula =IF(D4=D$3,1,0).
3. I sum the values for lines 1 to 5 across columns AB1 to AB5, but I count only the columns that differ between P1 and P2. At present I do this with a SUMIF formula.
4. I calculate the percentage of matches with P2 for each of lines 1 to 5 by dividing the SUMIF result by the number of markers that differ between P1 and P2.
5. I want to repeat this for the remaining sets of P1 and P2.
I am expecting output like this:
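The equality rule from step 1 can be sketched in Perl as follows. This is only a minimal sketch, assuming each cell is an allele string such as `A/A` and that `Z/Z` and `-/-` behave as wildcards, as in the Excel formula above. The sub name `alleles_equal` and the sample values are made up for illustration and do not come from the attached file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 'EQ' when the two parent cells count as equal
# under the rule described above, 'NE' otherwise.
sub alleles_equal {
    my ($p1, $p2) = @_;
    # Z/Z or -/- in either parent is treated as a match.
    return 'EQ' if grep { $_ eq 'Z/Z' or $_ eq '-/-' } $p1, $p2;
    return $p1 eq $p2 ? 'EQ' : 'NE';
}

# Made-up sample rows, one value per marker column.
my @p1 = qw(A/A C/C Z/Z G/G -/-);
my @p2 = qw(A/A T/T G/G G/G C/C);

my @status = map { alleles_equal($p1[$_], $p2[$_]) } 0 .. $#p1;
print "@status\n";    # prints: EQ NE EQ EQ EQ
```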
LINES  XY1  XY2  XY3  XY4  XY5
P1     EQ   NE   EQ   EQ   EQ   SUM  %
P2                                   1
1      0    0    1    0    0    0    0
2      0    1    0    0    1    1    100
3      0    0    0    0    1    0    0
4      1    0    0    0    0    0    0
5      1    1    0    0    0    1    100

I have data like this in more than 5000 rows. At present I do this in Excel 2010 with various formulas, but it takes a lot of effort. I have the Perl script below, and when I run it I get an error like "Use of uninitialized value $_ in concatenation" at line 19.
Code:
```perl
use strict;
use warnings;
sub compare_alleles {
    return 1 if grep { $_ eq '-/-' or $_ eq 'Z/Z' } @_;
    return $_[0] eq $_[1] ? 1 : 0;
}
my $format;
my (@p1, @p2);
my @unequal;
while (<>) {
    unless (/^(P?\d)/) {
        my @widths;
        push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
        pop @widths;
        push @widths, $widths[-1], $widths[-1];
        my$format = join '', map{"%-${_}s", @widths; $format . = "\n";
        print;
        next;
    }
    my @fields = split;
    if ($fields[0] eq 'P1') {
        @p1 = @fields;
    }
    elsif ($fields[0] eq 'P2') {
        @p2 = @fields;
        printf $format, 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
        printf $format, 'P2', map('', 1..5), '', '1';
        @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
    }
    else {
        my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
        my $sum = 0;
        $sum += $_ for @columns[@unequal];
        my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
        printf $format, @columns, $sum, $percent;
    }
}
```
I have attached my sample file for better understanding. I request the Perl gurus to help solve this error; any help in this regard is highly appreciated.
Thanking you,
Genetist
Attached Files
 Marker.txt (546 Bytes, 11 views)

Last edited by genetist; 10-07-2013 at 01:56 AM. Reason: To putting code tags

10-07-2013, 12:41 AM   #2
pan64
LQ Guru

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep:
It would be nice to use [code]here comes your script[/code] to keep the formatting.
The code you posted does not work; please fix it. It would also be much better to post the real error messages.
In general, you can use named variables instead of $_, which makes the code a bit more readable.
10-07-2013, 01:58 AM   #3
genetist
LQ Newbie

Registered: Jul 2013
Posts: 8

Original Poster
Rep:
Dear Pan64,
Thank you very much for your reply. I am sorry, I did not know how to tag code here because I am a newbie. I have made the change you suggested.
Thank you
10-07-2013, 02:01 AM   #4
pan64
LQ Guru

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep:
Your code still contains syntax errors. For example, this line is unusable:
my$format = join '', map{"%-${_}s", @widths; $format . = "\n";
I do not want to fix that script, so please post the code you actually tested (how did you get those error messages?).
10-07-2013, 02:22 AM   #5
genetist
LQ Newbie

Registered: Jul 2013
Posts: 8

Original Poster
Rep:
Hi Pan64,
This is the code I ran after your suggestions, and I am getting an error like this. Please see the attached image for the error.
Code:
```perl
use strict;
use warnings;
sub compare_alleles {
    return 1 if grep { $_ eq '-/-' or $_ eq 'Z/Z' } @_;
    return $_[0] eq $_[1] ? 1 : 0;
}
my $format;
my (@p1, @p2);
my @unequal;
while (<>) {
    unless (/^(P?\d)/) {
        my @widths;
        push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
        pop @widths;
        push @widths, $widths[-1], $widths[-1];
        print;
        next;
    }
    my @fields = split;
    if ($fields[0] eq 'P1') {
        @p1 = @fields;
    }
    elsif ($fields[0] eq 'P2') {
        @p2 = @fields;
        printf $format, 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
        printf $format, 'P2', map('', 1..5), '', '1';
        @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
    }
    else {
        my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
        my $sum = 0;
        $sum += $_ for @columns[@unequal];
        my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
        printf $format, @columns, $sum, $percent;
    }
}
```
Regards,
Thank you
Attached Thumbnails
10-07-2013, 03:20 AM   #6
pan64
LQ Guru

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep:
It would be nice to post the real error message:
Quote:
Use of uninitialized value $format in printf at t.pl line 25, <> line 3.
Use of uninitialized value $format in printf at t.pl line 26, <> line 3.
Use of uninitialized value $format in printf at t.pl line 34, <> line 4.
Use of uninitialized value $format in printf at t.pl line 34, <> line 5.
Use of uninitialized value $format in printf at t.pl line 34, <> line 6.
Use of uninitialized value $format in printf at t.pl line 34, <> line 7.
Use of uninitialized value $format in printf at t.pl line 34, <> line 8.
Use of uninitialized value $format in printf at t.pl line 25, <> line 10.
Use of uninitialized value $format in printf at t.pl line 26, <> line 10.
Use of uninitialized value $format in printf at t.pl line 34, <> line 11.
Use of uninitialized value $format in printf at t.pl line 34, <> line 12.
Use of uninitialized value $format in printf at t.pl line 34, <> line 13.
Use of uninitialized value $format in printf at t.pl line 34, <> line 14.
Use of uninitialized value $format in printf at t.pl line 34, <> line 15.
It is quite simple: the variable $format is not initialized. You need to specify a format for printf.
I think you need to specify different format strings for those statements (or simply use print, without formatting).
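To illustrate the point about the format string: `printf` takes the format as its first argument, so `$format` has to hold a defined string before any `printf $format, ...` call runs. The sketch below builds such a string from a header line. The header text in `$header` is hypothetical (the attached file is not shown in the thread), and the width logic is only one possible way to do it.

```perl
use strict;
use warnings;

# Hypothetical header line; the real one comes from the input file.
my $header = "LINES XY1   XY2   XY3   XY4   XY5";

# Width of each column = length of the header token plus its trailing spaces.
my @widths = map { length($_) } $header =~ /(\S+\s*)/g;

# The last token has no trailing padding, so reuse the previous width for it
# and for the two extra output columns (SUM and %).
pop @widths;
push @widths, ($widths[-1]) x 3;

# printf needs a defined format string as its first argument.
my $format = join('', map { "%-${_}s" } @widths) . "\n";

printf $format, 'P1', qw(eq nq eq eq eq), 'SUM', '%';
```

If lines other than the header (blank lines, for example) reach code like this, `@widths` would be empty and the interpolated width undefined. That may be one source of the `$_` warning mentioned earlier in the thread, but this is a guess, since the actual input is not shown.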

10-09-2013, 12:05 AM   #7
genetist
LQ Newbie

Registered: Jul 2013
Posts: 8

Original Poster
Rep:
Dear Pan64,
Thank you very much for your reply. I tried your suggestions to fix this script but was unsuccessful. Can you please fix it for me?
Thanks
10-09-2013, 12:19 AM   #8
pan64
LQ Guru

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep:
No, I cannot; I have no idea what format you really need. Also, please show us what you have tried; probably we can go further together ...
10-09-2013, 01:30 AM   #9
genetist
LQ Newbie

Registered: Jul 2013
Posts: 8

Original Poster
Rep:
Dear Pan64,
Thank you very much for your reply and for agreeing to go ahead on this problem. I will post what I did from the start and what I want.
Thanks
10-09-2013, 01:43 AM   #10
genetist
LQ Newbie

Registered: Jul 2013
Posts: 8

Original Poster
Rep:
Hi Pan64,
These are the steps I need to work out on my sample data (please see the attached file for a better understanding).
1. I want to find which columns (from XY1 to XY5) are different for rows P1 and P2. Equal means that P1 and P2 contain the same letters (alleles) or that either of them contains Z/Z or -/-.

2. I will compare the columns of lines 1 through 5 with P2 across columns XY1 to XY5. If they match, the output should contain 1, otherwise 0.

3. I will calculate a line total for lines 1 to 5, only for the columns that differed between P1 and P2.

4. I will calculate a line percentage for lines 1 to 5 by dividing the sum by the number of columns that differed between P1 and P2.
5. I would like to continue this for the second set of P1 and P2 lines.
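The five steps above can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the script from this thread: it assumes whitespace-separated rows, a header row starting with `LINES`, and that every `P1` row comes directly before its `P2` row. The sub `process_lines` and the sample data are made up for illustration; the data was chosen so the result resembles the expected table from the first post.

```perl
use strict;
use warnings;

# Z/Z and -/- count as equal to anything; otherwise compare the strings.
sub compare_alleles {
    my ($x, $y) = @_;
    return 1 if grep { $_ eq 'Z/Z' or $_ eq '-/-' } $x, $y;
    return $x eq $y ? 1 : 0;
}

# Hypothetical driver: takes input lines, returns one array ref per output row.
sub process_lines {
    my @lines = @_;
    my (@p1, @p2, @unequal, @rows);
    for my $line (@lines) {
        my ($name, @cells) = split ' ', $line;
        next unless defined $name;               # skip blank lines
        if ($name eq 'LINES') {                  # header row: pass through
            push @rows, [$name, @cells, 'SUM', '%'];
        }
        elsif ($name eq 'P1') {                  # remember P1 until P2 arrives
            @p1 = @cells;
        }
        elsif ($name eq 'P2') {                  # step 1: EQ/NE per column
            @p2 = @cells;
            my @eq = map { compare_alleles($p1[$_], $p2[$_]) } 0 .. $#p2;
            @unequal = grep { !$eq[$_] } 0 .. $#eq;
            push @rows, ['P1', map { $_ ? 'EQ' : 'NE' } @eq];
        }
        else {                                   # steps 2-4 for a data line
            my @match = map { $cells[$_] eq $p2[$_] ? 1 : 0 } 0 .. $#p2;
            my $sum = 0;
            $sum += $_ for @match[@unequal];     # only columns where P1 != P2
            my $percent = @unequal ? $sum * 100 / @unequal : 0;
            push @rows, [$name, @match, $sum, $percent];
        }
    }
    return @rows;
}

# Made-up sample set; a second P1/P2 set would simply follow (step 5).
my @sample = (
    'LINES XY1 XY2 XY3 XY4 XY5',
    'P1 A/A T/T Z/Z T/T -/-',
    'P2 A/A C/C G/G T/T A/A',
    '1 C/C T/T G/G A/A C/C',
    '2 T/T C/C A/A A/A A/A',
    '3 G/G T/T C/C G/G A/A',
    '4 A/A T/T A/A C/C G/G',
    '5 A/A C/C T/T G/G T/T',
);
print join("\t", @$_), "\n" for process_lines(@sample);
```

The output here is tab-separated, with `SUM` and `%` appended to the header row, and no separate `P2` row is printed. Both are simplifications chosen for the sketch rather than requirements from the thread.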

This is my original code:
Code:
```perl
use strict;
use warnings;
sub compare_alleles {
    return 1 if grep { $_ eq '-/-' or $_ eq 'Z/Z' } @_;
    return $_[0] eq $_[1] ? 1 : 0;
}
my $format;
my (@p1, @p2);
my @unequal;
while (<>) {
    unless (/^(P?\d)/) {
        my @widths;
        push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
        pop @widths;
        push @widths, $widths[-1], $widths[-1];
        $format = join '', map("%-${_}s", @widths, ''), "\n";
        print;
        next;
    }
    my @fields = split;
    if ($fields[0] eq 'P1') {
        @p1 = @fields;
    }
    elsif ($fields[0] eq 'P2') {
        @p2 = @fields;
        printf $format, 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
        printf $format, 'P2', map('', 1..5), '', '1';
        @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
    }
    else {
        my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
        my $sum = 0;
        $sum += $_ for @columns[@unequal];
        my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
        printf $format, @columns, $sum, $percent;
    }
}
```
When I run this code I get an error like "Use of uninitialized value $_ in concatenation" at line 19, which is the same as you posted. I also removed $format and used only the printf function, as per your suggestion; when I run that code it prints only the file name and nothing more.
This is my changed code:
Code:
```perl
use strict;
use warnings;
sub compare_alleles {
    return 1 if grep { $_ eq '-/-' or $_ eq 'Z/Z' } @_;
    return $_[0] eq $_[1] ? 1 : 0;
}
my (@p1, @p2);
my @unequal;
while (<>) {
    unless (/^(P?\d)/) {
        my @widths;
        push @widths, $+[1] - $-[1] while /(\S+\s*)/g;
        pop @widths;
        push @widths, $widths[-1], $widths[-1];
        print;
        next;
    }
    my @fields = split;
    if ($fields[0] eq 'P1') {
        @p1 = @fields;
    }
    elsif ($fields[0] eq 'P2') {
        @p2 = @fields;
        printf 'P1', map (compare_alleles($p1[$_], $p2[$_]) ? 'eq' : 'nq', 1..5), 'SUM', '%';
        printf 'P2', map('', 1..5), '', '1';
        @unequal = grep { not compare_alleles($p1[$_], $p2[$_]) } 1..5;
    }
    else {
        my @columns = ($fields[0], map { $fields[$_] eq $p2[$_] ? 1 : 0 } 1..5);
        my $sum = 0;
        $sum += $_ for @columns[@unequal];
        my $percent = $sum == 0 ? 0 : $sum * 100 / @unequal;
        printf @columns, $sum, $percent;
    }
}
```
This is, in total, what I did and what I want for my problem.
Thanking you,
Regards,
Genetist
Attached Files
 sampledata.txt (1.5 KB, 8 views)

10-09-2013, 02:20 AM   #11
pan64
LQ Guru

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 9,438

Rep:
No, I suggested using print ..... (instead of printf $format ....), so drop that f; it should be only print.
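The difference matters because `printf` treats the first element of its list as the format string, so `printf 'P1', ...` prints only `P1` and ignores the remaining arguments. A small sketch of the `print` variant follows; the row values and the tab separator are made up for illustration.

```perl
use strict;
use warnings;

my @columns = ('1', 0, 0, 1, 0, 0);   # made-up row: line name plus five flags
my ($sum, $percent) = (0, 0);

# print has no format argument: join the fields yourself and add the newline.
my $row = join("\t", @columns, $sum, $percent) . "\n";
print $row;

# The printf equivalent needs an explicit format as its first argument.
printf "%s\n", join("\t", @columns, $sum, $percent);
```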
10-09-2013, 02:59 AM   #12
genetist
LQ Newbie

Registered: Jul 2013
Posts: 8

Original Poster
Rep:
Hi Pan64,
Thanks for your reply. I included your suggestion in my script (I kept only print). Now my program prints the file path that I gave (my program actually asks me to provide the filename with its path); it prints the same thing I gave and nothing more. Anyway, thanks for your reply.
Regards,
Genetist
