LinuxQuestions.org
Old 04-17-2011, 02:18 PM   #1
confusednewbie
LQ Newbie
 
Registered: Apr 2011
Posts: 1

Rep: Reputation: 0
Question [perl scripting] help with applying array in a specific scenario


Hi Linux forumers,

I have been having this particular problem for a while and hope you can help. I have basic knowledge of arrays in Perl, but I can't adapt the textbook examples to my particular situation. Any scripting pointers will help greatly, thanks!

For my situation, for example, I have a reference text file called "File1" that contains the following lines (tab delimited):
Quote:
identiy id_index copynum chr start end confidence
JANE 1 2 1 51598 1500664 212.19
JANE 1 1 1 1617778 1662463 10.0
JANE 1 2 1 1677447 11503149 5427.0
JANE 1 3 1 11509172 11536855 0.29
JANE 1 2 2 11538322 16889665 2840.0
JANE 1 3 2 16890117 16929044 4.51
I have a second text file, "File2" that contains the following lines (tab delimited):

Quote:
probe chr position
CN_466171 1 60500
CN_513370 1 72178
CN_502616 1 76204
CN_511519 1 839258
CN_502615 2 75799
CN_489778 2 11560400
CN_502614 2 16900500
GOAL: I want to read the "chr", "start", and "end" values from each line of File1 into an array and check whether any line of File2 ("chr", "position") has an equal "chr" value and a "position" that falls within the "start" to "end" range.

LOGIC: For each line in File1, the "chr", "start", and "end" values are read. Every line of File2 (each corresponding to a "probe") is then read and compared against those values: if the probe's "chr" value matches the "chr" value of the File1 line AND the probe's "position" is >= "start" and <= "end" of that line, the counter is incremented by 1.

For each line of File1, this check is done for every line of File2 and the total number of probes (or lines) from File2 satisfying the above condition is counted and printed as a new column in that line; then the process repeats for the next line of File1.

The result should look something like this (note that numprobe is the total number of lines/probes in File2 that match the "chr" value of the File1 line and have a "position" value within its "start"/"end" range):
Quote:
identiy id_index copynum chr start end confidence numprobe
JANE 1 2 1 51598 1500664 212.19 4
JANE 1 1 1 1617778 1662463 10.0 0
JANE 1 2 1 1677447 11503149 5427.0 0
JANE 1 3 1 11509172 11536855 0.29 0
JANE 1 2 2 11538322 16889665 2840.0 1
JANE 1 3 2 16890117 16929044 4.51 1
Basically, this is a search and compare of the entirety of File2 for each line of File1: if a line of File2 satisfies the given conditions, add 1 to the counter for the new "numprobe" column of File1; if not, leave the counter unchanged and move to the next line of File2. The process repeats for every line of File1, with the counter reset to 0 at the start of each File1 line.

Thank you very much for any scripting pointers. It's frustrating because I know what I need to do in my mind, but it's very difficult to translate those thoughts into an actual script.

Last edited by confusednewbie; 04-17-2011 at 02:22 PM. Reason: clarifying some sentences
 
Old 04-17-2011, 04:12 PM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,337

Rep: Reputation: 1332
The most direct approach from your description is:

1) open files and check for errors

2) Read all of File2 into an array of arrays AoA. The two indices are row and field.

Code:
while (<FILE2>) {
    next if $. == 1;         # skip the header line
    push @AoA, [ split ];    # each row: probe, chr, position
}
3) Read File1 line by line.

Code:
while (<FILE1>) {
    @f1 = split;
4) For each line, scan through all rows of AoA performing your logic.

Code:
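    $n = 0;
    foreach $R (@AoA) {   # count rows with the same chr and start <= position <= end
        $n++ if $f1[3] eq $R->[1] && $R->[2] >= $f1[4] && $R->[2] <= $f1[5];
    }
    print join("\t", @f1, $n), "\n";
}   # end of while (<FILE1>)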
If the files are huge or you need it to be more efficient, then you can optimize some things at the cost of more code. For example, a 3-level array (or a hash) would let you only look through the File2 values for the chr value that you want. Depends on your requirement.
 
Old 04-17-2011, 05:03 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 928
Moved: This thread is more suitable in <PROGRAMMING> and has been moved accordingly to help your thread/question get the exposure it deserves.
 
Old 04-18-2011, 01:50 AM   #4
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038

Rep: Reputation: 3203
Not sure if it has to be Perl, but maybe this awk will also give you ideas:
Code:
awk 'FNR==NR{if($2 ~ /[0-9]/)chr[$2]=chr[$2]?chr[$2]" "$3:$3;next}{split(chr[$4],arr);for(x in arr)if(arr[x] >= $5 && arr[x] <= $6)ctr++;print $0,ctr;ctr=0}' file2 file1
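The one-liner above is dense, so here is a sketch of the same idea spread over several lines: positions from file2 are first grouped by chr, then each file1 row is checked only against the positions recorded for its own chr. The function name `count_probes`, the temporary directory, and the extra `numprobe` header are assumptions made for illustration; the sample data is copied from post #1.

```shell
#!/bin/bash
# Sample inputs copied from post #1, written tab-delimited into a temp directory.
tmp=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    identiy id_index copynum chr start end confidence \
    JANE 1 2 1 51598 1500664 212.19 \
    JANE 1 1 1 1617778 1662463 10.0 \
    JANE 1 2 1 1677447 11503149 5427.0 \
    JANE 1 3 1 11509172 11536855 0.29 \
    JANE 1 2 2 11538322 16889665 2840.0 \
    JANE 1 3 2 16890117 16929044 4.51 > "$tmp/file1"
printf '%s\t%s\t%s\n' \
    probe chr position \
    CN_466171 1 60500 \
    CN_513370 1 72178 \
    CN_502616 1 76204 \
    CN_511519 1 839258 \
    CN_502615 2 75799 \
    CN_489778 2 11560400 \
    CN_502614 2 16900500 > "$tmp/file2"

# count_probes is a hypothetical name; usage: count_probes FILE2 FILE1
count_probes() {
    awk '
        BEGIN { FS = OFS = "\t" }
        # First file (probes): append each position to the list for its chr.
        FNR == NR { if (FNR > 1) pos[$2] = pos[$2] " " $3; next }
        # Second file (segments): extend the header with the new column name.
        FNR == 1 { print $0, "numprobe"; next }
        {
            n = split(pos[$4], p, " ")   # positions recorded for this chr
            count = 0
            for (i = 1; i <= n; i++)
                if (p[i] + 0 >= $5 + 0 && p[i] + 0 <= $6 + 0) count++
            print $0, count
        }
    ' "$1" "$2"
}

count_probes "$tmp/file2" "$tmp/file1"
```

With the sample data this should reproduce the numprobe column from post #1 (4, 0, 0, 0, 1, 1). Grouping by chr means a file1 row never looks at probes from another chr, which matters more as file2 grows.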
 
Old 04-18-2011, 04:36 AM   #5
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,516

Rep: Reputation: 240
still need help or finished?
 
Old 04-18-2011, 10:26 AM   #6
Juako
Member
 
Registered: Mar 2010
Posts: 202

Rep: Reputation: 84
bash version

plain bash:

Code:
#!/bin/bash
line=0
while read identity id_index copynum chr start end confidence; do
        (( line++ == 0)) && continue
        line=0
        numprobe=0
        while read probe chr2 position; do
                (( line++ == 0)) && continue
                (( chr == chr2 && start <= position && position <= end && numprobe++ ))
        done < file2
        echo "$identity $id_index $copynum $chr $start $end $confidence $numprobe"
done < file1
The three lines of code referencing the $line variable are there to skip the first (header) line of each input file.

Of course, it can be optimized in various ways (using arrays would be one optimization). This is just to show the algorithm exactly as you stated it.
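As a sketch of the array optimization mentioned above: file2 can be read once into two parallel bash arrays, so the inner loop walks memory instead of reopening the file for every file1 row. The function name `count_with_arrays`, the temporary directory, and the added `numprobe` header are assumptions for illustration; the sample data is copied from post #1.

```shell
#!/bin/bash
# Sample inputs copied from post #1, written tab-delimited into a temp directory.
tmp=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    identiy id_index copynum chr start end confidence \
    JANE 1 2 1 51598 1500664 212.19 \
    JANE 1 1 1 1617778 1662463 10.0 \
    JANE 1 2 1 1677447 11503149 5427.0 \
    JANE 1 3 1 11509172 11536855 0.29 \
    JANE 1 2 2 11538322 16889665 2840.0 \
    JANE 1 3 2 16890117 16929044 4.51 > "$tmp/file1"
printf '%s\t%s\t%s\n' \
    probe chr position \
    CN_466171 1 60500 \
    CN_513370 1 72178 \
    CN_502616 1 76204 \
    CN_511519 1 839258 \
    CN_502615 2 75799 \
    CN_489778 2 11560400 \
    CN_502614 2 16900500 > "$tmp/file2"

# count_with_arrays is a hypothetical name; usage: count_with_arrays FILE1 FILE2
count_with_arrays() {
    local header probe c p i numprobe
    local identity id_index copynum chr start end confidence
    local -a probe_chr=() probe_pos=()

    # Read file2 once: drop its header, keep chr and position in parallel arrays.
    {
        read -r header
        while read -r probe c p; do
            probe_chr+=("$c")
            probe_pos+=("$p")
        done
    } < "$2"

    {
        # Pass the file1 header through with the new column name appended.
        IFS= read -r header
        printf '%s\tnumprobe\n' "$header"
        while read -r identity id_index copynum chr start end confidence; do
            numprobe=0
            for i in "${!probe_chr[@]}"; do
                # chr is compared as a string; only the coordinates use arithmetic.
                if [[ ${probe_chr[i]} == "$chr" ]] &&
                   (( probe_pos[i] >= start && probe_pos[i] <= end )); then
                    numprobe=$(( numprobe + 1 ))
                fi
            done
            printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$identity" "$id_index" \
                "$copynum" "$chr" "$start" "$end" "$confidence" "$numprobe"
        done
    } < "$1"
}

count_with_arrays "$tmp/file1" "$tmp/file2"
```

Reading each header with a separate `read` before the loop avoids the shared line counter, and comparing chr as a string means a non-numeric label would not be handed to `(( ))`.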

Last edited by Juako; 04-18-2011 at 01:27 PM. Reason: forgot code block
 
Old 04-18-2011, 12:03 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038

Rep: Reputation: 3203
I copied your code, Juako, and as is with the supplied inputs I get all sorts of errors flying up the screen:
Code:
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
./d.sh: line 6: ((: chr: expression recursion level exceeded (error token is "chr")
identiy id_index copynum chr start end confidence 0
./d.sh: line 6: ((: position: expression recursion level exceeded (error token is "position")
JANE 1 2 1 51598 1500664 212.19 4
./d.sh: line 6: ((: position: expression recursion level exceeded (error token is "position")
JANE 1 1 1 1617778 1662463 10.0 0
./d.sh: line 6: ((: position: expression recursion level exceeded (error token is "position")
JANE 1 2 1 1677447 11503149 5427.0 0
./d.sh: line 6: ((: position: expression recursion level exceeded (error token is "position")
JANE 1 3 1 11509172 11536855 0.29 0
./d.sh: line 6: ((: position: expression recursion level exceeded (error token is "position")
JANE 1 2 2 11538322 16889665 2840.0 1
./d.sh: line 6: ((: position: expression recursion level exceeded (error token is "position")
JANE 1 3 2 16890117 16929044 4.51 1
I do see the correct data hidden amongst it, but you might have a little checking to do.
 
Old 04-18-2011, 12:16 PM   #8
Juako
Member
 
Registered: Mar 2010
Posts: 202

Rep: Reputation: 84
Quote:
Originally Posted by grail View Post
I copied your code, Juako, and as is with the supplied inputs I get all sorts of errors flying up the screen:
That's odd... I ran it with the OP's input and it works fine here. Maybe a different bash version? It's running in bash 4, but afaict it isn't using any bash 4 specific syntax. Anyway, I'll try it in bash 3 and see what it is.

*edit* : working fine on another machine with bash 3 too
*edit1* : ahh, now I've been able to reproduce your error. I had omitted the "header" line in the input files in my tests. When I add the headers, the read loops obviously fail trying to make arithmetic comparisons. I'm updating the code to fix it.
*edit2* : updated
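For anyone wondering where the message in post #7 comes from: inside `(( ))`, bash replaces a bare name with the variable's value and then evaluates that value as an expression in turn. On the header row, `read` stores the text chr in the variable chr, so the name keeps resolving to itself. Below is a minimal sketch of the effect and of one possible guard; `is_uint` is a hypothetical helper, not part of the script in post #6.

```shell
#!/bin/bash
# On the header row, read stores the column name in the variable of the same name.
chr=chr

# Evaluating chr arithmetically now loops back to chr itself; capture the message.
msg=$( { (( chr == 1 )); } 2>&1 )
echo "bash says: $msg"

# is_uint is a hypothetical helper: succeeds only for non-negative decimal integers.
is_uint() {
    [[ $1 =~ ^[0-9]+$ ]]
}

# Guarded check: non-numeric fields are reported instead of being evaluated.
for value in chr 1 position 60500; do
    if is_uint "$value"; then
        echo "$value: numeric, safe to use inside (( ))"
    else
        echo "$value: not numeric, skip this row"
    fi
done
```

Skipping the header before any arithmetic, as the updated script does, avoids the problem entirely; a guard like this is only an extra safety net for unexpected non-numeric fields.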

Last edited by Juako; 04-18-2011 at 01:27 PM.
 
  


All times are GMT -5. The time now is 06:12 AM.
