Hi Linux forumers,
I have been having this particular problem for a while and hope you can help. I have basic knowledge of arrays in Perl, but I can't adapt the textbook examples to my particular situation. Any scripting pointers will help greatly, thanks!
For example, I have a reference text file called "File1" that contains the following tab-delimited lines:
identiy id_index copynum chr start end confidence
JANE 1 2 1 51598 1500664 212.19
JANE 1 1 1 1617778 1662463 10.0
JANE 1 2 1 1677447 11503149 5427.0
JANE 1 3 1 11509172 11536855 0.29
JANE 1 2 2 11538322 16889665 2840.0
JANE 1 3 2 16890117 16929044 4.51
I have a second text file, "File2", that contains the following tab-delimited lines:
probe chr position
CN_466171 1 60500
CN_513370 1 72178
CN_502616 1 76204
CN_511519 1 839258
CN_502615 2 75799
CN_489778 2 11560400
CN_502614 2 16900500
GOAL: I want to use an array to read the "chr", "start", and "end" values of each line in File1 and check whether any line of File2 (its "chr" and "position") falls within that "start"/"end" range, given that the "chr" values are equal.
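One way to hold File2 in memory is to group the probe positions by chromosome, so each File1 line only needs to look at probes on its own "chr". The sketch below is in Python rather than Perl (in Perl the same shape would be a hash of arrays). It assumes a single header line and whitespace- or tab-delimited columns; the function name `load_probes` is hypothetical, and the sample data is copied from File2 above.

```python
from collections import defaultdict


def load_probes(lines):
    """Group File2 probe positions by chromosome (hypothetical helper).

    Assumes the first line is the header "probe chr position" and the
    remaining lines are whitespace- or tab-delimited rows.
    """
    probes = defaultdict(list)  # chr -> list of positions
    rows = iter(lines)
    next(rows)  # skip the header line
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        chrom, position = fields[1], int(fields[2])
        probes[chrom].append(position)
    return probes


FILE2 = """probe chr position
CN_466171 1 60500
CN_513370 1 72178
CN_502616 1 76204
CN_511519 1 839258
CN_502615 2 75799
CN_489778 2 11560400
CN_502614 2 16900500""".splitlines()

probes = load_probes(FILE2)
print(dict(probes))
```

With a real file, `with open("File2") as fh: probes = load_probes(fh)` would work the same way, since a file object also yields lines.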
LOGIC: For a line in File1, read its "chr", "start", and "end" values. Then read every line (each corresponding to a "probe") of File2 and compare it with those values: if the probe's "chr" matches the "chr" of that File1 line AND the probe's "position" is >= "start" and <= "end" of that line, increment a counter by 1.
For each line of File1, this check is done against every line of File2, and the total number of probes (lines) from File2 satisfying the above condition is printed as a new column at the end of that line; then the process repeats for the next line of File1.
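The logic above translates almost line for line into a nested loop: an outer loop over File1, an inner loop over every probe from File2, and a counter that starts at 0 for each File1 line. This is a minimal Python sketch (not Perl) using the sample data from the post; `count_probes` is a hypothetical name, the columns are assumed to be in the order shown in the headers, and "chr" values are compared as strings so that values like "X" would also work.

```python
def count_probes(file1_lines, file2_lines):
    """Append a "numprobe" column to each File1 line (hypothetical helper)."""
    # Read File2 once into (chr, position) pairs, skipping its header line.
    probes = []
    for line in file2_lines[1:]:
        fields = line.split()
        if fields:
            probes.append((fields[1], int(fields[2])))

    header, *rows = file1_lines
    out = ["\t".join(header.split() + ["numprobe"])]
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        chrom, start, end = fields[3], int(fields[4]), int(fields[5])
        count = 0  # counter is reset to 0 for each File1 line
        for p_chrom, position in probes:
            # same chromosome AND start <= position <= end (inclusive)
            if p_chrom == chrom and start <= position <= end:
                count += 1
        out.append("\t".join(fields + [str(count)]))
    return out


FILE1 = """identiy id_index copynum chr start end confidence
JANE 1 2 1 51598 1500664 212.19
JANE 1 1 1 1617778 1662463 10.0
JANE 1 2 1 1677447 11503149 5427.0
JANE 1 3 1 11509172 11536855 0.29
JANE 1 2 2 11538322 16889665 2840.0
JANE 1 3 2 16890117 16929044 4.51""".splitlines()

FILE2 = """probe chr position
CN_466171 1 60500
CN_513370 1 72178
CN_502616 1 76204
CN_511519 1 839258
CN_502615 2 75799
CN_489778 2 11560400
CN_502614 2 16900500""".splitlines()

result = count_probes(FILE1, FILE2)
for line in result:
    print(line)
```

On the sample files this reproduces the "numprobe" values 4, 0, 0, 0, 1, 1 shown in the expected output. Every File1 line scans all of File2, which is fine for small files but grows as (lines in File1) x (lines in File2).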
The result should look something like this (the "numprobe" column is the total number of lines/probes in File2 whose "chr" value matches and whose "position" falls within the "start"/"end" values of that File1 line):
identiy id_index copynum chr start end confidence numprobe
JANE 1 2 1 51598 1500664 212.19 4
JANE 1 1 1 1617778 1662463 10.0 0
JANE 1 2 1 1677447 11503149 5427.0 0
JANE 1 3 1 11509172 11536855 0.29 0
JANE 1 2 2 11538322 16889665 2840.0 1
JANE 1 3 2 16890117 16929044 4.51 1
Basically, this is a search through all of File2 for each line of File1: if a line of File2 satisfies the given conditions, add 1 to the counter for the new "numprobe" column; if not, leave the counter alone and move to the next line of File2. The process repeats for every line of File1, with the counter reset to 0 at the start of each File1 line.
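If File2 is large, the full scan per File1 line can be avoided. This is a swapped-in technique, not what the post describes: sort each chromosome's probe positions once, then use binary search to count how many fall in [start, end]. A sketch in Python using the standard `bisect` module; `build_index` and `count_in_range` are hypothetical names.

```python
import bisect
from collections import defaultdict


def build_index(file2_lines):
    """Map chr -> sorted list of probe positions (header line skipped)."""
    index = defaultdict(list)
    for line in file2_lines[1:]:
        fields = line.split()
        if fields:
            index[fields[1]].append(int(fields[2]))
    for positions in index.values():
        positions.sort()
    return index


def count_in_range(index, chrom, start, end):
    """Number of probes on chrom with start <= position <= end."""
    positions = index.get(chrom, [])  # .get does not insert missing keys
    # bisect_left: first index with position >= start
    # bisect_right: first index with position > end
    return bisect.bisect_right(positions, end) - bisect.bisect_left(positions, start)


FILE2 = """probe chr position
CN_466171 1 60500
CN_513370 1 72178
CN_502616 1 76204
CN_511519 1 839258
CN_502615 2 75799
CN_489778 2 11560400
CN_502614 2 16900500""".splitlines()

index = build_index(FILE2)
print(count_in_range(index, "1", 51598, 1500664))
print(count_in_range(index, "2", 11538322, 16889665))
```

Each File1 line then costs two binary searches instead of a pass over every probe; the counts are the same as with the simple nested loop.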
Thank you very much for any scripting pointers. It's frustrating because I know what I need to do in my mind, but it's very difficult to translate those thoughts into an actual script.