So if I am following the new information, you want to output the chromosome and SNP position (BP) for each line whose chromosome matches the third field in info.txt
and whose SNP position (BP) lies between the gene start and stop positions.
Assuming the above is correct, may I ask which file is the smaller? The reason I ask is that the awk solution I would present reads the first
file into arrays, which are then checked against the second (and of course storing the smaller file would be quicker and less memory intensive).
To give you an idea based on what you have shown:
Code:
awk 'FNR==NR{low[$3]=$1;high[$3]=$2;next}$4 >= low[$3] && $4 <= high[$3]{print $3,$4}' info.txt 800k_map.txt
A few things to note:
1. Given the supposition at the top, the data currently presented will yield no results with this script
2. I presumed neither file has a header line (i.e. just the data)
3. info.txt is read first because in 800k_map.txt the third field is currently the same for all values in the fourth field, so it cannot serve as a unique array key
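To show how the script above behaves, here is a small sketch on made-up data. The values, and the two placeholder columns at the start of 800k_map.txt, are invented purely for illustration; only the field positions (start, stop, chromosome in info.txt; chromosome in field 3 and BP in field 4 of 800k_map.txt) are taken from the script.

```shell
# Invented sample data; the column layout is assumed from the awk script above.
tmp=$(mktemp -d)

# info.txt: gene start, gene stop, chromosome
printf '%s\n' '100 200 1' '500 900 2' > "$tmp/info.txt"

# 800k_map.txt: two placeholder columns, chromosome, SNP position (BP)
printf '%s\n' 'a b 1 150' 'c d 1 250' 'e f 2 500' 'g h 3 150' > "$tmp/800k_map.txt"

# Same one-liner as above: store start/stop per chromosome from info.txt,
# then print chromosome and BP for map lines that fall inside the range.
result=$(awk 'FNR==NR{low[$3]=$1;high[$3]=$2;next}$4 >= low[$3] && $4 <= high[$3]{print $3,$4}' "$tmp/info.txt" "$tmp/800k_map.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data the output is "1 150" and "2 500": position 250 is outside the range for chromosome 1, and chromosome 3 has no entry in info.txt. Note that because the arrays are keyed on chromosome alone, a second gene on the same chromosome in info.txt would overwrite the first.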
Please let me know if any of this is unclear or if I have gone off on the wrong track about what you needed.