Quote:
but the output will have 2500 lines (1 for each animal ID). The genotype line is the column of snpnumbers.txt which will change each time I loop through.
Sorry I didn't get a chance to come back to this one sooner. You say the output will have 2500 lines, but if I read the second part above correctly you would actually have 5,000,000 (2500 IDs x 2000 columns). Is this correct?
That is, for each unique ID you will have 2000 lines, each with a different column (taken from the snp file) plus the same phenotype, so the data would look something like:
Code:
A00000020 <column 270835 of gene file> 29.4022
A00000022 <column 270835 of gene file> 23.3695
A00000027 <column 270835 of gene file> 6.0783
...
A00000020 <column 270836 of gene file> 29.4022
A00000022 <column 270836 of gene file> 23.3695
A00000027 <column 270836 of gene file> 6.0783
...
A00000020 <column 270837 of gene file> 29.4022
A00000022 <column 270837 of gene file> 23.3695
A00000027 <column 270837 of gene file> 6.0783
...
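To make that layout concrete, here is a rough Python sketch of the expansion. The function name `expand_long`, the in-memory lists, and the toy genotype values are all made up for illustration; I am also assuming the column numbers are 1-based. It is only meant to show the shape of the output, not how your real files should be read.

```python
def expand_long(ids, phenotypes, genotype_rows, snp_columns):
    """Yield one (animal_id, genotype, phenotype) tuple per SNP column per animal.

    genotype_rows[i] holds the genotype fields for animal ids[i].
    snp_columns holds 1-based column numbers (an assumption).
    """
    for col in snp_columns:
        # Each SNP column produces one full block of animals,
        # and every animal keeps its own phenotype in every block.
        for animal_id, pheno, row in zip(ids, phenotypes, genotype_rows):
            yield animal_id, row[col - 1], pheno


# Toy data: 3 animals, 4 genotype columns, 2 SNP columns selected.
ids = ["A00000020", "A00000022", "A00000027"]
phenotypes = [29.4022, 23.3695, 6.0783]
genotype_rows = [list("0120"), list("1102"), list("2011")]

rows = list(expand_long(ids, phenotypes, genotype_rows, [2, 3]))
print(len(rows))  # 3 animals x 2 columns = 6 lines
```

With 2500 animals and 2000 columns the same loop gives 2500 x 2000 = 5,000,000 lines, which is the figure I am asking about.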
Would it also be the case that the snp file is always consecutive as it is at present, i.e. it starts at 270834 and goes up by 1 until the end of the file?
If this is true, you would only need to read the first line and take the first number of the set.
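If you want to check that assumption before relying on it, something along these lines would do. This is only a sketch: `first_snp_and_consecutive` is a name I made up, and I am assuming the snp file holds whitespace-separated integers (one or several per line), which may not match your actual file.

```python
def first_snp_and_consecutive(lines):
    """Return (first_number, is_consecutive) for an iterable of text lines.

    Assumes every whitespace-separated token is an integer SNP number.
    """
    numbers = [int(tok) for line in lines for tok in line.split()]
    first = numbers[0]
    # Consecutive means every number is exactly 1 more than the previous one.
    consecutive = all(b - a == 1 for a, b in zip(numbers, numbers[1:]))
    return first, consecutive


sample = ["270834", "270835", "270836"]
first, consecutive = first_snp_and_consecutive(sample)
```

For a real file you would pass the open file object instead of `sample`. If `consecutive` comes back true, the first number plus a count is all you need to work out every column.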