Thanks for the replies, guys. No, this is not homework.
The original files were generated by counting hits that fall within a particular numbered region. If there were no hits in a region, no count/score was written for it, so that region is simply missing from the output.
I'm not sure how I can modify my original script. Maybe you can help?
Basically, I start by extracting the lines that fall within a particular region of a chromosome:
Code:
grep -w chr10 nelf-ctl.bowtie | gawk '$4 > 102104816 && $4 < 102126247' > scd.nelf-ctl
Then I cut out the position column, assign each position to a 10 bp bin, and count and sort the hits per bin:
Code:
cut -f4 scd.nelf-ctl | gawk '{print int(($1 - 102104816) / 10)}' | sort | uniq -c | sort -k2,2n > scd.nelf-ctl.10bp-bin.counts
The output is as I described in the original post.
I have tried creating a sequentially numbered file and joining the counts file to it, but it doesn't always work. Sometimes it joins up to line 100, sometimes up to line 90, and sometimes it skips lines 100-999.
Code:
gawk 'BEGIN {for (i = 0; i <= (21431 / 10); i++) print i}' > scd-10bp.allbins
join -1 1 -2 2 -a 1 scd-10bp.allbins scd.nelf-ctl.10bp-bin.counts > scd.nelf-ctl.10bp-bin.allbins
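A likely cause of the erratic joins, stated here as an assumption rather than something verified on these files, is that join expects both inputs to be sorted lexically on the join field, whereas both files above are in numeric order. A minimal sketch that avoids join altogether is shown below: it reads the "count bin" pairs into an array and then prints every bin, with 0 for bins that had no hits. The file names sample.counts and sample.allbins and the value nbins=6 are made up for illustration; for the real data nbins would be 2143, that is int(21431/10). Plain awk is used here, and gawk should behave the same way.

```shell
# Hypothetical sample in the same "count bin" layout that uniq -c produces.
printf '   3 0\n   1 2\n   7 5\n' > sample.counts

# Store each count under its bin number, then walk through every bin
# from 0 to nbins and print 0 where no hits were counted.
awk -v nbins=6 '
    { count[$2] = $1 }
    END {
        for (i = 0; i <= nbins; i++)
            print i, ((i in count) ? count[i] : 0)
    }
' sample.counts > sample.allbins
```

The output has one line per bin, with the bin number first and the count second, so the bins stay in numeric order without any extra sorting.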
Sorry for this long post.
Thank you for your help.
Julian