Hi all,
I have a large dataset of paired files representing amino acid sequences for a particular gene from different organisms. One file in each pair is a modified version of the other in which certain parts have been trimmed out. I need to manually evaluate the results of this trimming and would like to make figures out of some of the files as examples.
Is there a program like diff that can take two files and produce output, such as a PDF, in which the parts of the files that differ are highlighted?
Here's an example of an input file pair (the sequences themselves do not have any line breaks in them):
Code:
>1
DDFKIAVCSSNQNRSMEAHSFLSKKGFNVKSFGTGNMVKLPGPAPDKPNVYDFSITYDAMYRDLMQKDYELYTQNGILHMLDRNRRIKAHPERFQDSTERFDLLITCEERVYDQVLEDFENKEKQAVHIINIDIQDDHEEATIGAFMVCELVTMLYASDDLDNEVDEILQEFEHKVKRSVLHTAQ
>2
HEFRIAVCSSNQNRSMEAHSFLSKKGFCVKSFGTGNMVKLPGPAPDKPNIYDFSITYDAMYRDLXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXQAVHIINIDIQDNHEEATIGAFMICELVTMLFASEDLDNEIDEILQEFEHKVNRPVLHTVQ
>3
EILRIAVCSSNQNRSMEAHNFLSKRGFNVKSFGSGTHVKLPGPSPDRPNIYDFTTTYDQMYRDLIEKDKNLYTQNGLLHMLDRNRRIKEKPQRFQSCPDHFDLIITCEERVYDQVVEDLENRDNESCHIINIDIQDNHEEATIGAFMICDLVAMLAKCEDLDNEVDEMVQEFEGQCQDPCCTRLF
Code:
>1
DDFKIAVCSSNQNRSMEAHSFLSKKGFNVKSFGTGNMVKLPGPAPDKPNVYDFSITYDAMYRDLMQKDYELYTQNGILHMLDRNRRIKAHPERFQDSTERFDLLITCEERVYDQVLEDFENKEKQAVHIINIDIQDDHEEATIGAFMVCELVTMLYASDDLDNEVDEILQEFEHKVKRSVLHTAQ
>2
HEFRIAVCSSNQNRSMEAHSFLSKKGFCVKSFGTGNMVKLPGPAPDKPNIYDFSITYDAMYRDL QAVHIINIDIQDNHEEATIGAFMICELVTMLFASEDLDNEIDEILQEFEHKVNRPVLHTVQ
>3
EILRIAVCSSNQNRSMEAHNFLSKRGFNVKSFGSGTHVKLPGPSPDRPNIYDFTTTYDQMYRDLIEKDKNLYTQNGLLHMLDRNRRIKEKPQRFQSCPDHFDLIITCEERVYDQVVEDLENR HIINIDIQDNHEEATIGAFMICDLVAMLAKCEDLDNEVDEMVQEFE
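To make the goal concrete, here is a rough sketch of the kind of comparison I have in mind, assuming Python and its standard difflib module. The function name highlight_trimmed and the bracket markers are hypothetical choices for illustration, not part of any existing tool, and this only marks up text rather than producing a figure.

```python
from difflib import SequenceMatcher


def highlight_trimmed(original, trimmed, open_mark="[", close_mark="]"):
    """Return `original` with the parts that are absent from `trimmed` wrapped in markers.

    Hypothetical helper: the name and the default markers are illustrative only.
    """
    # autojunk=False turns off difflib's heuristic that can ignore very
    # frequent characters in sequences of 200 or more elements.
    matcher = SequenceMatcher(None, original, trimmed, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        segment = original[i1:i2]
        if tag == "equal":
            pieces.append(segment)
        elif segment:
            # 'delete' or 'replace': residues of the original that the
            # trimmed copy no longer contains at this position.
            pieces.append(open_mark + segment + close_mark)
        # 'insert' opcodes have an empty original segment and are skipped.
    return "".join(pieces)


# Toy example: "DEFG" was trimmed and replaced by a single space.
print(highlight_trimmed("ABCDEFGHIJ", "ABC HIJ"))  # ABC[DEFG]HIJ
```

Passing HTML tags as the markers (for example an opening and closing span with a background colour) would give output that could be opened in a browser and printed to PDF, though that step is an assumption and not something I have tested on the full dataset.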
Any suggestions on how I could do this would be greatly appreciated!!!
Best,
Kevin