Hi muneebs123,
I am not sure whether a ready-made Linux script already exists, but I would suggest quantifying the similarity between any two codes with a correlation function (strictly a cross-correlation, since two different pieces of code are being compared). Essentially, define an initial window of, say, 100 bytes in the first code and a second window of the same length that slides down the other code in steps of 50 bytes, so that consecutive windows overlap by half (in the spirit of the Nyquist sampling criterion). Compute the correlation between the initial and sliding windows once per 50-byte step, and consider a 'hit' (i.e. copied code) to be some high degree of correlation, perhaps 0.95. Of course, the initial window will also need to slide down its own data so that all possible initial windows are compared with all possible sliding windows.
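To make the idea concrete, here is a rough Python sketch of the windowed comparison. It is only an illustration under a few assumptions of mine: "correlation" is taken to be the Pearson coefficient of the raw byte values, and the function names `pearson` and `find_hits` are made up for this example. The window size, step, and threshold are the values suggested above.

```python
import math


def pearson(a: bytes, b: bytes) -> float:
    """Pearson correlation coefficient of two equal-length byte windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A constant window has no defined correlation; treat it as no hit.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def find_hits(code_a: bytes, code_b: bytes,
              window: int = 100, step: int = 50, threshold: float = 0.95):
    """Compare every window of code_a with every window of code_b.

    Returns a list of (offset_in_a, offset_in_b, correlation) tuples for
    window pairs whose correlation reaches the threshold.
    """
    hits = []
    for i in range(0, len(code_a) - window + 1, step):
        win_a = code_a[i:i + window]
        for j in range(0, len(code_b) - window + 1, step):
            r = pearson(win_a, code_b[j:j + window])
            if r >= threshold:
                hits.append((i, j, r))
    return hits
```

You would call it with the raw contents of the two files, e.g. `find_hits(open("a.c", "rb").read(), open("b.c", "rb").read())` (file names are placeholders). One caveat with this sketch: because both windows move in 50-byte steps, a copied block that starts at an offset that is not a multiple of 50 relative to the original may be missed. Sliding one of the windows a single byte at a time would catch that, at a much higher cost.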
Windowed correlation of this kind is routinely used in many data analysis applications.
If you are interested in applying this approach I can provide more details.
Cheers, - Brian