How to compare C/C++ code files

muneebs123 · 02-18-2004, 10:58 PM

Hi
I teach Linux programming, and I was wondering whether there is a script or program that compares source code files and reports the percentage of similarity between them. It should run in batch mode and be able to search and compare more than 200 files, so that I can tell which students have copied from each other.

trickykid · 02-18-2004, 11:33 PM

Moved: More suitable in our Programming forum. Regards.

bdp · 02-18-2004, 11:43 PM

Hi muneebs123,

I am unsure whether a suitable Linux script already exists, but I would suggest quantifying the similarity between any two source files with a correlation function (strictly a cross-correlation, since two different files are compared; autocorrelation is the same calculation applied to one file against itself). Essentially, define an initial window of, say, 100 bytes in the first file and a second window, also 100 bytes wide, that slides down the other file 50 bytes at a time, so that consecutive positions overlap by half, in the spirit of the Nyquist sampling criterion. Correlate the initial and sliding windows at each 50-byte step and count a 'hit' (i.e. copied code) whenever the correlation is high, perhaps 0.95 or above. Of course, the initial window also needs to slide down its own file, so that every initial window is compared with every sliding window.
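As a rough illustration of the windowing described above, here is a minimal Python sketch. It assumes a Pearson correlation coefficient as the correlation measure, and the function names (`pearson`, `windows`, `find_hits`, `similarity`, `rank_pairs`) are made up for this example. The 100-byte window, 50-byte step and 0.95 threshold are the values suggested above, exposed as parameters.

```python
import math
from itertools import combinations


def pearson(a: bytes, b: bytes) -> float:
    """Pearson correlation of two equal-length byte windows, in [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant window has no defined correlation; treat as no match
    return cov / math.sqrt(var_a * var_b)


def windows(data: bytes, size: int = 100, step: int = 50):
    """Yield (offset, window) pairs; consecutive windows overlap by size - step bytes."""
    for start in range(0, len(data) - size + 1, step):
        yield start, data[start:start + size]


def find_hits(code_a: bytes, code_b: bytes, size=100, step=50, threshold=0.95):
    """Compare every window of code_a with every window of code_b."""
    hits = []
    for pos_a, win_a in windows(code_a, size, step):
        for pos_b, win_b in windows(code_b, size, step):
            if pearson(win_a, win_b) >= threshold:
                hits.append((pos_a, pos_b))
    return hits


def similarity(code_a: bytes, code_b: bytes, size=100, step=50, threshold=0.95) -> float:
    """Fraction of windows in code_a that have at least one hit in code_b."""
    total = sum(1 for _ in windows(code_a, size, step))
    if total == 0:
        return 0.0  # code_a is shorter than one window
    matched = {pos_a for pos_a, _ in find_hits(code_a, code_b, size, step, threshold)}
    return len(matched) / total


def rank_pairs(sources: dict) -> list:
    """Score every pair of files in {name: contents}; most similar pairs first."""
    scored = [
        (similarity(sources[x], sources[y]), x, y)
        for x, y in combinations(sorted(sources), 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

For the batch case, the contents of each submission could be read with `open(path, "rb").read()` into the dictionary passed to `rank_pairs`. Two caveats: with both windows stepping 50 bytes, a copy shifted by an offset that is not a multiple of 50 may be missed (a smaller step costs more time), and the score is measured from the first file's windows only, so it is not symmetric.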

This method is routinely used in many data analysis applications.

If you are interested in applying this approach I can provide more details.

Cheers, - Brian

muneebs123 · 02-19-2004, 12:05 AM

Yes, I would be interested. Please provide some details.
Thanks.

bdp · 02-19-2004, 12:53 AM

Hi muneebs123,

This site gives a nice intro to autocorrelation:
http://astronomy.swin.edu.au/~pbourk...sis/correlate/

How about taking my first reply and the link above and asking the students to write the code?

I'd suggest using FFTW (a quick web search will explain it) to implement the Fourier transforms if you choose to work in frequency space.
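To show what the frequency-space route looks like, here is a small Python sketch of cross-correlation via the Fourier transform. It does not use FFTW: a tiny recursive radix-2 FFT stands in for it so the example stays self-contained, and the names `fft`, `cross_correlate` and `best_lag` are made up for this illustration. Both inputs are assumed to be non-empty.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlate(a, b):
    """Return {lag: sum over i of a[i + lag] * b[i]} for every overlapping lag."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2  # zero-pad so the circular correlation does not wrap around
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    product = [x * y.conjugate() for x, y in zip(fa, fb)]
    raw = [v.real / size for v in fft(product, inverse=True)]
    # Negative lags are stored at the end of the padded array.
    return {lag: raw[lag % size] for lag in range(-(len(b) - 1), len(a))}


def best_lag(code_a: bytes, code_b: bytes) -> int:
    """Offset of code_b within code_a at which the mean-centred bytes correlate most."""
    mean_a = sum(code_a) / len(code_a)
    mean_b = sum(code_b) / len(code_b)
    corr = cross_correlate([v - mean_a for v in code_a],
                           [v - mean_b for v in code_b])
    return max(corr, key=corr.get)
```

The appeal of this route is that one pair of transforms gives the correlation at every byte offset at once, rather than only at 50-byte steps. The values here are unnormalised sums, so a threshold such as 0.95 would need them divided by the windows' standard deviations first.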

Cheers, - Brian