I'm working on diffing HTML pages and MySQL DB dumps and I've come across a (minor) issue.
I can use the regular expression engine to ignore certain "words" that I know to have changed (dates, version numbers, hostnames, etc.), but (for a reason that is being hunted separately) occasionally data is reported on the webpage out of order.
This behaviour is not a fault, and I know that I can do
diff <(sort filenameA ) <(sort filenameB)
And that solves the issue with regard to one file.
But I'm dealing with snapshots consisting of hundreds of pages, and have been recursively diffing each directory successfully, solving all comparison problems but this one, and leaving me with one file that lists the differences between the directories for each file.
I could do something along the lines of
diff <(find dirnameA | xargs sort) <(find dirnameB | xargs sort)
But then, to diff, this just looks like one long file and I lose the delineation between files.
After inspecting the man pages I can't find an (obvious) way to ignore out-of-order lines.
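To keep that delineation, the closest thing I can sketch is a loop that runs the sorted diff once per file and prints a header before each non-empty result. This is only a rough bash sketch under some assumptions: sorted_dir_diff is a name I made up, both snapshots are assumed to share the same relative paths, only files missing from the second directory are reported (not extra ones), and filenames containing newlines will trip it up.

```shell
#!/usr/bin/env bash
# Sketch: diff two snapshot directories file by file, ignoring line order.
# sorted_dir_diff is a hypothetical helper name, not an existing tool.
sorted_dir_diff() {
    local dir_a=$1 dir_b=$2 rel out
    # Walk every regular file under the first snapshot, as relative paths.
    (cd "$dir_a" && find . -type f | sort) | while IFS= read -r rel; do
        if [ ! -f "$dir_b/$rel" ]; then
            printf 'Only in %s: %s\n' "$dir_a" "$rel"
            continue
        fi
        # Sort each side separately so reordered lines cancel out.
        out=$(diff <(sort "$dir_a/$rel") <(sort "$dir_b/$rel"))
        if [ -n "$out" ]; then
            # A per-file header keeps the delineation between files.
            printf '=== %s\n%s\n' "$rel" "$out"
        fi
    done
}
```

Called as `sorted_dir_diff dirnameA dirnameB`, this should print nothing for files whose lines merely moved around, but the line numbers in each hunk refer to the sorted copies, not the original pages.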
Anyone have any bright ideas on either:
A) a regex I could use in diff to compare each line to the other lines in the current file.
B) a way of post-processing the resulting diff file to excise the offending swaps.