Way of ignoring out-of-order lines in diff?
Hi Folks.
I'm working on diffing HTML pages and MySQL database dumps and I've come across a (minor) issue. I can use the regular expression engine to ignore certain "words" that I know to have changed (dates, version numbers, hostnames, etc.), but (for a reason that is being hunted down separately) data is occasionally reported on the web page out of order. This behaviour is not a fault, and I know that I can do Code:
diff <(sort filenameA) <(sort filenameB)
But I'm dealing with snapshots consisting of hundreds of pages. I have been recursively diffing each directory successfully, which solves every comparison problem but this one and leaves me with one file that lists the differences between the directories for each file. I could do something along the lines of Code:
diff <(find dirnameA -type f | xargs sort) <(find dirnameB -type f | xargs sort)
After inspecting the man pages I can't find an (obvious) way to ignore out-of-order lines. Does anyone have any bright ideas on either:
A) a regex I could use in diff to compare each line to the other lines in the current file, or
B) a way of post-processing the resulting diff file to excise the offending swaps?
Regards G
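One possible way to keep the per-file report while still ignoring line order is to sort each file on its own before diffing it against its counterpart. The sketch below is only an illustration under a few assumptions: `compare_sorted_trees` is a made-up helper name, `mktemp` is available, file names contain no newlines, and only files present under the first directory are visited.

```sh
# compare_sorted_trees DIR_A DIR_B
# Hypothetical helper: diff every file under DIR_A against the file with the
# same relative path under DIR_B, after sorting both so line order is ignored.
compare_sorted_trees() (
    dir_a=$1
    dir_b=$2
    tmp=$(mktemp) || exit 1
    # Walk regular files under DIR_A (assumes no newlines in file names).
    (cd "$dir_a" && find . -type f) | sort | while IFS= read -r rel; do
        rel=${rel#./}
        if [ ! -f "$dir_b/$rel" ]; then
            printf 'Only in %s: %s\n' "$dir_a" "$rel"
            continue
        fi
        # Sort each file separately so lines from different pages never mix.
        sort "$dir_b/$rel" > "$tmp"
        # diff exits non-zero when the sorted contents differ; report only then.
        out=$(sort "$dir_a/$rel" | diff - "$tmp") ||
            printf '=== %s ===\n%s\n' "$rel" "$out"
    done
    rm -f "$tmp"
)
```

Something like `compare_sorted_trees dirnameA dirnameB > differences.txt` would then give one report with a header per differing file. Note that sorting also hides cases where the order itself matters, and files that exist only under the second directory are not reported by this sketch.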
Have you tried sdiff? It tries to put the differences side by side. I often find it more useful than diff though not quite a perfect tool.
I thought of that too, and it was my initial approach to this issue, but without the regex capabilities of diff the perceived error rate greatly increases: all of the changes we already know about get displayed, instead of the observed differences being reduced to the things that we DON'T know about.
Thank you for your feedback jl
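One way around the missing regex support might be to mask the known-changing fields before the comparison tool ever sees the files, so that any tool (sdiff included) only shows what is left. The sketch below is an assumption-heavy illustration: `normalize` is a made-up helper, the two patterns (ISO-style dates and three-part version numbers) are placeholders for whatever actually changes in the real pages, and `sed -E` is assumed to be available.

```sh
# normalize FILE
# Hypothetical helper: mask fields that are known to change, then sort the
# lines so that ordering differences disappear as well.
normalize() {
    # Placeholder patterns: adjust to the real date and version formats.
    sed -E \
        -e 's/[0-9]{4}-[0-9]{2}-[0-9]{2}/<DATE>/g' \
        -e 's/[0-9]+\.[0-9]+\.[0-9]+/<VERSION>/g' \
        "$1" | sort
}
```

With that in place, `normalize filenameA > A.norm; normalize filenameB > B.norm; sdiff -s A.norm B.norm` should list only the lines that still differ after masking. Hostnames could be handled with one more `-e` expression in the same style.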