Confirming a DIFF bug
Hello,
I think I found a bug in the latest version of DIFF (2.8.7). I'm using it on Windows 10 (from gnuwin32), and I would like to share it to make sure it is indeed a bug and that I am not making a mistake.

Download "a.txt" from here - https://pastebin.com/MPvv83wi
Download "b.txt" from here - https://pastebin.com/QhzufK6E

Then use this command:

diff a.txt b.txt | grep "14:53"

The result is:

< |date=12 July |time=14:53:36
> |date=12 July |time=14:53:36

This result is wrong. The output should be empty instead. If you delete the first line in the file "a.txt" (which does not contain the string "14:53"), then the result is correct: nothing in the output.

Another way to check the bug is this:

diff a.txt b.txt | grep "<" | cut -c3- > c.txt
diff a.txt c.txt | grep "<" | cut -c3- > d.txt

The file d.txt should be the same as b.txt, but it is not. However, if you delete the first line in a.txt, the result is correct.

The original "a.txt" file was much bigger when I found this bug. I tried to make it as small as possible, but I can't make it shorter than its current size.

I have used DIFF on Windows for many years and it worked very well; I never had any problem. This is the first time it has acted weird for me. Can anyone confirm this bug?

Thanks |
Quote:
Just so we are clear, you are reporting a result from the same tool on a different OS.

The diff command compares, line for line, the contents of one file to the other. It summarizes groups of lines compared, then tells you which ones should be removed or added in that group to make the files match (originally intended to provide patches for source files, but it has many other uses). If the files match, the output is empty. If they do not match, it gives the line that first does not match, then what needs to be done until the next match. (< means remove from the first file, > means add to the first file)

Thus your statement that by removing one line in a very long file (a.txt) you change the output is misleading.

In order to do a true comparison test we would need the exact copy of the entire first file with no changes, the exact copy of the second file with no changes, and the full step-by-step description in detail of what was done. You do not provide enough information to even attempt to verify your results. Changing even one line in either file invalidates what someone else might find. |
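To make the "<" and ">" markers described above concrete, here is a minimal sketch using two tiny made-up files (old.txt and new.txt are illustrative names, not the files from this thread). It only shows the normal output format of diff for a single removed line.

```shell
# Build two tiny sample files in a scratch directory (names are illustrative).
demo_dir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$demo_dir/old.txt"
printf 'one\nthree\n'      > "$demo_dir/new.txt"

# diff exits with status 1 when the files differ, so guard it with "|| true"
# in case this runs under "set -e".
diff "$demo_dir/old.txt" "$demo_dir/new.txt" || true
# prints:
# 2d1
# < two
```

The header "2d1" says that line 2 of the first file has to be deleted to match the second file, and the "< two" line shows the text that only the first file has.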
Thanks for trying it (on Linux, I guess)
I did not say removing one (any) line. I said removing the first line. I removed the first line in the first file and then diff worked correctly. Any chance you can try DIFF on Windows? I am using Windows 10 |
If the files are otherwise identical, maybe their encoding is different?
|
Quote:
It starts at line 1 in file 1 and compares it to the same line in file 2, prints output if needed, then steps to line 2 and repeats. As long as each line matches there should be no output (default), but with a difference it displays what is different and tells you the changes needed to make them match.

I would guess that a.txt had an extra line at the beginning and that your grep hid that from you. Probably every line had the extract and add pairs.

As I said, there is NO way to find out what actually happened without having a copy of each of the original unchanged files to test with. 99% of my use is Linux and the rest is Android. I would be happy to test and verify things for you, but only with the original files, which you could send to me via Dropbox or similar.

Just to update you on where diff originated: diff was the file comparator to show the changes between the original source code and an updated version. Patch was the tool to process the diff output and update code at a different location to apply the updates. Both these tools have existed longer than I have been working with computers (>40 years) and made it easy to send 10 or 100 lines of code change to update a program containing thousands of lines of code. They are still used in many places today for the same purpose. I used both diff and patch in the early days of Linux to patch updates into programs (even kernels) that were still being compiled by the average user in those days. |
I've uploaded the files here:
https://www.transfernow.net/files/?u...e=2PQW8N122020
The file "a.txt" is UTF-8 without signature.
The file "b.txt" only contains ASCII characters, so when I try to convert it to UTF-8 without signature, it stays ANSI.
Later edit: I converted both files to UTF-8 with signature and now it seems to be working |
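Since the encoding question above comes down to whether a file starts with the UTF-8 signature (the three bytes EF BB BF), here is a small sketch for checking that from a shell. The helper names has_bom and strip_bom are made up for this example, and it assumes head, tail, od and tr are available (they normally are wherever diff is).

```shell
# has_bom FILE: succeed (exit 0) if FILE starts with the UTF-8 signature EF BB BF.
has_bom() {
    # Dump the first three bytes as hex and strip the spacing od adds.
    first3=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first3" = "efbbbf" ]
}

# strip_bom FILE: print FILE without a leading UTF-8 signature, if it has one.
strip_bom() {
    if has_bom "$1"; then
        # Start output at byte 4, skipping the 3-byte signature.
        tail -c +4 "$1"
    else
        cat "$1"
    fi
}
```

For example, "has_bom a.txt && echo signature" reports whether a.txt carries the signature. If only one of two files has it, their first lines differ at the byte level even when they look the same in an editor.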
I found another pair of files that doesn't work well using DIFF
They are both encoded in UTF-8 with signature and use Windows-style line endings (CR LF).
I am just trying to remove the lines of the second file (a small text file) from the first text file.
Unfortunately, DIFF decides that certain lines do not exist in the second file even though in reality they do exist in the second file.
I have uploaded the files here: https://www.4shared.com/zip/1vAH8vu8ea/DIFF2.html
For example, this line exists in both files: Quote:
Quote:
but instead the result is this one: Quote:
|
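Because line endings came up above, here is a hedged sketch for checking whether a file really uses CR LF on every line, since a line ending in CR LF and the same text ending in a bare LF are different lines to a byte-wise comparison. The helper names count_cr_lines and to_lf are invented for this example.

```shell
# count_cr_lines FILE: print how many lines of FILE end in a carriage return.
count_cr_lines() {
    cr=$(printf '\r')
    # grep -c exits non-zero when the count is 0; that is not an error here.
    grep -c "${cr}\$" "$1" || true
}

# to_lf FILE: print FILE with every carriage return removed.
to_lf() {
    tr -d '\r' < "$1"
}
```

If count_cr_lines gives a number that is neither 0 nor the total line count (wc -l), the file has mixed line endings, and normalizing both inputs with to_lf before comparing is one way to rule that out.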
Quote:
Windows has a number of "standard text formats" that can vary depending on how the file was created. https://en.wikipedia.org/wiki/Text_file |
Quote:
Or maybe the handling of text files is still affected by the Windows OS? |
Quote:
In the past I've just had problems taking files from Windows and using them - I've always had to do some form of conversion as I've caught Windows using invalid characters in text files... things that end with odd numbers/multi-byte characters when it isn't supposed to be doing so (one had something like a 226 hex value when it was supposed to be an apostrophe). |
Here is the issue with those 2 files not matching on the first set you uploaded.
Code:
$ diff a b
Code:
$ head a
You said:
Quote:
It would be fairly simple to write a Visual Basic program to read and compare the files and extract the part desired. A Linux script would be even easier.
If you insist on using diff, then the option -y will give you output in two columns so you can easily identify the lines that match. Using grep on the output will only give you a tiny bit of the detail, and not enough to even see the useful info it provides. |
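As a rough idea of what such a script could look like, here is a minimal sketch that drops from one file every line that also appears in another, using grep instead of diff. This is an alternative technique, not what anyone in the thread ran, and remove_lines is a made-up name. Unlike diff, it ignores line order and removes every occurrence of a matching line.

```shell
# remove_lines BIG SMALL: print every line of BIG that does not appear,
# as a whole line, anywhere in SMALL.
remove_lines() {
    # -F: treat patterns as fixed strings, -x: match whole lines only,
    # -v: invert the match, -f: read the patterns from SMALL.
    # grep exits 1 when nothing is printed; that is not an error here.
    grep -v -x -F -f "$2" "$1" || true
}
```

Usage would be along the lines of "remove_lines a.txt b.txt > c.txt". The comparison is byte for byte, so both files need the same encoding and line endings for lines to count as equal.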
Thinking about this a bit more and what you want to accomplish, which you stated was to remove the content of the smaller file (b) from (a), you can use diff in a slightly different way.
First, switch the order of the diff command, "$ diff b a > diff.txt", as that will give you a diff.txt file that contains what exists in a that does not exist in b.
Now create a new file c that contains only the lines from a that were not in b: "$ patch c diff.txt"
Now, if you still need to keep the original file a with the lines removed, simply rename c to a: "$ mv c a"
I tested it on the first files you posted and it works perfectly. If your environment contains diff then it should also contain patch. |
When I use "patch c diff.txt" I get this error:
patch: **** Can't find file c : No such file or directory
(I used "diff b a > diff.txt" before that, so the file diff.txt exists) |
Quote:
The first line in b is "{{TLS-RL|NoPL=1", and many lines in a match it: lines 10, 25, 44, 105, 120, 135.
Quote:
Now, if you delete the first line in file a (which is not contained in b), diff works correctly, and you can use it to delete all lines in b from file a:
diff a b | grep "<" | cut -c3- > c |
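One general way an identical line can show up on both the "<" and the ">" side is that diff is order-sensitive: a line that exists in both files but at a different position relative to the other common lines is reported as removed in one place and added in another. The sketch below shows this with made-up files; it is not a diagnosis of the files in this thread, and the helper names are invented.

```shell
# lines_only_in_first A B: print the text of the lines diff marks with "<".
lines_only_in_first() {
    diff "$1" "$2" | grep '^< ' | cut -c3-
}

# lines_only_in_second A B: print the text of the lines diff marks with ">".
lines_only_in_second() {
    diff "$1" "$2" | grep '^> ' | cut -c3-
}

# Sample data: the same four lines, but "hdr" sits first in one file
# and last in the other.
demo_dir=$(mktemp -d)
printf 'hdr\none\ntwo\nthree\n' > "$demo_dir/a.txt"
printf 'one\ntwo\nthree\nhdr\n' > "$demo_dir/b.txt"

lines_only_in_first  "$demo_dir/a.txt" "$demo_dir/b.txt"   # prints: hdr
lines_only_in_second "$demo_dir/a.txt" "$demo_dir/b.txt"   # prints: hdr
```

So a pipeline like diff a b | grep "<" | cut -c3- only behaves like "lines of a that are not in b" when the lines of b appear in a in the same relative order.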