LinuxQuestions.org
Old 12-30-2020, 02:05 PM   #1
joejobs
LQ Newbie
 
Registered: Dec 2020
Posts: 12

Rep: Reputation: Disabled
Confirming a DIFF bug


Hello,

I think I found a bug in the latest version of diff available from gnuwin32 (2.8.7). I am using it on Windows 10, and I would like to share it to make sure it is indeed a bug and that I am not making a mistake.

Download "a.txt" from here - https://pastebin.com/MPvv83wi
Download "b.txt" from here - https://pastebin.com/QhzufK6E

Then use this command:
diff a.txt b.txt | grep "14:53"

the result is
< |date=12 July |time=14:53:36
> |date=12 July |time=14:53:36

This result is wrong; the output should be empty.
If you delete the first line of the file "a.txt" (which does not contain the string "14:53"), then the result is correct: nothing in the output.

Another way to check the bug is this:

diff a.txt b.txt | grep "<" | cut -c3- > c.txt
diff a.txt c.txt | grep "<" | cut -c3- > d.txt

The file d.txt should be the same as b.txt, but it is not.
However, if you delete the first line of a.txt, the result is correct.
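For anyone who wants to try the round trip without downloading anything, here is a minimal sketch on two made-up files (their contents are invented for illustration, not taken from the pastebin files). It also anchors the grep pattern to the start of the line, since a bare "<" would match any line whose text happens to contain that character.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

# Hypothetical sample data: a.txt holds everything, b.txt is a subset.
printf 'x <1>\nshared\ny <2>\n' > a.txt
printf 'shared\n' > b.txt

# '^< ' keeps only the lines diff marks as coming from the first file;
# cut -c3- then strips the two-character "< " prefix.
diff a.txt b.txt | grep '^< ' | cut -c3- > c.txt
diff a.txt c.txt | grep '^< ' | cut -c3- > d.txt

# cmp -s is silent and succeeds only if the files are byte-identical.
cmp -s d.txt b.txt && echo "d.txt matches b.txt"
```

On these small files the round trip gives back b.txt, which is the behavior expected above.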

The original "a.txt" file was much bigger when I found this bug. I tried to make it as small as possible, but I can't make it shorter than its current size.

I have used diff on Windows for many years and it has worked very well; I never had any problem. This is the first time it has behaved strangely for me.

Can anyone confirm this bug?

Thanks

Last edited by joejobs; 12-30-2020 at 07:13 PM.
 
Old 12-30-2020, 05:38 PM   #2
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Quote:
Originally Posted by joejobs
Hello,

I think I found a bug in DIFF last version (2.8.7) - I'm using it on Windows (from gnuwin32) and I would like to share it in order to make sure it is indeed a bug and I am not making a mistake


Then use this command:
diff a.txt b.txt | grep "14:53"

the result is
< |date=12 July |time=14:53:36
> |date=12 July |time=14:53:36

and this result is wrong. The output should be empty instead.
If you delete the first line in the file "a.txt" (which does not contain the string "14:53"), then the result is correct - nothing in the output

Another way to check the bug is this:

diff a.txt b.txt | grep "<" | cut -c3- > c.txt
diff a.txt c.txt | grep "<" | cut -c3- > d.txt

The file d.txt should be the same with b.txt

but it is not.
However, if you delete the first line in a.txt, the result is correct
--

Can anyone confirm this bug?

Thanks
I tried exactly what you said but my result is empty.
Just so we are clear, you are reporting a result from the same tool on a different OS.

The diff command compares the contents of one file to the other, line by line. It groups the lines that differ and tells you which ones should be removed or added in each group to make the files match (it was originally intended to produce patches for source files but has many other uses). If the files match, the output is empty. If they do not match, it gives the first line that does not match and then what needs to be done until the next match. (< means remove from the first file, > means add to the first file.)
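As a small illustration of the < and > markers, here is a sketch with two invented three-line files (old.txt and new.txt are hypothetical names, not files from this thread):

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

printf 'alpha\nbeta\ngamma\n' > old.txt
printf 'alpha\ngamma\ndelta\n' > new.txt

# diff exits with status 1 when the files differ, so "|| true" keeps a
# script that runs under "set -e" from stopping at this point.
diff old.txt new.txt > out.txt || true
cat out.txt
```

The output should contain "< beta" (present only in the first file) and "> delta" (present only in the second), each preceded by a short header line giving the line numbers involved.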

Thus your statement that removing one line from a very long file (a.txt) changes the output is misleading. To do a true comparison test we would need an exact copy of the entire first file with no changes, an exact copy of the second file with no changes, and a full step-by-step description of what was done.

You do not provide enough information to even attempt to verify your results. Changing even one line in either file invalidates what someone else might find.
 
Old 12-30-2020, 07:13 PM   #3
joejobs
LQ Newbie
 
Registered: Dec 2020
Posts: 12

Original Poster
Rep: Reputation: Disabled
Thanks for trying it (on Linux, I guess).

I did not say removing one (any) line. I said removing the first line.

I removed the first line in the first file and then diff worked correctly.

Any chance you can try DIFF on Windows?

I am using Windows 10
 
Old 12-30-2020, 07:49 PM   #4
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
If the files are otherwise identical, maybe their encoding is different?
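One way to check this is to look at the raw bytes. The sketch below builds two invented files with the same visible text, one with a UTF-8 signature (BOM) and CR LF line endings and one without, and shows that diff treats the line as different even though it looks identical on screen. The file names are made up for the example.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

# Same visible text twice: once with a UTF-8 BOM (octal 357 273 277)
# and a CR LF line ending, once as plain text with an LF line ending.
printf '\357\273\277|date=12 July |time=14:53:36\r\n' > with_bom.txt
printf '|date=12 July |time=14:53:36\n' > plain.txt

# Print the first three bytes in hex; a BOM shows up as "ef bb bf".
head -c 3 with_bom.txt | od -An -tx1

# diff compares bytes, so the invisible BOM and CR make the lines differ.
diff with_bom.txt plain.txt > out.txt || true
cat out.txt
```

If a pair of < and > lines looks the same in the terminal, hidden bytes like these are one possible cause worth ruling out.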
 
Old 12-30-2020, 08:05 PM   #5
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Quote:
Originally Posted by joejobs
Thanks for trying it (on linux I guess)

I did not say removing one (any) line. I said removing the first line

I removed the first line in the first file and then diff worked correctly.

Any chance you can try DIFF on Windows?

I am using Windows 10
Adding, removing, or changing one line (anywhere in one of the files) will change the output of diff.
It starts at line 1 in file 1 and compares it to the same line in file 2, prints output if needed, then steps to line 2 and repeats.
As long as each line matches there should be no output (by default), but when there is a difference it displays what is different and tells you the changes needed to make them match.

I would guess that a.txt had an extra line at the beginning and that your grep hid that from you. Probably every line had remove and add pairs. As I said, there is NO way to find out what actually happened without a copy of each of the original, unchanged files to test with.

99% of my use is Linux and the rest is Android.

I would be happy to test and verify things for you but only with the original files, which you could send to me via dropbox or similar.

Just to update you on where diff originated:
Diff was the file comparator used to show the changes between the original source code and an updated version.
Patch was the tool that processed the diff output to apply those updates to a copy of the code at a different location.
Both these tools have existed longer than I have been working with computers (>40 years). They made it easy to send 10 or 100 lines of code changes to update a program containing thousands of lines of code, and they are still used in many places today for the same purpose. I used both diff and patch in the early days of Linux to patch updates into programs (even kernels) that the average user still compiled in those days.

Last edited by computersavvy; 12-30-2020 at 08:08 PM.
 
Old 12-30-2020, 09:55 PM   #6
joejobs
LQ Newbie
 
Registered: Dec 2020
Posts: 12

Original Poster
Rep: Reputation: Disabled
I've uploaded the files here:
https://www.transfernow.net/files/?u...e=2PQW8N122020

The file "a.txt" is UTF-8 without signature (no BOM).
The file "b.txt" contains only ASCII characters, so when I try to convert it to UTF-8 without signature, it stays ANSI.

Later edit:
I converted both files to UTF-8 with signature and now it seems to be working

Last edited by joejobs; 12-31-2020 at 08:30 AM.
 
Old 12-31-2020, 09:11 AM   #7
joejobs
LQ Newbie
 
Registered: Dec 2020
Posts: 12

Original Poster
Rep: Reputation: Disabled
I found another pair of files that diff does not handle well.
They are both encoded in UTF-8 with signature and use Windows-style line endings (CR LF).

I am just trying to remove the lines of the second file (a small text file) from the first text file.

Unfortunately, diff decides that certain lines do not exist in the second file even though they actually do.

I have uploaded the files here:

https://www.4shared.com/zip/1vAH8vu8ea/DIFF2.html

For example this line exists in both files:

Quote:
|date = 19 August |time = 05:29
and then, when using this command:

Quote:
diff a b | grep "05:29"
the result should be empty

but instead the result is this one:
Quote:
< |date = 19 August |time = 05:29
> |date = 19 August |time = 05:29
This is a real life situation where I try using DIFF to solve a real task

Last edited by joejobs; 12-31-2020 at 09:12 AM.
 
Old 12-31-2020, 09:25 AM   #8
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Quote:
Originally Posted by joejobs
I found another pair of files that doesn't work well using DIFF
They are both encoded in UTF-8 with signature and using Windows style endlines - CR LF

I am just trying to remove the lines of the second file (small text file) from the first text file

Unfortunately, DIFF decides that some certain lines do not exist in the second file even though in reality they actually exist in the second file

I have uploaded the files here:

https://www.4shared.com/zip/1vAH8vu8ea/DIFF2.html

For example this line exists in both files:


and then, when using this command:


the result should be empty

but instead the result is this one:


This is a real life situation where I try using DIFF to solve a real task
It is also a question of what Windows considers to be a line, which is not the same as on Linux and can easily skew the result. For instance, Windows tools put a "Byte Order Mark (BOM)" at the beginning of some files, so deleting a line can easily change what diff sees. Another variation is that the last line might not have a line terminator.

Windows has a number of "standard text formats" that can vary depending on how the file was created.

https://en.wikipedia.org/wiki/Text_file
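A possible way to take these differences out of the picture before comparing is to normalize both files first. This is only a sketch on invented sample files (windows.txt and unix.txt are hypothetical names): it drops a leading UTF-8 BOM and all carriage returns, then checks that the result is byte-identical to the plain version.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

printf '\357\273\277first\r\nsecond\r\n' > windows.txt
printf 'first\nsecond\n' > unix.txt

# Build the three BOM bytes with printf so no sed-specific escape is needed.
bom=$(printf '\357\273\277')

# LC_ALL=C makes sed work on raw bytes; the substitution removes a BOM at
# the start of line 1 only, and tr then deletes every carriage return.
LC_ALL=C sed "1s/^$bom//" windows.txt | tr -d '\r' > clean.txt

# cmp -s is silent and succeeds only if the files are byte-identical.
if cmp -s clean.txt unix.txt; then
    echo "identical after normalization"
fi
```

Running both input files through the same normalization before calling diff means that only real text differences remain.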
 
Old 12-31-2020, 10:01 AM   #9
joejobs
LQ Newbie
 
Registered: Dec 2020
Posts: 12

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by jpollard
It is also a problem of what Windows considers to be a line. And that is not the same thing and easily makes the result problematical. For instance, Windows has "Byte Order Mark (BOM)" at the beginning of some files... Hence deleting a line might easily change what is seen. One variation is that the last line might not have a line terminator...

Windows has a number of "standard text formats" that can vary depending on how the file was created.

https://en.wikipedia.org/wiki/Text_file
If I install MSYS2 (Linux environment) on Windows and use diff there, does it solve my problem?
Or will the text file handling still be affected by the Windows OS?
 
Old 12-31-2020, 10:15 AM   #10
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Quote:
Originally Posted by joejobs
If I install MSYS2 (Linux environment) on Windows and use diff there, does it solve my problem?
Or maybe the text files handling will still be affected by the Windows OS?
That I don't know. It might fix things.

In the past I've just had problems taking files from Windows and using them - I've always had to do some form of conversion as I've caught Windows using invalid characters in text files... things that end with odd numbers/multi-byte characters when it isn't supposed to be doing so (one had something like a 226 hex value when it was supposed to be an apostrophe).
 
Old 12-31-2020, 11:49 AM   #11
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Here is the issue with those 2 files not matching on the first set you uploaded.
Code:
$ diff a b
1,924d0
< |outcome=Operational
< }}{{TLS-PL
< |name={{flagicon|JPN}}[[CUTE-1.7|Cute-1.7+APD]]
< |user=[[Tokyo Institute of Technology|TiTech]]
< |orbit=[[Low Earth orbit|Low Earth]]
< |function=Amateur radio
< |outcome=Operational
Nothing in lines 1 through 924 of a (the range shown as 1,924 in the diff header) matches the first line in b.
Code:
$  head a
|outcome=Operational
}}{{TLS-PL
|name={{flagicon|JPN}}[[CUTE-1.7|Cute-1.7+APD]]
|user=[[Tokyo Institute of Technology|TiTech]]
|orbit=[[Low Earth orbit|Low Earth]]
|function=Amateur radio
|outcome=Operational
}}
}}
{{TLS-RL|NoPL=1

$  head b
{{TLS-RL|NoPL=1
|date=12 July |time=14:53:36
|rocket={{flagicon|UKR}}[[Dnepr (rocket)|Dnepr]]
|site={{flagicon|RUS}}[[Dombarovsky (air base)|Dombarovskiy]]
|LSP={{flagicon|RUS}}[[ISC Kosmotras]]
|remarks=First [[Uncrewed spacecraft|uncrewed]] prototype of a commercial space station module
|payload={{TLS-PL
|name={{flagicon|USA}}[[Genesis I]]
|user=[[Bigelow Aerospace|Bigelow]]
|orbit=Low Earth
In the first pair of files you uploaded a contains 1930 lines of which the last 14 match the content of b.
You said:
Quote:
I found another pair of files that doesn't work well using DIFF
They are both encoded in UTF-8 with signature and using Windows style endlines - CR LF

I am just trying to remove the lines of the second file (small text file) from the first text file
Diff is far from the best tool for removing a few lines of text. It does tell you by line number where the files match, and you can then go to that point manually with a text editor and remove the desired lines, but it would be far easier with a stream editor that reads each file line by line and removes what you want.
It would be fairly simple to write a Visual Basic program to read and compare the files and extract the part desired. A Linux script would be even easier.
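As one possible version of such a script, grep can treat every line of the small file as a fixed string and print only the lines of the large file that match none of them. This is a sketch on invented data, not the uploaded files, and it assumes both files use the same encoding and line endings.

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

# Hypothetical sample data; note that the lines of b.txt appear in a.txt
# in a different order, which does not matter for this approach.
printf 'keep 1\n|date=12 July |time=14:53:36\nkeep 2\n{{TLS-RL|NoPL=1\n' > a.txt
printf '{{TLS-RL|NoPL=1\n|date=12 July |time=14:53:36\n' > b.txt

# -F  treat the patterns as fixed strings, not regular expressions
# -x  a pattern must match the whole line
# -v  print the lines that do NOT match
# -f  read one pattern per line from b.txt
grep -vxFf b.txt a.txt > c.txt
cat c.txt
```

Unlike diff, this ignores line order and removes every occurrence of a matching line, so it fits the "remove the lines of b from a" task only if that is the intended behavior.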

If you insist on using diff then the option -y will give you output in 2 columns so you can easily identify the lines that match. Using grep on the output will only give you a tiny bit of the detail and not enough to even see the useful info it provides.

Last edited by computersavvy; 12-31-2020 at 12:01 PM.
 
Old 12-31-2020, 03:07 PM   #12
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Thinking about this a bit more, and about what you want to accomplish, which you stated was to remove the content of the smaller file (b) from a, you can use diff in a slightly different way.

First, switch the order of the arguments to the diff command
"$ diff b a > diff.txt "
as that will give you a diff.txt file that contains what exists in a that does not exist in b.

Now create a new file c that contains only the lines from a that were not in b.
"$ patch c diff.txt"

Now if you still need to keep the original file a with the lines removed simply rename c to a
"$ mv c a "

I tested it on the first files you posted and it works perfectly. If your environment contains diff then it should also contain patch.
 
Old 01-02-2021, 08:26 AM   #13
joejobs
LQ Newbie
 
Registered: Dec 2020
Posts: 12

Original Poster
Rep: Reputation: Disabled
when I use "patch c diff.txt"
I get this error:

patch: **** Can't find file c : No such file or directory

(I used "diff b a > diff.txt" before that, so file diff.txt exists)

Last edited by joejobs; 01-02-2021 at 08:27 AM.
 
Old 01-02-2021, 08:40 AM   #14
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513
Quote:
Originally Posted by joejobs
when I use "patch c diff.txt"
I get this error:

patch: **** Can't find file c : No such file or directory

(I used "diff b a > diff.txt" before that, so file diff.txt exists)
It said the file c didn't exist... not "diff.txt".
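The error is consistent with the usual calling convention of patch, where the first argument is an existing file to modify and the second is the patch file. A minimal sketch, using invented two-line and four-line files that reuse the names a, b and c from this thread, and assuming patch is installed:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp"

printf 'one\ntwo\n' > b
printf 'zero\none\ntwo\nthree\n' > a

# The script in diff.txt describes how to turn b into a.
diff b a > diff.txt || true

# patch edits an existing file in place, so start from a copy of b.
cp b c
patch c diff.txt

# cmp -s is silent and succeeds only if the files are byte-identical.
cmp -s c a && echo "c matches a"
```

Note that in this sketch the patched copy ends up identical to a, because the output of "diff b a" records how to get from b to a; on its own it does not strip the lines of b out of a.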
 
Old 01-02-2021, 08:41 AM   #15
joejobs
LQ Newbie
 
Registered: Dec 2020
Posts: 12

Original Poster
Rep: Reputation: Disabled

Quote:
Originally Posted by computersavvy
Here is the issue with those 2 files not matching on the first set you uploaded.
Code:
$ diff a b
1,924d0
< |outcome=Operational
< }}{{TLS-PL
< |name={{flagicon|JPN}}[[CUTE-1.7|Cute-1.7+APD]]
< |user=[[Tokyo Institute of Technology|TiTech]]
< |orbit=[[Low Earth orbit|Low Earth]]
< |function=Amateur radio
< |outcome=Operational
Nothing in the first 1,924 lines of a matches the first line in b.
Sorry, that is not true.
The first line in b is "{{TLS-RL|NoPL=1"
And many lines in a match it:
Lines 10, 25, 44, 105, 120, 135

Quote:
In the first pair of files you uploaded a contains 1930 lines of which the last 14 match the content of b.
Sorry, but lines 910 - 923 contain exactly all 14 lines of file b.

Now, if you delete the first line in file a (which is not contained in b), diff works correctly.
And you can use it to delete all lines in b from file a:

diff a b | grep "<" | cut -c3- > c
 
  

