LinuxQuestions.org
Old 07-12-2022, 06:33 AM   #1
dalacor
Member
 
Registered: Feb 2019
Distribution: Slackware
Posts: 173

Rep: Reputation: Disabled
Compare two files and output differences to new file


I am not very familiar with bash scripting. I put this script together by looking at different websites and forums, but I still can't get it to work.

What I am trying to do is compare the list of domains in each category of an old blacklist/whitelist with the list of domains in a new blacklist/whitelist. The aim is to add any domains that are in the old blacklist but missing from the new one to the new blacklist, and then delete the old blacklist.

I have two files - comparedomains.sh and domain_mapping.txt

In domain_mapping.txt I have a line that points the script to the location of the old blacklist file, called domains, and the location of the new blacklist file, also called domains. The idea is that I should be able to include every category in that file, in the format shown below. I have only added the cleaning category for testing purposes:

Code:
old/cleaning/domains;new/cleaning/domains
In the comparedomains.sh script I have the following:

Code:
#!/bin/bash

# Create a file which has the mappings of which two files to compare
# e.g. in domain_mapping.txt
# old/cleaning/domains;new/cleaning/domains
mapping_file="/home/domain_mapping.txt"

# missing_list will be the domains that are missing from new domain but are in old domain file.
# This file will be created in each directory, e.g. new/cleaning/missing_list

while read -r mapping;
do
  echo "Is this even mapping"
  new_file="${mapping#*,}"
  missing_list=$(echo $new_file | sed 's/domains/missing/g')
  old_file="${mapping%,*}"
  echo "Comparing ${old_file} and ${new_file}"

  # Initialise files
  rm $missing_list 1>/dev/null 2>&1

  while read -r website;
  echo "Is this even website"
  do
    if [[ ! $(grep $website $new_file) ]]; then
      echo $website >> $missing_list
    fi
  done < $old_file

  echo ""
  echo "Completed ${old_file}"
  echo ""
  
done < $mapping_file

echo "Completed all mapping file comparisons"

exit 0
The script should read the domain_mapping.txt file, take old/cleaning/domains as the old domain file and new/cleaning/domains as the new domain file, compare the two, and then create a missing_list in the new/cleaning folder containing any domains that are in the old file but not in the new one. The order of the domain listings is irrelevant. I just want to know which domains are not in the new domain file; I don't care which line the domains are on in either file.

However, the script never seems to run anything between "while read" and "done". I added echo "Is this even mapping" and echo "Is this even website", and neither ever shows in the command output. All I ever see is "Completed all mapping file comparisons". Everything between "while" and "done" seems to be ignored.

I think it may have something to do with this line - while read -r mapping; My understanding is that the name can be anything so I called it mapping. But maybe I am not understanding how the while read -r command works. But maybe the problem is something else? I can't see what the problem is, but it would appear from my test echos (mapping) and (website) that the while read section is not even running.
 
Old 07-12-2022, 06:42 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,047

Rep: Reputation: 7349
Did you try the tool named diff? (Also see man diff for the possible options.)
 
Old 07-12-2022, 07:00 AM   #3
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,629

Rep: Reputation: 2557

You should definitely be using the existing diff tool instead of attempting a buggy recreation of diff.

 
Old 07-12-2022, 07:10 AM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
Don't be too harsh, we all gotta learn what we need to ask.
 
Old 07-12-2022, 08:36 AM   #5
teckk
LQ Guru
 
Registered: Oct 2004
Distribution: Arch
Posts: 5,152
Blog Entries: 6

Rep: Reputation: 1835
Quote:
The aim is to delete the old blacklist and add any missing domains in the old blacklist to the new blacklist.
Simple examples:
Code:
# 1: first list of IPv4 addresses
a="
162.142.125.229
162.142.125.230
167.94.138.130
167.94.138.131
167.94.138.132
167.94.138.147
167.94.145.19
167.94.145.20
167.248.133.146
170.106.174.246
182.247.142.166
185.94.111.1
185.102.170.174
185.156.74.26
185.180.143.148
185.244.151.51
186.222.211.130
186.224.33.11
192.241.216.58
192.241.220.57
192.241.220.186
193.201.9.43
"

# 2: second list of IPv4 addresses
b="
161.142.125.229
162.142.125.230
166.94.138.130
167.94.138.131
167.94.138.132
167.94.138.147
166.94.145.19
167.94.145.20
167.248.133.146
170.106.174.246
182.247.142.166
184.94.111.1
185.102.170.174
185.156.74.26
185.180.143.148
185.244.151.51
186.222.211.130
186.224.33.11
193.241.216.58
192.241.220.57
192.241.220.186
193.201.9.43
"

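# Lines present in both lists (comm expects sorted input, so sort first)
comm -12 <(echo "$a" | sort) <(echo "$b" | sort)

# Difference example (context format)
diff -c <(echo "$b") <(echo "$a")

# Another (unified format)
diff -u <(echo "$b") <(echo "$a")

# Grep: lines in b with no exact whole-line match in a
grep -Fxvf <(echo "$a") <(echo "$b")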
See:
man comm
man diff
man grep
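
As a small illustration of why the -F and -x flags matter when comparing domain lists, here is a minimal sketch. The sample domains and the temporary file are invented for the example. A plain grep treats the domain as a regular expression and matches it anywhere in a line, which can hide entries that are really missing.

```shell
#!/bin/bash
# Invented sample data in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' example.com notexample.com exampleXcom > "$workdir/new"

# Plain grep: the pattern is a regex and may match anywhere in the line.
# '.' matches any character, so all three lines count as matches.
loose=$(grep -c 'example.com' "$workdir/new")

# -F: fixed string, -x: the whole line must match. Only the exact entry counts.
strict=$(grep -c -Fx 'example.com' "$workdir/new")

echo "loose matches: $loose, strict matches: $strict"
rm -r "$workdir"
```

The same reasoning applies to a lookup such as grep $website $new_file: without -F and -x, a domain can be reported as present when only a similar-looking entry exists.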
 
Old 07-12-2022, 09:24 AM   #6
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,789

Rep: Reputation: 5951
There is no obvious reason why the script would not read the domain_mapping.txt file.
You can add set -x to trace how the script executes.

Code:
#!/bin/bash
set -x 
...
One obvious error is that you are using a "," instead of a ";" as the separator in these lines:
Code:
new_file="${mapping#*,}"
old_file="${mapping%,*}"
You can also use https://www.shellcheck.net/ to check the syntax of your script.
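
To make the separator problem concrete, here is a minimal sketch using the mapping line from the first post. The wrong_old and wrong_new variable names are made up for the illustration. When the pattern contains a character that is not in the string, the expansion removes nothing and returns the whole line.

```shell
#!/bin/bash
# One line as it appears in domain_mapping.txt.
mapping='old/cleaning/domains;new/cleaning/domains'

# Wrong separator: there is no ',' in the line, so nothing is removed
# and both variables end up holding the whole line.
wrong_new="${mapping#*,}"
wrong_old="${mapping%,*}"

# Matching separator: split on the ';'.
new_file="${mapping#*;}"   # strip everything up to and including ';'
old_file="${mapping%;*}"   # strip ';' and everything after it

echo "wrong: old=$wrong_old new=$wrong_new"
echo "fixed: old=$old_file new=$new_file"
```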
 
Old 07-12-2022, 11:12 AM   #7
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,629

Rep: Reputation: 2557
Quote:
Originally Posted by syg00
Don't be too harsh, we all gotta learn what we need to ask.
It was not intended as such, merely to reinforce what Pan was saying - though it was a bit of a hasty response, and on reflection I don't agree diff is the optimal tool.

To simply get all domains from both files:
Code:
sort -u old.txt new.txt
Or to get the lines in old.txt that are absent from new.txt:
Code:
comm -23 <(sort old.txt) <(sort new.txt)
The first of those essentially gives the content for the updated new file, whilst the second is useful if there's a reason to segment the old file data in some fashion.
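
A small worked example of both commands may help. This is only a sketch: the domains and the temporary directory are invented, and the inputs to comm are sorted into temporary files rather than with process substitution so that the block also runs in shells that lack that feature.

```shell
#!/bin/bash
# Invented sample data in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' example.com old-only.example shared.example > "$workdir/old.txt"
printf '%s\n' shared.example example.com new-only.example > "$workdir/new.txt"

# comm needs both inputs sorted the same way.
sort "$workdir/old.txt" > "$workdir/old.sorted"
sort "$workdir/new.txt" > "$workdir/new.sorted"

# -23 suppresses "only in new" and "in both", leaving lines only in old.
missing=$(comm -23 "$workdir/old.sorted" "$workdir/new.sorted")
echo "missing from new: $missing"

# Union of both files with duplicates removed.
merged=$(sort -u "$workdir/old.txt" "$workdir/new.txt")
echo "$merged"

rm -r "$workdir"
```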

 
Old 07-12-2022, 11:40 AM   #8
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,256

Rep: Reputation: 5338
I’d do this in Python, honestly.
 
Old 07-13-2022, 09:56 AM   #9
dalacor
Member
 
Registered: Feb 2019
Distribution: Slackware
Posts: 173

Original Poster
Rep: Reputation: Disabled
Sorry for the late reply. Was busy with something else yesterday, which took longer than expected.

I have managed to get it working, yay, so I won't worry about starting all over again using diff. However, I have absolutely no idea why I was experiencing this issue.

I copy the comparedomains.sh file and domain_mapping.txt file from my windows computer to my Linux virtual machine (Slackware) along with the domain files that I wish to compare.

On the Linux machine, I then chmod 755 comparedomains.sh to make the script executable. When I run the script, (using the set -x option suggested by MichaelK), I get the following output:

mapping_file="domain_mapping.txt"
read -r mapping
echo "Completed all mapping file comparisons"

Now if I edit the domain_mapping.txt (in Linux - say remove a letter and add the letter back again) and save the file and then run the script - everything works! For some reason, when you copy that text file from Windows to Linux, the script obviously can't open/read the file. Editing that txt file in Linux and saving it makes the file open/readable. I discovered this by accident as I had domain instead of domains as the name of one of the blacklist files. However that typo was not the cause of the script failing. I tested this by putting in correct and incorrect paths. (Once you have edited that file in Linux), the script reports an error file or directory not found if the path is wrong.

I tested whether it makes any difference if the file in question has a .txt extension. Makes no difference.

I will give best answer to MichaelK as that put me on the right track with the suggestion of using set -x. For the record, I did pick up the error regarding "," versus ";" but that made no difference as the file wasn't being read obviously.

My script now works perfectly, but if anyone can explain why I need to open the domain_mapping file on Linux, make a change and save it before the comparedomains.sh script can open that file that would be fantastic. I have no problems with scripts reading the domains files (which were also copied from Windows and were not edited on Linux), so I cannot see why this particular file is so special?

Last edited by dalacor; 07-13-2022 at 09:59 AM.
 
Old 07-13-2022, 10:14 AM   #10
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,789

Rep: Reputation: 5951
It looks like an end-of-line character problem. Windows text files and Linux text files use different end-of-line characters: Linux uses just LF, whereas Windows uses CR followed by LF.

There are many ways to convert from Windows/DOS to linux end of line characters. One example from the command line using tr is:

tr -d '\r' < domain_mapping.txt > converted_mapping.txt

Be sure to change your bash script to use the converted file if you transfer the file again.
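
To check whether a file actually contains carriage returns before and after converting it, something like the following sketch could be used. The temporary directory and the sample line are stand-ins for the real domain_mapping.txt.

```shell
#!/bin/bash
# Build a small CRLF file to stand in for one copied from Windows.
workdir=$(mktemp -d)
printf 'old/cleaning/domains;new/cleaning/domains\r\n' > "$workdir/domain_mapping.txt"

cr=$(printf '\r')   # a literal carriage return

# Count the lines that contain a carriage return.
before=$(grep -c "$cr" "$workdir/domain_mapping.txt")

# Strip every CR, as in the tr command above.
tr -d '\r' < "$workdir/domain_mapping.txt" > "$workdir/converted_mapping.txt"

# grep -c prints 0 and exits non-zero when nothing matches; ignore the status.
after=$(grep -c "$cr" "$workdir/converted_mapping.txt" || true)

echo "lines with CR before: $before, after: $after"
rm -r "$workdir"
```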

Last edited by michaelk; 07-13-2022 at 10:19 AM.
 
Old 07-13-2022, 10:27 AM   #11
dalacor
Member
 
Registered: Feb 2019
Distribution: Slackware
Posts: 173

Original Poster
Rep: Reputation: Disabled
Would that explain why the domains files work? Presumably those files would have been created on a Linux machine at some point even though I have edited them on Windows (or at least edited a couple of them). The domain files were originally downloaded from the Internet on my windows computer.

I presume that this problem only occurs for text files. Pretty much every .sh file that I have run on Linux was probably created on Windows and then copied across and chmod 755 to make it run.

Thank you, I will have a look into this end of line issue as I have never encountered it before. Surprising that Slackware doesn't report that it can't open the file or something.
 
Old 07-13-2022, 10:36 AM   #12
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,629

Rep: Reputation: 2557
Quote:
Originally Posted by dalacor
I have managed to get it working Yay, so I won't worry about starting all over again using diff.

...

My script now works perfectly
Perfectly is a bold claim. Does it really behave as desired for all possible input variations? Or did you do what a lot of programmers do and test only a single use case?

(For example, did you fix the bugs caused by unquoted filename variables present in the original script?)
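
A quick way to see what an unquoted variable does is to count the arguments a command receives. In this sketch, count_args and the path containing a space are invented for the illustration.

```shell
#!/bin/bash
# Hypothetical helper: print how many arguments it was given.
count_args() { echo "$#"; }

new_file='new/adult sites/domains'   # invented path containing a space

unquoted=$(count_args $new_file)     # word splitting: two arguments
quoted=$(count_args "$new_file")     # one argument, as intended

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

With the unquoted form, commands such as rm $missing_list or grep $website $new_file would see two separate names instead of one path.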


Last edited by boughtonp; 07-13-2022 at 10:37 AM.
 
Old 07-13-2022, 10:39 AM   #13
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,789

Rep: Reputation: 5951
Yes, it is just a text file problem. It isn't that the script cannot open the file; it is how the bash builtin read command "reads" lines in a text file.

I have not looked but newer bash versions might ignore the CR when running a script but the read command does not.
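
One read behaviour would be consistent with the set -x trace earlier in the thread, although this is offered as a possibility rather than a confirmed diagnosis. read returns a non-zero status when it reaches end of file before seeing a newline, even though it still fills the variable. A one-line file saved without a final newline therefore never enters a plain while read loop. The sketch below uses an invented temporary file to show this, along with a common workaround.

```shell
#!/bin/bash
# A one-line file WITHOUT a trailing newline (invented sample data).
workdir=$(mktemp -d)
printf 'old/cleaning/domains;new/cleaning/domains' > "$workdir/mapping.txt"

# Plain loop: read fails on the unterminated line, so the body never runs.
plain=0
while read -r mapping; do
  plain=$((plain + 1))
done < "$workdir/mapping.txt"

# Workaround: also run the body when read failed but still captured text.
guarded=0
while read -r mapping || [ -n "$mapping" ]; do
  guarded=$((guarded + 1))
done < "$workdir/mapping.txt"

echo "plain loop iterations:   $plain"
echo "guarded loop iterations: $guarded"
rm -r "$workdir"
```

Many Linux editors add the final newline when saving, which may be why re-saving the file on Linux changed the behaviour.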
 
Old 07-13-2022, 10:41 AM   #14
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,047

Rep: Reputation: 7349
As far as I see:
Code:
while read -r mapping;
do
  echo "Is this even mapping"
  new_file="${mapping#*,}"
...
  while read -r website;
  echo "Is this even website"
  do
    if [[ ! $(grep $website $new_file) ]]; then
      echo $website >> $missing_list
    fi
  done < $old_file
...
  
done < $mapping_file
If mapping_file contains \r, then new_file will also contain it (as its last character), so your grep will look for a filename ending in that \r. Obviously that file does not exist, so grep will fail.
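
The effect can be reproduced with a short sketch. The temporary directory and file contents are invented; the point is only that read keeps the carriage return, so the resulting name does not exist until the CR is stripped.

```shell
#!/bin/bash
# Invented sample data: a real file, and a CRLF mapping line naming it.
workdir=$(mktemp -d)
printf 'example.com\n' > "$workdir/domains"
printf '%s\r\n' "$workdir/domains" > "$workdir/mapping.txt"

cr=$(printf '\r')   # a literal carriage return

# read removes the newline but keeps the CR at the end of the value.
IFS= read -r new_file < "$workdir/mapping.txt"
if [ -e "$new_file" ]; then with_cr=found; else with_cr=missing; fi

# Remove one trailing CR, if present, before using the value.
new_file=${new_file%"$cr"}
if [ -e "$new_file" ]; then stripped=found; else stripped=missing; fi

echo "with CR: $with_cr, after stripping: $stripped"
rm -r "$workdir"
```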
 
Old 07-13-2022, 11:04 AM   #15
dalacor
Member
 
Registered: Feb 2019
Distribution: Slackware
Posts: 173

Original Poster
Rep: Reputation: Disabled
MichaelK - That would make sense that the issue with the end of line character is more to do with the read command. I have never experienced this problem before, but I don't think that I ever used the read command before that I can recall.

BoughtonP - Well, perhaps perfect was an exaggeration. The point I was making was that I was very pleased to finally get the script doing what I was expecting it to do. I have only tested a small sample (which did work - dare I say it - perfectly). But I was waiting for more info on this end-of-line character issue before doing more testing on other domain files. I can't see which bug you mean. I have fixed the "," and made it ";", if that's what you mean.

Pan64 - I will make sure that the mapping_file is converted to the Linux end-of-line format, so it won't contain \r. I will do more testing on this because I have not experienced end-of-line issues before, so this is new to me.
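
For reference, the suggestions in this thread could be combined along the following lines. This is a sketch under stated assumptions, not the script that was finally used: the function name compare_mappings is invented, the output file is assumed to be missing_list in the new file's directory, and the domain files themselves are assumed to have Linux line endings.

```shell
#!/bin/bash
# Sketch only: the function name and output location are assumptions.
# Usage: compare_mappings MAPPING_FILE
compare_mappings() {
  cr=$(printf '\r')   # a literal carriage return
  # "|| [ -n ... ]" keeps a last line that has no trailing newline.
  while IFS= read -r mapping || [ -n "$mapping" ]; do
    mapping=${mapping%"$cr"}          # tolerate CRLF line endings
    [ -n "$mapping" ] || continue     # skip blank lines
    old_file="${mapping%%;*}"         # text before the first ';'
    new_file="${mapping#*;}"          # text after the first ';'
    missing_list="$(dirname "$new_file")/missing_list"
    # Lines of old_file with no exact whole-line match in new_file.
    grep -Fxv -f "$new_file" "$old_file" > "$missing_list"
    echo "Compared ${old_file} and ${new_file}"
  done < "$1"
  echo "Completed all mapping file comparisons"
}
```

It would be called as compare_mappings /home/domain_mapping.txt. Every variable is quoted so that paths containing spaces are passed through intact.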
 
  

