LinuxQuestions.org
Old 11-18-2021, 01:14 PM   #1
CyberIT
Member
 
Registered: Jun 2017
Posts: 56

Rep: Reputation: Disabled
Question search two files for specific words remove the line from one file


Hello

I have two files: file1 and file2. File2 is large.

I'm trying to check file2, line by line, for specific words that may be in file1 (also read line by line). If a word from file1 matches a line in file2, that line should be removed from file2.

I could use some help to start a script for it. Bash? Python?

Thank you much!

Last edited by CyberIT; 11-18-2021 at 01:31 PM.
 
Old 11-18-2021, 01:37 PM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Maybe grep with the -v, -f, and -F options? The output can be saved into another file using a redirection.
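As a minimal sketch of that suggestion, assuming file1 holds one fixed string per line (the file contents below are made up for illustration, and `file2.new` is just a hypothetical output name):

```shell
# Hypothetical sample data, created in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'blah.example.com' 'help.example.com' > "$dir/file1"
printf '%s\n' \
  'blah.example.com 10.0.0.1' \
  'keep.example.com 10.0.0.2' \
  'help.example.com 10.0.0.3' > "$dir/file2"

# -F: treat each pattern as a fixed string, not a regular expression
# -f: read the patterns from file1, one per line
# -v: print only the lines of file2 that match none of the patterns
grep -F -v -f "$dir/file1" "$dir/file2" > "$dir/file2.new"

result=$(cat "$dir/file2.new")
printf '%s\n' "$result"
rm -rf "$dir"
```

Once the new file looks right, it can be moved over the original with mv. Redirecting straight back into file2 would not work, because the shell truncates the file before grep gets to read it.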
 
Old 11-18-2021, 02:52 PM   #3
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
If files are sorted, join may be an interesting alternative. See 8.3.6 Union, Intersection and Difference of files in the GNU Coreutils Manual, particularly, the difference.

For unsorted files, combine from package moreutils is also an option.
 
Old 11-18-2021, 03:35 PM   #4
CyberIT
Member
 
Registered: Jun 2017
Posts: 56

Original Poster
Rep: Reputation: Disabled
This is above my head, but I'd like to figure it out... What I'm trying to do is the following.

Example
LINE 1:
Code:
awk '{print $2}' file1    # gives an output of blah.example.com
LINE 2:
Code:
awk '{print $2}' file1    # gives an output of help.example.com
With that info I want to find {print $2} of each line within file1 then look for same output within file2 and remove the entire line within file2.

I'm not sure if I can use bash to do that, or would Python be better? Any help or examples would be great. Thank you much!

Last edited by CyberIT; 11-18-2021 at 03:42 PM.
 
Old 11-18-2021, 03:50 PM   #5
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
Have you tried what was suggested by Turbocapitalist?
Code:
grep -Fvf <(awk '$0=$2' file1) file2
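The same idea can be split into two steps, which may be easier to follow and also works in shells without process substitution. This is only a sketch: the two-column layout of file1 and the `patterns` file name are assumptions.

```shell
# Hypothetical sample data: the word of interest is in column 2 of file1.
dir=$(mktemp -d)
printf '%s\n' '1 blah.example.com' '2 help.example.com' > "$dir/file1"
printf '%s\n' \
  'blah.example.com 10.0.0.1' \
  'keep.example.com 10.0.0.2' \
  'help.example.com 10.0.0.3' > "$dir/file2"

# Step 1: write column 2 of every line of file1 to a pattern file.
awk '{ print $2 }' "$dir/file1" > "$dir/patterns"

# Step 2: print the lines of file2 that contain none of those strings.
result=$(grep -F -v -f "$dir/patterns" "$dir/file2")
printf '%s\n' "$result"
rm -rf "$dir"
```

Looking at the pattern file between the two steps is a quick way to check that the right column was extracted.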
 
Old 11-18-2021, 06:06 PM   #6
CyberIT
Member
 
Registered: Jun 2017
Posts: 56

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by Turbocapitalist View Post
Maybe grep with the -v, -f, and -F options? The output can be saved into another file using a redirection.
Thank you for your response.

I tried this, but the outcome wasn't what I expected. I don't think I used it properly.
 
Old 11-18-2021, 06:07 PM   #7
CyberIT
Member
 
Registered: Jun 2017
Posts: 56

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by shruggy View Post
Have you tried what was suggested by Turbocapitalist?
Code:
grep -Fvf <(awk '$0=$2' file1) file2
Thank you for your reply!

Yep, I tried what was suggested earlier, but the outcome was not what I wanted. It seemed to just copy the file as it was, nothing more, so I assume I didn't use the proper format. However, the example you posted was not what I used, so I will try that out too.
 
Old 11-18-2021, 11:31 PM   #8
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
You can supply a file as the list of words to be matched/deleted etc. to sed to compare against a 2nd file: https://stackoverflow.com/questions/...another-file-a . Look for the text "grep -Fvxf <lines-to-remove> <all-lines>" on that page.
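The -x in that command makes a difference: with it, a pattern has to match the whole line, not just part of it. A small sketch with made-up data shows the two behaviours side by side:

```shell
# Hypothetical sample data.
dir=$(mktemp -d)
printf '%s\n' 'example.com' > "$dir/file1"
printf '%s\n' 'example.com' 'www.example.com' 'other.org' > "$dir/file2"

# Without -x: every line that merely contains "example.com" is dropped.
substring=$(grep -F -v -f "$dir/file1" "$dir/file2")

# With -x: only lines that are exactly "example.com" are dropped.
whole=$(grep -F -v -x -f "$dir/file1" "$dir/file2")

printf 'substring match keeps:\n%s\n' "$substring"
printf 'whole-line match keeps:\n%s\n' "$whole"
rm -rf "$dir"
```

Without -x only other.org survives; with -x, www.example.com is kept as well.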
 
Old 11-19-2021, 05:24 AM   #9
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
Quote:
Originally Posted by CyberIT View Post
Yep, I tried what was suggested earlier, but the outcome was not what I wanted. It seemed to just copy the file as it was, nothing more, so I assume I didn't use the proper format.
Then you should describe the format of both files in more detail.

First, try getting the grep solution to work. It is not the fastest solution, but probably one of the easier ones to understand. You can optimize it further if needed. An awk solution is more flexible and probably faster as well, especially if you explicitly specify mawk rather than gawk as the awk interpreter (the former tends to be faster than the latter).
Code:
awk 'NR==FNR{_[$2];next}!($0 in _)' file1 file2
but adjusting it to your needs requires some understanding of how awk works.

A shell solution may be THE easiest to understand, but probably the slowest one as well
Code:
#!/bin/sh
while IFS= read -r line
do grep -qw "$line" file1 || printf %s\\n "$line"
done <file2
Again, the grep command, the printf command and even the read command may require some adjustments depending on what exactly you are trying to read, to match, and to output.

And as said, if both files are sorted, there are more efficient ways to do this. E.g.
Code:
join -12 -21 -v2 <(sort -ubk2,2 file1) <(sort file2)
Of course, this doesn't make sense if you have to sort both files on the fly as I did above. But if the files are already sorted (or even if only the large one is), then join may beat awk performance-wise.
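To illustrate the kind of adjustment the awk version may need, here is a sketch that compares a single field of file2 against the words from file1 instead of the whole line. It assumes the word sits in column 2 of file1 and in column 1 of file2; both column numbers, the array name `seen`, and the sample data are made up.

```shell
# Hypothetical sample data.
dir=$(mktemp -d)
printf '%s\n' '1 blah.example.com' '2 help.example.com' > "$dir/file1"
printf '%s\n' \
  'blah.example.com 10.0.0.1' \
  'keep.example.com 10.0.0.2' \
  'other.example.com alias-of-blah.example.com' \
  'help.example.com 10.0.0.3' > "$dir/file2"

# NR==FNR is true only while awk reads the first file, so column 2 of
# file1 is stored as array keys. For file2, a line is printed only when
# its first field is not one of those keys.
result=$(awk 'NR==FNR { seen[$2]; next } !($1 in seen)' "$dir/file1" "$dir/file2")
printf '%s\n' "$result"
rm -rf "$dir"
```

The line starting with other.example.com is kept even though it contains blah.example.com further along, because only the first field is checked. One caveat with the NR==FNR idiom: if file1 is empty, awk never sees a "first file" and treats file2 as the word list instead.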
 
Old 11-19-2021, 06:07 AM   #10
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Who cares about performance?
I spent an entire career optimising system performance - I had a good (well-paying) life. No one cares anymore (yes, no one will employ me now).
Nickel-and-diming in a home environment is pointless - just find a solution you like and run with it.
 
Old 11-19-2021, 12:04 PM   #11
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,790

Rep: Reputation: 1201
I haven't seen a requirement for having words in column 2 in file1?
The following expects 1 word per line.
Code:
fgrep -vf file1 file2
 
Old 11-19-2021, 03:31 PM   #12
CyberIT
Member
 
Registered: Jun 2017
Posts: 56

Original Poster
Rep: Reputation: Disabled
WOW! Thank you all for your comments.

Basically, I think I have to put the contents of file1 into memory while searching file2. If a word from file1 is found in file2, then remove the line containing it.

I will definitely review what everyone has posted and try them out and see what I can do. Thanks!
 
Old 11-19-2021, 04:11 PM   #13
CyberIT
Member
 
Registered: Jun 2017
Posts: 56

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by MadeInGermany View Post
I haven't seen a requirement for having words in column 2 in file1?
The following expects 1 word per line.
Code:
fgrep -vf file1 file2

fgrep seems to have done the trick. Thank you!

However I noticed that it actually removed more lines than what I needed.

Is there a way to only remove lines in file2 that start with the word in file1?
 
Old 11-19-2021, 05:27 PM   #14
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,670

Rep: Reputation: Disabled
Quote:
Originally Posted by MadeInGermany View Post
I haven't seen a requirement for having words in column 2 in file1?
I supposed the OP really meant what they posted.
Quote:
Originally Posted by CyberIT View Post
With that info I want to find {print $2} of each line within file1 then look for same output within file2 and remove the entire line within file2.
Looks like I was wrong though.
Quote:
Originally Posted by CyberIT View Post
fgrep seems to have done the trick. Thank you!
 
Old 11-19-2021, 09:22 PM   #15
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by CyberIT View Post
Is there a way to only remove lines in file2 that start with the word in file1?
The fgrep name is just a shortcut for grep -F which was shown above in post #2. But that is for fixed strings not patterns. Now that you want to anchor the string, you have to make a pattern.

Code:
grep -v -f <(sed 's/^/^/' file1) file2
How many patterns are in file1? If there are many then you may want a different approach.
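One thing to watch once the words become patterns: characters such as the dot in a host name are then regular expression metacharacters. A sketch of the anchored approach that escapes them first and uses -v so that matching lines are dropped (the sample data and the `patterns` file name are made up):

```shell
# Hypothetical sample data.
dir=$(mktemp -d)
printf '%s\n' 'blah.example.com' > "$dir/file1"
printf '%s\n' \
  'blah.example.com 10.0.0.1' \
  'blahXexample.com 10.0.0.9' \
  'www.blah.example.com 10.0.0.2' > "$dir/file2"

# First expression: put a backslash in front of the basic regular
# expression metacharacters ] [ \ . * ^ $ so they match literally.
# Second expression: prepend ^ to anchor each pattern at line start.
sed -e 's/[][\.*^$]/\\&/g' -e 's/^/^/' "$dir/file1" > "$dir/patterns"

# -v drops every line of file2 that starts with one of the words.
result=$(grep -v -f "$dir/patterns" "$dir/file2")
printf '%s\n' "$result"
rm -rf "$dir"
```

Only the first line of file2 is removed here. Without the escaping, the unescaped dot would also match the X in blahXexample.com and that line would be removed too.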
 
  

