search two files for specific words remove the line from one file
Hello
I have two files, file1 and file2. File2 is large.
I'm trying to search file2, line by line, for specific words that are listed in file1, also read line by line. If a word matches a line in file2, that line should be removed from file2.
I could use some help starting a script for it. Bash? Python?
Have you tried what was suggested by Turbocapitalist?
Code:
grep -Fvf <(awk '$0=$2' file1) file2
Thank you for your reply!
Yep, I tried what was suggested earlier, but the outcome was not what I wanted. It seemed to just copy the file as it was, nothing more, so I assume I didn't have the proper format. However, the example you posted is not what I used, so I will try that out too.
You can supply a file containing the list of words to be matched or deleted to a command such as grep or sed and compare it against a second file: https://stackoverflow.com/questions/...another-file-a . Look for the text "grep -Fvxf <lines-to-remove> <all-lines>" on that page.
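A minimal sketch of how that grep invocation behaves, assuming file1 holds one string per line. The sample contents below are hypothetical, since the real file formats were not posted. It also shows the difference -x makes: with -x only whole-line matches are removed, without it any line that merely contains a listed string is removed too.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' 'apple' 'cherry' > "$tmp/file1"                        # strings to remove
printf '%s\n' 'apple' 'apple pie' 'banana' 'cherry' > "$tmp/file2"   # data to filter

# -F: fixed strings, -x: whole-line match, -v: invert the match, -f: read patterns from a file
whole_line=$(grep -Fvxf "$tmp/file1" "$tmp/file2")

# Without -x, a line is dropped as soon as it contains one of the strings
substring=$(grep -Fvf "$tmp/file1" "$tmp/file2")

printf 'whole-line match:\n%s\n' "$whole_line"
printf 'substring match:\n%s\n' "$substring"
rm -rf "$tmp"
```

With this sample data the first variant keeps "apple pie" and "banana", while the second keeps only "banana".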
Yep, I tried what was suggested earlier, but the outcome was not what I wanted. It seemed to just copy the file as it was, nothing more, so I assume I didn't have the proper format.
Then you should describe the format of both files in more detail.
First, try getting the grep solution to work. It is not the fastest solution, but it is probably one of the easiest to understand, and you can optimize it further if needed. An awk solution is more flexible and probably faster as well, especially if you explicitly run mawk rather than gawk as the awk interpreter (the former tends to be faster than the latter):
Code:
awk 'NR==FNR{_[$2];next}!($0 in _)' file1 file2
but adjusting it to your needs requires some understanding of how awk works.
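For anyone adjusting the awk one-liner, here is the same idiom spread out with comments. It is only a sketch: it assumes, as the one-liner does, that the string to match sits in the second column of file1, and the sample data is made up. The array name seen is just a more readable stand-in for _.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' '1 apple' '2 cherry' > "$tmp/file1"                    # assumed layout: id, then word
printf '%s\n' 'apple' 'apple pie' 'banana' 'cherry' > "$tmp/file2"

result=$(awk '
    # NR == FNR is true only while the first file is being read
    NR == FNR { seen[$2]; next }   # remember column 2 of every file1 line
    !($0 in seen)                  # file2: print lines that are not an exact match
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that the test is against the whole line ($0), so "apple pie" survives here. Changing $2 or $0 to another field is the usual place to start when the file layout differs.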
A shell solution may be THE easiest to understand, but it is probably the slowest one as well:
Code:
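#!/bin/sh
# Print each line of file2 unless it occurs in file1 as a whole word.
while IFS= read -r line
do
    # -F: treat the line as a literal string; -e: safe even if it starts with "-"
    grep -qwF -e "$line" file1 || printf '%s\n' "$line"
done <file2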
Again, the grep command, the printf command and even the read command may require some adjustments, depending on what exactly you are trying to read, to match, and to output.
And as said, if both files are sorted, there are more efficient ways to do this. E.g.
Of course, this doesn't make sense if you have to sort both files on the fly as I did above. But if the files are already sorted (or even if only the large one is), then join may beat awk performance-wise.
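A rough sketch of the join idea, not the command from the post above. It assumes the word to match is the first whitespace-separated field of each file2 line; the sample data and the .sorted file names are hypothetical. The sort step is there only so the example runs on its own, and with pre-sorted input it would be skipped.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' 'cherry' 'apple' > "$tmp/file1"
printf '%s\n' 'date 4' 'apple 1' 'cherry 3' 'banana 2' > "$tmp/file2"

# join needs both inputs sorted on the join field under the same collation
LC_ALL=C sort -k 1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -k 1,1 "$tmp/file2" > "$tmp/file2.sorted"

# -v 2: print only the lines of the second file whose first field has no partner in the first
result=$(LC_ALL=C join -v 2 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this data the lines starting with "banana" and "date" are kept. join compares fields, not arbitrary substrings, so it fits only when the word is in a fixed column.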
Who cares about performance?
I spent an entire career optimising system performance - I had a good (well paying) life. No-one cares anymore (yes, no one will employ me now).
Nickel-and-diming in a home environment is pointless - just find a solution you like and run with it.
Basically, I think I have to put the contents of file1 into memory while searching file2. If a word from file1 is found in file2, then remove the line containing it.
I will definitely review what everyone has posted and try them out and see what I can do. Thanks!
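That description (file1 in memory, file2 scanned once) can be sketched in awk as follows. This is one possible reading, assuming file1 has one word per line and that a "word" in file2 means a whitespace-separated field; the sample data is invented.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' 'apple' 'cherry' > "$tmp/file1"
printf '%s\n' 'red apple here' 'pineapple juice' 'banana' 'a cherry' > "$tmp/file2"

result=$(awk '
    NR == FNR { words[$1]; next }      # file1: keep every word in memory
    {
        for (i = 1; i <= NF; i++)      # file2: test each whitespace-separated field
            if ($i in words) next      # a listed word was found, so drop the line
        print
    }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Here "pineapple juice" and "banana" remain, because "pineapple" is not the same field as "apple". awk does not edit file2 in place; the usual approach is to redirect the output to a new file and move it over the original once the result looks right.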
Is there a way to remove only the lines in file2 that start with a word from file1?
The fgrep name is just a shortcut for grep -F, which was shown above in post #2. But that is for fixed strings, not patterns. Now that you want to anchor the string, you have to make a pattern.
Code:
# sed prefixes each word with ^ to anchor it; -v drops the matching lines
grep -vf <(sed 's/^/^/' file1) file2
How many patterns are in file1? If there are many then you may want a different approach.
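One such different approach, as a sketch only: if "starts with the word" can be read as "the first whitespace-separated field equals the word", awk can look that field up in an array instead of testing many anchored patterns per line. The sample data is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' 'apple' 'cherry' > "$tmp/file1"
printf '%s\n' 'apple pie' 'red apple' 'applesauce' 'cherry' 'banana' > "$tmp/file2"

# file1: store each word as an array key; file2: keep lines whose first field is not a key
result=$(awk 'NR == FNR { words[$1]; next } !($1 in words)' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

This keeps "red apple", "applesauce" and "banana". An anchored regex such as ^apple would also remove "applesauce", so the two approaches differ on prefixes, and which one is right depends on the data.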