Using grep to compare two files and make a filter on key words

Ikebukuro · 12-13-2019, 11:09 AM

Hello experts,

I have a problem with two files.
File 1: one word per line; each word is the name of an Oracle table.
File 2: many words per line; each line is a SELECT statement from an Oracle view.

I need to find all the words in file 1 that do not appear in file 2.
I tried with grep but failed...

Can you tell me how to do this?

Here is an extract of File 1. There are many space characters at the end of each line, and they have to be ignored in the search.

Quote:

WRH$_ACTIVE_SESSION_HISTORY
WRH$_ACTIVE_SESSION_HISTORY_BL
WRH$_ASM_BAD_DISK
WRH$_ASM_DISKGROUP
WRH$_ASM_DISKGROUP_STAT
WRH$_BG_EVENT_SUMMARY

Here is a simplified and modified extract of File 2.

Code:

SELECT * from WRH$_ACTIVE_SESSION_HISTORY
SELECT * from WRH$_ASM_BAD_DISK
SELECT col12, col34 from WRH$_ASM_DISKGROUP, WRH$_ASM_BAD_DISK etc etc
SELECT * from WRH$_ASM_DISKGROUP_STAT ORDER BY 1

The result I want to see:

Quote:

WRH$_ACTIVE_SESSION_HISTORY_BL
WRH$_BG_EVENT_SUMMARY

Sorry, I don't see where I can attach a whole file (is it possible?). I don't want to paste ALL the content of my files; they are too big...
And sorry if I made mistakes in my first post here...

Have a very nice day.

Turbocapitalist · 12-13-2019, 11:15 AM

You're probably going to need an awk or perl script with an associative array. Read the shorter word list into the array and then run the second, larger file through it.

Farcrada · 12-13-2019, 11:17 AM

You could write a small Python script/program that takes two files as arguments and spits out a difference file and a file with everything that matches. I feel like that would be the best short-term solution, assuming you need to do this for the mentioned big file(s).

Ikebukuro · 12-13-2019, 11:21 AM

Thank you for your answers, but I am an Oracle DBA, not a Linux expert... I don't know how to use Python, Perl, or Awk :-(

Turbocapitalist · 12-13-2019, 11:34 AM

No time like the present. It's pretty much not possible to do system administration without periodically needing some awk or perl or (maybe) python.

AWK is a full language and would take ages to master but getting an idea of the basics can take but a few minutes:

https://www.grymoire.com/Unix/Awk.html

AWK in its most elementary form is just a bunch of abbreviated if-then statements:

Code:

awk '
NR==FNR {
        a[$1]++;
        next;
} 

a[$4] {
        delete a[$4];
} 

END {
        for (i in a) { 
                print i;
        }
}
' file1.txt file2.txt | sort

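NR and FNR are built-in variables, and $1 and $4 refer to the first and fourth fields of the current line. NR==FNR is true only while the first file is being read, so that block loads the table names into the array a. END {} is a clause which gets run once after there is no more input.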

Ikebukuro · 12-13-2019, 02:04 PM

Thank you Turbocapitalist for Awk, I've forgotten that I have a book about it...
Tomorrow I will read it, I think it will be very useful to solve my problem.

Turbocapitalist · 12-13-2019, 10:28 PM

Having the pattern in an unpredictable place on each line and possibly multiple times means you'll probably have to work out something with the match() function or similar.
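
One possible way to cope with names that can appear anywhere on the line is sketched below. Note that it does not use match(); it swaps in a plain loop over every field of each line in the second file. The function name missing_tables_awk is made up for illustration, and the sketch assumes that table names are separated from the surrounding text only by whitespace, commas, semicolons, or parentheses, and that upper/lower case is the same in both files. Schema-qualified names are not handled.

```shell
#!/bin/sh
# Sketch only: print the names from the table list ($1) that never
# appear as a separate token in the queries file ($2).
# missing_tables_awk is a hypothetical name, not part of any tool.
missing_tables_awk() {
    awk '
    NR == FNR {                      # first file: remember each table name
        if ($1 != "") a[$1] = 1      # $1 ignores trailing spaces
        next
    }
    {                                # second file: look at every field
        for (f = 1; f <= NF; f++) {
            name = $f
            gsub(/[,;()]/, "", name) # assumption: only this punctuation clings to names
            delete a[name]           # harmless if name was never in the array
        }
    }
    END {
        for (i in a) print i         # whatever is left was never seen
    }
    ' "$1" "$2" | sort
}
```

It would be called as missing_tables_awk file1.txt file2.txt. Because default field splitting discards leading and trailing blanks, the trailing spaces in file 1 need no separate clean-up step.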

MadeInGermany · 12-14-2019, 03:07 PM

Unfortunately the following simple word grep returns the non-matching lines from file2, not the missing names from file1:

Code:

grep -vwf file1 file2

And the reverse (file2 as the pattern list, file1 as the input) won't match as words.

boughtonp · 12-14-2019, 03:44 PM

Doing this with grep isn't difficult:

Code:

while read -r line; do
	grep -wFq "$line" queries.txt || echo "$line"
done < tables.txt

tables.txt is your file 1 - the list of tables (needles) to search for, with $line being each one.
queries.txt is your file 2 - the queries file (haystack) to search within.

The first and last lines are for looping through the file (for variations see BashFAQ/001).

Grep flags:
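-w matches whole words only, so a name does not match inside a longer, suffixed name (e.g. WRH$_ACTIVE_SESSION_HISTORY inside WRH$_ACTIVE_SESSION_HISTORY_BL)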
-F is for Fixed strings - i.e. disables regex matching
-q is for quiet - i.e. don't output when matches found (we want the opposite)

The || is so that when grep doesn't match, the line searched for is then output.
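
For reuse, the same loop could be wrapped in a small function, roughly as follows. This is only a sketch: the name missing_tables is hypothetical, the skip for empty lines and the -- before the pattern are additions that the loop above does not have, and the two file paths are passed as arguments instead of being hard-coded.

```shell
#!/bin/sh
# Sketch only: list every name in the table file ($1) that does not occur
# as a whole word anywhere in the queries file ($2).
# missing_tables is a hypothetical name chosen for this example.
missing_tables() {
    while read -r line; do
        # read with the default IFS already trims leading/trailing blanks,
        # which takes care of the trailing spaces in the table list.
        [ -n "$line" ] || continue          # addition: skip empty lines
        # -w whole word, -F fixed string, -q quiet; -- ends option parsing
        grep -wFq -- "$line" "$2" || printf '%s\n' "$line"
    done < "$1"
}
```

It would be called as missing_tables tables.txt queries.txt. Like the original loop, it runs grep once per table name, which is fine for a few thousand names but slower than a single-pass approach on very large lists.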

Ikebukuro · 12-15-2019, 06:36 AM

boughtonp, you are GREAT!
It works

Thank you everybody for your help, I was reading a book about Awk but I see we can manage my problem with grep!

Have a nice day :-)

frankbell · 12-15-2019, 07:47 PM

See man diff.