search two files for specific words remove the line from one file

CyberIT · 11-19-2021, 09:49 PM

Quote:

Originally Posted by shruggy

I supposed the OP really meant what they posted.

Looks like I was wrong though.

After some playing around, I came up with the following, which is where fgrep did help me out.

Code:

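# Second field of file1, keeping only the part before the first dot
awk '{print $2}' file1 | awk -F. '{print $1}' > "$records"
# Third field of file2
awk '{print $3}' file2 > "$records.1"
# Combine both lists into one file
cat "$records" "$records.1" > "$records.2"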

CyberIT · 11-19-2021, 09:55 PM

Quote:

Originally Posted by Turbocapitalist

The fgrep name is just a shortcut for grep -F which was shown above in post #2. But that is for fixed strings not patterns. Now that you want to anchor the string, you have to make a pattern.

My apologies. I honestly didn't know that fgrep was the same as grep -F. I'm still new at all of this, as I don't work with it every day.

I didn't quite understand this portion of post #2, though: <(awk '$0=$2' file1)

Within file1 there is just one word per line. I'm looking for lines in file2 that start with one of those words and, if a line does, to remove that line only.
I would have to say there are maybe 50 lines...

Thank you!!

Turbocapitalist · 11-19-2021, 10:42 PM

No worries. No matter which direction one chooses to look there are far more details than it is possible to absorb. You eventually learn to kind of swim in it and navigate between parts that have become familiar through use.

The <( ... ) notation is a process substitution, where the output of the enclosed command is treated as an input (or output) file itself. See also the Advanced Bash-Scripting Guide's section on process substitution.

It is mainly a Bashism but is also found in Zsh and maybe a few other advanced shells.

Try the example in #15 above. It turns each line of file1 into a pattern using process substitution.
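
As a minimal sketch of the idea, assuming file1 holds one plain word per line with no regex metacharacters. The sample data and the temporary directory are made up for illustration and are not from the earlier posts:

```shell
# Sketch only: the sample data and the temporary directory are made up.
tmp=$(mktemp -d)
printf '%s\n' apple cherry > "$tmp/file1"
printf '%s\n' 'apple pie' 'green apple' 'cherry tart' 'banana split' > "$tmp/file2"

# sed puts ^ in front of each word, giving the patterns ^apple and ^cherry.
# <( ... ) hands that output to grep -f as if it were a file; -v drops matches.
# bash -c is used because plain sh has no process substitution.
result=$(bash -c 'grep -v -f <(sed "s/^/^/" "$1") "$2"' _ "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data only "green apple" and "banana split" are printed, because "apple" appears in "green apple" but not at the start of the line.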

CyberIT · 11-20-2021, 12:17 AM

Quote:

Originally Posted by Turbocapitalist

No worries. No matter which direction one chooses to look there are far more details than it is possible to absorb. You eventually learn to kind of swim in it and navigate between parts that have become familiar through use.

The <( ... ) notation is a process substitution, where the output of the enclosed command is treated as an input (or output) file itself. See also the Advanced Bash-Scripting Guide's section on process substitution.

It is mainly a Bashism but is also found in Zsh and maybe a few other advanced shells.

Try the example in #15 above. It turns each line of file1 into a pattern using process substitution.

It seems to be working, but I'm not sure... it is taking an awfully long time to complete.

MadeInGermany · 11-20-2021, 08:08 AM

In many cases it's sufficient to require boundaries at both ends.
grep has the -w option (word match)

Code:

grep -vwf file1 file2

Your examples differ a bit from your written requirements.
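
To illustrate the point about boundaries, here is a small sketch on made-up sample data (the file contents and the temporary directory are assumptions for illustration). It compares a fixed-string match (grep -F, the same as fgrep) with a word match (-w):

```shell
# Sketch only: sample data and the temporary directory are made up.
tmp=$(mktemp -d)
printf '%s\n' cat > "$tmp/file1"
printf '%s\n' 'cat food' 'catalog page' 'black cat' 'dog food' > "$tmp/file2"

# -F: "cat" as a fixed string anywhere in the line, even inside "catalog".
fixed=$(grep -vFf "$tmp/file1" "$tmp/file2")
# -w: "cat" only as a whole word, but still anywhere in the line.
word=$(grep -vwf "$tmp/file1" "$tmp/file2")

printf 'fixed string:\n%s\n\nword match:\n%s\n' "$fixed" "$word"
rm -rf "$tmp"
```

With -F only "dog food" survives, while -w also keeps "catalog page". Note that "black cat" is dropped in both cases, so neither option ties the word to the start of the line.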

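A comment on
awk -F\.
I think it's the same as
awk -F.
because the shell removes the backslash before awk sees it. -F sets FS (the field separator). awk treats any single character other than a space as a literal separator, not as a regular expression.
For clarity and portability, you can give the character explicitly as a bracket expression:
awk -F"[.]"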
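
A quick way to check this, sketched with a made-up sample line (the host name is only an illustration):

```shell
# Sketch only: the sample line is made up.
line='host.example.com 10'

# The shell strips the backslash, so awk receives -F. in both cases.
a=$(printf '%s\n' "$line" | awk -F\. '{print $1}')
b=$(printf '%s\n' "$line" | awk -F. '{print $1}')
# The bracket expression states the intent explicitly.
c=$(printf '%s\n' "$line" | awk -F'[.]' '{print $1}')

printf '%s %s %s\n' "$a" "$b" "$c"
```

All three variants print "host" for this line.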

CyberIT · 11-20-2021, 08:42 AM

Quote:

Originally Posted by MadeInGermany

In many cases it's sufficient to require boundaries at both ends.
grep has the -w option (word match)

Code:

grep -vwf file1 file2

Word match would mean it looks for those words anywhere in a line and removes those lines, correct? I'm looking for the word at the beginning of the line, and to remove that line only. Thank you!

What is the difference between fgrep -vf file1 file2 and grep -vwf file1 file2?

Turbocapitalist · 11-20-2021, 09:06 AM

Out of curiosity what is the output from time when used in the following way?

Code:

time grep -f <(sed 's/^/^/' file1) file2

time grep -w -f <(sed 's/^/^/' file1) file2

Does the second line finish sooner?

CyberIT · 11-20-2021, 09:42 AM

Quote:

Originally Posted by CyberIT

It seems to be working, but I'm not sure... it is taking an awfully long time to complete.

Oh geeze... I found why it was taking forever. It was an error on my part. My apologies.

It was looking for the wrong variable for file2. Man I feel dumb.

CyberIT · 11-20-2021, 09:47 AM

Quote:

Originally Posted by Turbocapitalist

Out of curiosity what is the output from time when used in the following way?

Code:

time grep -f <(sed 's/^/^/' file1) file2

time grep -w -f <(sed 's/^/^/' file1) file2

Does the second line finish sooner?

First line
real 0m0.306s
user 0m0.276s
sys 0m0.040s

Second line
real 0m1.111s
user 0m1.072s
sys 0m0.042s

CyberIT · 11-20-2021, 09:53 AM

Quote:

Originally Posted by CyberIT

First line
real 0m0.306s
user 0m0.276s
sys 0m0.040s

Second line
real 0m1.111s
user 0m1.072s
sys 0m0.042s

However, the output doesn't look right with either of those, as I'm missing a lot of other lines from file2. The output looks right doing it this way, though, and the time is in-between both of them but closer to the second one.

Code:

grep -vwf file1 file2

real 0m0.082s
user 0m0.061s
sys 0m0.024s