[SOLVED] Problem using grep -f

zoeplankton · 11-26-2010, 11:31 AM

Hi all,

I'm trying to manipulate a large text file full of records (metadata, one complete record per line). I need to delete every line on which certain words appear. There are five different words, all pretty simple all-caps strings with occasional whitespace (like ADVISORY and EDITOR'S NOTE).

I tried using grep -v, which worked a treat, but only string-by-string. Ideally I'd like to run this as grep -v -f, where the file targeted by the -f contains the strings I need to match in order to delete the lines they're in.

i.e. grep -v -f filecontainingSTRINGS.txt targetfile.txt > outputfile.txt

When I try this, however, I don't get any matches - or more specifically, no changes are made in the output file.

It works fine if there's only one string in filecontainingSTRINGS, but it doesn't work if there's more than one (I'm using newline as the delimiter).

(Also my machine doesn't recognise /usr/xpg4/bin/grep - no idea what that's all about!)

Any ideas? All help much appreciated!

crts · 11-26-2010, 11:57 AM

Hi,

Are you sure that every string is on its own line? Also, I see a Windows logo on the left side. Are you trying to edit Windows files in Linux? If so, try converting the files you are using to Unix format first:
dos2unix file.txt
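
A minimal sketch of why the line endings matter, assuming made-up file names and contents (nothing below comes from the original poster's data). A pattern file saved with Windows (CRLF) line endings leaves a carriage return at the end of each pattern, so a pattern such as ADVISORY only matches where a carriage return follows it. If dos2unix is not installed, tr can strip the carriage returns instead:

```shell
# Hypothetical demo files in a scratch directory.
tmp=$(mktemp -d)
printf "ADVISORY\r\nEDITOR'S NOTE\r\n" > "$tmp/strings_dos.txt"   # CRLF line endings
printf "keep this record\nADVISORY: drop me\nEDITOR'S NOTE here\n" > "$tmp/target.txt"

# Each pattern is really "ADVISORY<CR>", which occurs nowhere in target.txt,
# so -v lets every line through. -c counts the lines that would be printed.
before=$(grep -v -c -f "$tmp/strings_dos.txt" "$tmp/target.txt")

# Strip the carriage returns, then retry with the cleaned pattern file.
tr -d '\r' < "$tmp/strings_dos.txt" > "$tmp/strings_unix.txt"
after=$(grep -v -f "$tmp/strings_unix.txt" "$tmp/target.txt")

echo "lines kept with CRLF patterns: $before"
echo "lines kept after conversion: $after"
rm -r "$tmp"
```

With these sample files, all three lines survive the first grep and only "keep this record" survives the second.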

markush · 11-26-2010, 03:20 PM

Hi zoeplankton and welcome to LQ,

You can search for lines containing uppercase letters:

Code:

grep -e '[A-Z]' file

In your case you are looking for the lines which do not contain such characters:

Code:

grep -v -e '[A-Z]' file

and to create a new file without the matching lines:

Code:

grep -v -e '[A-Z]' file > newfile

If this doesn't meet your requirements, I'd recommend using sed, the stream editor. Please look at the manpage:

Code:

man sed

Markus
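
As a rough sketch of the sed route mentioned above, assuming hypothetical file contents and only two of the five strings: one `/pattern/d` expression per string deletes every line containing that string.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf "keep this record\nADVISORY: drop me\nEDITOR'S NOTE here\n" > "$tmp/target.txt"

# One -e expression per string; the d command deletes matching lines.
kept_sed=$(sed -e '/ADVISORY/d' -e "/EDITOR'S NOTE/d" "$tmp/target.txt")

echo "$kept_sed"
rm -r "$tmp"
```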

Tinkster · 11-26-2010, 03:42 PM

The "problem" here is that you're saying you don't want to
see any lines that have ALL words from your file in one line.

If your Solaris has a decent egrep you could try

Code:

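# paste -s joins the lines with '|' (sed reads one line at a time, so s/\n/|/g never sees a newline)
egrep -v "$(paste -s -d '|' filecontainingSTRINGS)" targetfile.txt > outputfile.txt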

Untested.

Cheers,
Tink
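
For comparison, a small sketch assuming a POSIX-conforming grep and hypothetical file contents: each line of the -f file is treated as a separate pattern, and a line is selected if it matches any one of them, so `grep -v -f` drops lines containing at least one of the strings. Adding -F makes grep treat the strings literally rather than as regular expressions. The explicit alternation below should give the same result.

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)
printf "ADVISORY\nEDITOR'S NOTE\n" > "$tmp/strings.txt"
printf "keep this record\nADVISORY: drop me\nEDITOR'S NOTE here\nanother record\n" > "$tmp/target.txt"

# -F: fixed strings; -f: one pattern per line, a line matches if ANY pattern matches.
kept_f=$(grep -v -F -f "$tmp/strings.txt" "$tmp/target.txt")

# The same filter written as a single extended-regex alternation.
kept_e=$(grep -v -E "ADVISORY|EDITOR'S NOTE" "$tmp/target.txt")

echo "$kept_f"
echo "$kept_e"
rm -r "$tmp"
```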

zoeplankton · 11-26-2010, 05:59 PM

Aha! Yes, my text editor is TextPad (very good regular expressions engine), which only runs on PC, and yes, I'm on a (work) PC. I have the option to save as Unix in TextPad, I'll try that Monday morning. Thanks!

**************************

Update - yup, that was the problem. Fixed now, thanks!

zoeplankton · 11-26-2010, 06:03 PM

Yup, I've had egrep recommended by a couple of other folks - am looking into it now & will try implementing on Monday when back at work. Thanks!