LinuxQuestions.org

zoeplankton 11-26-2010 11:31 AM

Problem using grep -f
 
Hi all,

I'm trying to manipulate a large text file full of records (metadata - one complete record per line). I need to delete every line on which certain words appear - there are five different words, all pretty simple all-caps strings with occasional whitespace (like ADVISORY and EDITOR'S NOTE).

I tried using grep -v, which worked a treat, but only string-by-string. Ideally I'd like to run this as grep -v -f, where the file targeted by the -f contains the strings I need to match in order to delete the lines they're in.

i.e. grep -v -f filecontainingSTRINGS.txt targetfile.txt > outputfile.txt

When I try this, however, I don't get any matches - or more specifically, no changes are made in the output file.

It works fine if there's only one string in filecontainingSTRINGS, but it doesn't work if there's more than one (I'm using newline as the delimiter).

(Also my machine doesn't recognise /usr/xpg4/bin/grep - no idea what that's all about!)

Any ideas? All help much appreciated!

crts 11-26-2010 11:57 AM

Hi,

are you sure that every string is on its own line? Also, I see a Windows logo on the left side. Are you trying to edit Windows files in Linux? If so, try converting the files you are using to Unix format first:
dos2unix file.txt
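
A small, self-contained sketch of why Windows line endings can defeat grep -f. The file names and sample records are made up for the demo, and tr -d '\r' stands in for dos2unix in case that tool is not installed:

```shell
#!/bin/sh
# Throwaway directory holding hypothetical demo files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Target file with Unix (LF) line endings.
printf '%s\n' "record one ADVISORY" "record two" "record three EDITOR'S NOTE" > targetfile.txt

# Pattern file saved with Windows (CRLF) line endings.
printf '%s\r\n' "ADVISORY" "EDITOR'S NOTE" > strings_dos.txt

# Every pattern now ends in a carriage return that the target lines
# do not contain, so nothing matches and -v lets all lines through.
grep -v -f strings_dos.txt targetfile.txt > output_dos.txt

# Strip the carriage returns (roughly what dos2unix does) and retry.
tr -d '\r' < strings_dos.txt > strings_unix.txt
grep -v -f strings_unix.txt targetfile.txt > output_unix.txt

# Compare how many lines survived in each case.
wc -l < output_dos.txt
wc -l < output_unix.txt
```

If the first count equals the number of lines in the target file while the second is smaller, the carriage returns were the culprit.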

markush 11-26-2010 03:20 PM

Hi zoeplankton and welcome to LQ,

you could search for lines containing uppercase letters:
Code:

grep -e '[A-Z]' file
In your case you are looking for the lines which do not contain such characters:
Code:

grep -v -e '[A-Z]' file
and to create a new file without the matching lines:
Code:

grep -v -e '[A-Z]' file > newfile
Bear in mind that this drops every line containing any uppercase letter, not only your five strings. If this doesn't meet your requirements, I'd recommend using sed, the stream editor; please look at the manpage:
Code:

man sed
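
For instance, a minimal sed sketch for this case might look like the following. Only ADVISORY and EDITOR'S NOTE come from the original post; the sample records and the temporary directory are invented for the demo:

```shell
#!/bin/sh
# Throwaway directory with a hypothetical target file.
demo_dir=$(mktemp -d)
printf '%s\n' "record one ADVISORY" "record two" "record three EDITOR'S NOTE" > "$demo_dir/targetfile.txt"

# One -e expression per string; the "d" command deletes every line
# that the preceding /pattern/ address matches.
sed -e '/ADVISORY/d' -e "/EDITOR'S NOTE/d" "$demo_dir/targetfile.txt" > "$demo_dir/outputfile.txt"

cat "$demo_dir/outputfile.txt"
```

With five strings this stays manageable, but each one has to be written out as its own expression.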
Markus

Tinkster 11-26-2010 03:42 PM

As far as I know, grep -v -f should already drop any line that matches
at least one of the patterns in your file, as long as each pattern sits
on its own clean line.


If your Solaris has a decent egrep you could try
Code:

egrep -v "$(paste -s -d '|' filecontainingSTRINGS.txt)" targetfile.txt > outputfile.txt
Untested.
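
A sketch of the same idea that can be tried on throwaway data first. It joins the lines of the pattern file with paste and uses grep -E, the POSIX spelling of egrep; the sample records and the temporary directory are invented for the demo:

```shell
#!/bin/sh
# Throwaway directory with hypothetical pattern and target files.
demo_dir=$(mktemp -d)
printf '%s\n' "ADVISORY" "EDITOR'S NOTE" > "$demo_dir/filecontainingSTRINGS.txt"
printf '%s\n' "record one ADVISORY" "record two" "record three EDITOR'S NOTE" > "$demo_dir/targetfile.txt"

# Join the patterns into one extended regex: ADVISORY|EDITOR'S NOTE
pattern=$(paste -s -d '|' "$demo_dir/filecontainingSTRINGS.txt")

# Drop every line that matches either alternative.
grep -E -v "$pattern" "$demo_dir/targetfile.txt" > "$demo_dir/outputfile.txt"

# The strings are literal, so fixed-string matching with -F -f
# should give the same result without building a regex at all.
grep -F -v -f "$demo_dir/filecontainingSTRINGS.txt" "$demo_dir/targetfile.txt" > "$demo_dir/outputfile_fixed.txt"

cat "$demo_dir/outputfile.txt"
```

Note that a blank line in the pattern file would add an empty alternative that matches every line, so it is worth checking the file for stray empty lines first.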



Cheers,
Tink

zoeplankton 11-26-2010 05:59 PM

Aha! Yes - my text editor is TextPad (very good regular expression engine), which only runs on PC - and yes, I'm on a (work) PC. I have the option to save as Unix in TextPad; I'll try that Monday morning. Thanks!

**************************

Update - yup, that was the problem. Fixed now, thanks!

zoeplankton 11-26-2010 06:03 PM

Yup, I've had egrep recommended by a couple of other folks - am looking into it now & will try implementing on Monday when back at work. Thanks!


All times are GMT -5.