sed or grep : delete lines containing matching text
Hi,
I have been struggling with this for a long time. I want to match text patterns from a file and delete all lines in a second file that contain a matching pattern from the first file. I have 2 files, namely "emails" and "delemails". "emails" contains a list of 2000 emails and "delemails" contains a list of 200 emails that need to be deleted from "emails". I know I can do it using sed or the grep -v option, but can't get the syntax right. Would appreciate your help. |
grep simply finds lines containing a particular pattern. grep -v looks for lines that do NOT contain the pattern. Not sure how that relates to your problem....
sed is a relatively complex tool. At its core it is used to find and change patterns of any size, but it has a whole bunch of options. Here is a very good tutorial: http://www.grymoire.com/Unix/Sed.html#uh-8 You may also want to look at awk. When dealing with two different files, it may be easier to have a script that opens one file, finds the pattern of interest, and then feeds that to the command that is going to operate on the other file. If you post some of your actual code, we may be able to be more helpful on the approach you are trying. |
grep -Ev 'crap0|crap1|crap2|crap3'
That will delete lines containing crap[0-3]. |
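That alternation in action, as a runnable sketch (the input file and its contents are invented for illustration):

```shell
cd "$(mktemp -d)"    # scratch directory for the demo file
printf 'keep me\ncrap1 here\nalso keep\ncrap3\n' > input.txt

# -E enables extended regexps, so | means alternation;
# -v inverts the match, keeping only lines with none of the patterns
grep -Ev 'crap0|crap1|crap2|crap3' input.txt > output.txt
cat output.txt    # keep me / also keep
```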
Are you saying that your files are mbox files containing entire messages? Then simple pattern matching on separate lines won't be very helpful. To me it seems more reasonable to write a Perl script using modules from http://www.cpan.org/modules that are designed to handle email properly (have a look at Email::Folder and Email::LocalDelivery).
|
#!/bin/bash
# makedict.sh  [make dictionary]
# Modification of /usr/sbin/mkdict script.
#
# Original script copyright 1993, by Alec Muffett.
#*************************************************************************************#
# This modified script included in this document in a manner consistent with the
# "LICENSE" document of the "Crack" package that the original script is a part of.
#
# This script processes text files to produce a sorted list of words found in the files.
# This may be useful for compiling dictionaries and for lexicographic research.
#*************************************************************************************#
# Usage: /root/Desktop/makedict.sh files-to-process
#
# or in this case:
# cat /root/Desktop/delemails /root/Desktop/emails > /root/Desktop/cattedlist
#
# NOTICE: look at /root/Desktop/cattedlist, there is one line that could have
# 2 email addresses combined like this: Someone@crap.comSomeone@crap2.com
# Then:
# /root/Desktop/makedict.sh /root/Desktop/cattedlist > /root/Desktop/cleanlist.txt

E_BADARGS=65

if [ ! -r "$1" ]            # Need at least one
then                        # valid file argument.
  echo "Usage: $0 files-to-process"
  exit $E_BADARGS
fi

cat $* |                    # Contents of specified files to stdout.
# tr A-Z a-z |              # Convert to lowercase.
tr ' ' '\012' |             # Change spaces to newlines.
# tr -c '\012a-z' '\012' |
sort |
# uniq |                    # Remove duplicates.
uniq -u |                   # Keep only lines that occur exactly once.
grep -v '^#' |              # Delete lines beginning with a hashmark.
grep -v '^$'                # Delete blank lines.

exit 0
|
Hi,
Thanks a ton for your replies. Actually the 2 files are fairly simple text files. Each file contains email addresses, one per row.

File1 (this contains about 2000 mails):
email1@email.com
email2@email2.com
email3@email3.com
email4@email4.com

File2 (this contains about 200 mails that should be deleted from file1):
email1@email.com
email2@email2.com

In the above example, I need a sed command that would take inputs from file2, one line at a time, delete email1@email.com and email2@email2.com from file1, and output the result to file3. |
Actually, grep -Ev 'crap0|crap1|crap2|crap3' is very close to what I want, but I want to input the regular expressions from a file. Output will be fairly simple, like >> file3
|
Quote:
What that script does is cat them into one big file [emails+delemails], then it looks for duplicates [uniq -u] and outputs only unique email addresses (it deletes the duplicates). If you have catted both emails and delemails, you end up with an email list that no longer contains those emails you wanted to delete. So, first cat 'em: Code:
cat /path/to/delemails /path/to/emails > /path/to/DelemailsAndMails.txt

Code:
/path/to/makedict.sh /path/to/DelemailsAndMails.txt > /path/to/cleanlist.txt |
hmmm... very promising solution. It's very close to what I wanted, except for one thing.
The file "delemails" may contain some emails that do not exist in the file "emails". I want these to be ignored. In the solution you have mentioned, mails that are in "delemails" but not in "emails" will not be duplicates, so the output in "cleanlist" will contain emails that existed in "delemails" but did not exist in "emails". Is there not a sed command that takes its input from a file to delete emails? I could use Excel to append a "-d" (or whatever) to the beginning of each line in delemails so it looks like this:

delemails:
-d email1@email.com
-d email2@email2.com

Thanks |
As a solution in that line of thought, you could first pipe both files to "(sort | uniq -d)", which will return only those lines that appear twice. This may then be subtracted from the first file as before.
But you'll have to make sure that neither of the two input files contains duplicate lines (possibly by using "uniq -u" on each file separately). |
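A minimal, runnable sketch of that subtraction (the filenames and sample addresses below are invented for illustration; the thread's real files are "emails" and "delemails"):

```shell
cd "$(mktemp -d)"    # scratch directory for the demo files
printf 'a@one.com\nb@two.com\nc@three.com\n' > emails
printf 'a@one.com\nzz@not-in-list.com\n'     > delemails

# lines present in BOTH files (assumes neither file has internal duplicates)
sort emails delemails | uniq -d > common

# subtract: after this sort, each common line appears twice, so uniq -u drops it
sort common emails | uniq -u > cleanlist
cat cleanlist    # b@two.com and c@three.com remain; zz@... was ignored
```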
Solution
Too late a reply, but if someone has a similar problem, here's a neat one-liner to solve this in bash:
Code:
diff file1 file2 | sed '/^[0-9][0-9]*/d; s/^. //; /^---$/d' > file3

Sorted mail lists have a simpler solution: Code:
comm -23 file1 file2 > file3 |
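Note that comm requires both inputs to be sorted, or it will silently miss matches. A runnable sketch (file contents invented for illustration):

```shell
cd "$(mktemp -d)"    # scratch directory for the demo files
printf 'b@two.com\na@one.com\nc@three.com\n' > file1
printf 'a@one.com\n'                         > file2

# comm compares two sorted files; -2 suppresses lines unique to file2,
# -3 suppresses lines common to both, leaving only lines unique to file1
sort file1 > file1.sorted
sort file2 > file2.sorted
comm -23 file1.sorted file2.sorted > file3
cat file3    # b@two.com and c@three.com
```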
search for string and delet files
grep -lir 'string to search in files' * |xargs rm -rf
Options:
l - list the file name only
i - ignore case while searching
r - search recursively within subdirectories also
xargs - the list of files from the grep command is passed as arguments to the "rm -rf" command |
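One caveat with that pipeline: plain xargs splits its input on whitespace, so a filename containing a space gets mangled before it reaches rm -rf. With GNU grep and xargs, NUL-delimited hand-off is safer. A sketch on throwaway demo files (names invented for illustration):

```shell
cd "$(mktemp -d)"                            # scratch directory
printf 'string to search\n' > 'bad name.txt' # matching file, name has a space
printf 'clean\n'            > keep.txt       # non-matching file

# -Z ends each listed filename with a NUL byte; xargs -0 splits on NUL,
# so names with spaces (or quotes) survive the hand-off to rm
grep -lirZ 'string to search' . | xargs -0 rm -f
ls    # only keep.txt is left
```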
Quote:
I'm going to remember this. Thanks, Doug |
Same problem
Hi, I've got the same problem here. I want to delete each line of code that has H3qqea3ur6p in it, which is a virus that's infecting people browsing a site that I'm hosting. Anyhow, grep -Ev doesn't seem to work for me. Any suggestions?
I've cleaned 1400 lines of code so far lol.. but 700 are still not done. Yannick. |
If you just want to delete lines in a file containing a particular string, use the following:
Code:
sed '/H3qqea3ur6p/d' < oldfile > newfile |
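That command in action, as a runnable sketch (the infected file contents are invented for illustration; oldfile/newfile are the names from the post above):

```shell
cd "$(mktemp -d)"    # scratch directory for the demo file
printf 'good line\n<script>H3qqea3ur6p</script>\nanother good line\n' > oldfile

# /H3qqea3ur6p/d deletes every line containing that string, untouched
# lines pass through to newfile
sed '/H3qqea3ur6p/d' < oldfile > newfile
cat newfile    # good line / another good line
```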
Quote:
Since you are using the entire line, you want to be left with a file that contains items in file1 that are unique. Code:
comm -23 <(sort file1) <(sort file2) > file3

The "comm" program is one of the programs supplied by the coreutils package, so you should have it.

Another way of doing this is using grep with the -f option combined with the -v option. This removes from one file the lines that match patterns in a second file. Code:
grep -vf <(sort file2 | uniq) file1 > file3 |
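One refinement worth noting: dots in email addresses are regex metacharacters to grep, and a bare -f pattern also matches substrings. Adding -F (fixed strings) and -x (whole-line match) makes the deletion exact. A runnable sketch with invented addresses:

```shell
cd "$(mktemp -d)"    # scratch directory for the demo files
printf 'a@a.com\na@azcom\nuser-a@a.com.br\n' > emails
printf 'a@a.com\n'                           > delemails

# -F: patterns are fixed strings (the "." is literal, so a@azcom survives)
# -x: only whole-line matches count (so user-a@a.com.br survives)
# -v: invert the match, -f: read the patterns from delemails
grep -Fxv -f delemails emails > file3
cat file3    # a@azcom and user-a@a.com.br
```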
Quote:
Your pattern H3qqea3ur6p is a simple one. If you don't need extended regexps or don't know how to use them, you can omit the -E switch from grep. Show us your commands. |
Use grep -v -f [file_delemail] [file_email] > [file_final]
|