Linux - General: This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
05-20-2006, 03:58 AM
#1
LQ Newbie
Registered: Jan 2006
Posts: 16
sed or grep: delete lines containing matching text
Hi,
I have been struggling with this for a really long time.
I want to match text patterns from one file and delete all lines in a second file that contain a matching pattern from the first file.
I have two files, namely "emails" and "delemails". "emails" contains a list of 2000 email addresses and "delemails" contains a list of 200 addresses that need to be deleted from "emails".
I know I can do it using sed or the grep -v option, but I can't get the syntax right.
Would appreciate your help.
05-20-2006, 06:07 AM
#2
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809
grep simply finds lines containing a particular pattern; grep -v looks for lines that do NOT contain the pattern. Not sure how that relates to your problem....
sed is a relatively complex tool. At its core it is used to find and change patterns of any size, but it has a whole bunch of options. Here is a very good tutorial:
http://www.grymoire.com/Unix/Sed.html#uh-8
You may also want to look at awk.
When dealing with two different files, it may be easier to have a script that opens one file, finds the pattern of interest and then feeds that to the function that is going to operate on the other file.
If you post some of your actual code, we may be able to be more helpful on the approach you are trying.
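As a minimal illustration of the difference (made-up input, not from the thread): grep prints the matching lines, grep -v prints everything else.

```shell
# grep keeps lines that match; grep -v keeps lines that do not.
printf 'alpha\nremove me\nbeta\n' | grep -v 'remove'
# prints:
# alpha
# beta
```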
05-20-2006, 07:54 AM
#3
Member
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: Pop!_OS && Windows 10 && Arch Linux
Posts: 832
grep -Ev 'crap0|crap1|crap2|crap3'
That will delete lines containing crap[0-3].
Last edited by //////; 05-20-2006 at 07:55 AM.
05-20-2006, 11:29 AM
#5
Member
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424
Are you saying that your files are mbox files containing entire messages? Then simple pattern matching on separate lines won't be very helpful. To me it seems more reasonable to write a Perl script using modules from http://www.cpan.org/modules that are designed to handle email properly (have a look at Email::Folder and Email::LocalDelivery).
05-20-2006, 06:53 PM
#6
Member
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: Pop!_OS && Windows 10 && Arch Linux
Posts: 832
#!/bin/bash
# makedict.sh [make dictionary]
# Modification of the /usr/sbin/mkdict script.
#
# Original script copyright 1993, by Alec Muffett.
#*************************************************************************************#
# This modified script is included in this document in a manner consistent with the
# "LICENSE" document of the "Crack" package that the original script is a part of.
# This script processes text files to produce a sorted list of the words found in them.
# This may be useful for compiling dictionaries and for lexicographic research.
#*************************************************************************************#
# Usage: /root/Desktop/makedict.sh files-to-process
#
# or in this case:
#   cat /root/Desktop/delemails /root/Desktop/emails > /root/Desktop/cattedlist
#
# NOTICE: look at /root/Desktop/cattedlist; there may be a line with two email
# addresses run together, like this: Someone@crap.comSomeone@crap2.com
# Then:
#   /root/Desktop/makedict.sh /root/Desktop/cattedlist > /root/Desktop/cleanlist.txt

E_BADARGS=65

if [ ! -r "$1" ]        # Need at least one valid file argument.
then
    echo "Usage: $0 files-to-process"
    exit $E_BADARGS
fi

cat "$@" |              # Contents of the specified files to stdout.
# tr A-Z a-z |          # Convert to lowercase.
tr ' ' '\012' |         # Change spaces to newlines.
# tr -c '\012a-z' '\012' |
sort |
# uniq |                # Remove duplicates.
uniq -u |               # Keep only lines that occur exactly once.
grep -v '^#' |          # Delete lines beginning with a hash mark.
grep -v '^$'            # Delete blank lines.
exit 0
05-22-2006, 12:57 AM
#8
LQ Newbie
Registered: Jan 2006
Posts: 16
Original Poster
Actually, grep -Ev 'crap0|crap1|crap2|crap3' is very close to what I want, but I want to input the regular expressions from a file. Output will be fairly simple, like >>file3.
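For what it's worth, grep can read its patterns from a file via -f, which appears to be exactly this case (a sketch using the thread's file names; -F treats each pattern as a literal string so dots in addresses are not wildcards, and -x requires a whole-line match):

```shell
# Delete from "emails" every line that appears in "delemails".
# -v  invert the match (keep the non-matching lines)
# -x  match whole lines only
# -F  patterns are fixed strings, not regexes
# -f  read the patterns from a file
grep -vxFf delemails emails > file3
```

A side benefit: addresses that appear in delemails but not in emails simply never match, so they are ignored.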
05-22-2006, 01:30 AM
#9
Member
Registered: Nov 2005
Location: Land of Linux :: Finland
Distribution: Pop!_OS && Windows 10 && Arch Linux
Posts: 832
Quote:
Originally Posted by raj000
If this is what you want to do, you can do it with that script.
What the script does is cat the two files into one big file [emails+delemails], then look for duplicates [uniq -u] and output only the unique email addresses (it deletes the duplicates). Since you catted emails and delemails together, you end up with an email list that no longer contains the emails you wanted to delete.
So, first cat 'em:
Code:
cat /path/to/delemails /path/to/emails > /path/to/DelemailsAndMails.txt
Then:
Code:
/path/to/makedict.sh /path/to/DelemailsAndMails.txt > /path/to/cleanlist.txt
And "cleanlist.txt" will contain only the emails you wanted. Just save the script as "makedict.sh", modify the paths, and it will work; I tested it ;-)
Last edited by //////; 05-22-2006 at 01:32 AM.
05-26-2006, 04:56 AM
#10
LQ Newbie
Registered: Jan 2006
Posts: 16
Original Poster
Hmmm... a very promising solution. It's very close to what I wanted, except for one thing.
The file "delemails" may contain some emails that do not exist in the file "emails". I want these to be ignored. In the solution you have mentioned, mails that are in "delemails" but not in "emails" will not be duplicates. So the output in "cleanlist" will contain emails that existed in "delemails" but did not exist in "emails".
Is there not a sed command that takes its input from a file to delete emails?
I could use Excel to append a "-d" (or whatever) to the beginning of each line in delemails so it looks like:
delemails:
-d email1@email.com
-d email2@email2.com
Thanks
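In the spirit of that question, sed can take its editing commands from a file with -f. One sketch (using the thread's file names) is to turn each address in delemails into a /.../d delete command first; the escaping step is needed so dots in the addresses are not treated as regex wildcards:

```shell
# Build a sed script: escape regex metacharacters in each address,
# then wrap each line as a delete command, e.g.  /foo@bar\.com/d
sed 's,[].\*^$[/],\\&,g; s,.*,/&/d,' delemails > delete.sed
# Apply all the delete commands to "emails".
sed -f delete.sed emails > file3
```

Addresses in delemails that never occur in emails simply delete nothing, which matches the "ignore them" requirement above.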
05-27-2006, 05:05 AM
#11
Member
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424
As a solution in that line of thought, you could first run both files through "sort | uniq -d", which returns only those lines that appear twice, i.e. in both files. That intersection can then be subtracted from the first file as before.
But you'll have to make sure that neither of the two input files contains duplicate lines of its own (possibly by using "uniq -u" on each file separately).
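A sketch of that suggestion, using the thread's file names and assuming neither file has internal duplicates: intersect first, then subtract.

```shell
# Lines present in both files -- the addresses that should actually go away.
sort emails delemails | uniq -d > common
# "common" is a subset of "emails", so merging them and keeping only the
# non-duplicated lines subtracts it; delemails-only entries are ignored.
sort emails common | uniq -u > cleanlist
```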
09-25-2008, 03:02 AM
#12
LQ Newbie
Registered: Sep 2008
Posts: 3
Solution
Too late a reply, but if someone has a similar problem, here's a neat one-liner to solve this in bash:
Code:
diff file1 file2 | sed '/^[0-9][0-9]*/d; s/^. //; /^---$/d' > file3
This gives all the lines which are not common to the two files.
Sorted mail-lists have a simpler solution:
Code:
comm -23 file1 file2 > file3
This gives all the lines in file1 that are not present in file2.
Last edited by nitin_nitt; 09-25-2008 at 06:36 AM.
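Worth noting: comm requires both of its inputs to be sorted, so unsorted lists need a sort pass first. A sketch with the thread's file names:

```shell
# comm expects sorted input; sort both lists first.
sort emails > emails.sorted
sort delemails > delemails.sorted
# -2 suppresses lines only in the second file, -3 suppresses common
# lines, leaving the lines unique to the first file.
comm -23 emails.sorted delemails.sorted > file3
```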
08-02-2009, 09:44 PM
#13
LQ Newbie
Registered: Aug 2009
Location: BHOPAL
Posts: 6
Search for a string and delete the matching files
grep -lir 'string to search in files' * | xargs rm -rf
Options:
l - list file names only
i - ignore case while searching
r - search recursively within subdirectories as well
xargs - the list of files from the grep command is passed as arguments to the "rm -rf" command
Last edited by asnani_satish; 08-02-2009 at 09:46 PM.
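One caveat worth adding: that plain pipe breaks on file names containing spaces. With GNU grep and GNU xargs, NUL-terminated output avoids this (the -Z/-0 pairing is a GNU extension):

```shell
# -Z terminates each reported file name with a NUL byte; xargs -0 splits
# on NULs, so names containing spaces or newlines survive intact.
grep -lirZ 'string to search in files' . | xargs -0 rm -f
```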
12-06-2009, 01:00 AM
#14
LQ Newbie
Registered: May 2008
Posts: 1
Quote:
Originally Posted by //////
grep -Ev 'crap0|crap1|crap2|crap3'
That will delete lines containing crap[0-3].
Beautiful. You saved me 406 lines of deletion!
I'm going to remember this.
Thanks,
Doug
01-06-2010, 10:35 AM
#15
LQ Newbie
Registered: Jan 2010
Posts: 1
Same problem
Hi, I've got the same problem here: I want to delete every line of code that has H3qqea3ur6p in it, which is from a virus infecting people browsing a site I'm hosting. Anyhow, grep -Ev doesn't seem to work for me; any suggestions?
I have removed 1400 lines of code so far, but 700 are still not done.
Yannick.
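For the record, deleting lines that contain a fixed string is usually done with sed's d command. A hedged sketch (the file name here is hypothetical; -i edits in place, so keep backups, which the .bak suffix provides):

```shell
# Delete every line containing the injected string, editing the file in
# place and keeping a .bak copy of the original.
sed -i.bak '/H3qqea3ur6p/d' infected.html
```

To process a whole site, this could be combined with the grep -l technique above to find the infected files first.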