LinuxQuestions.org
Old 09-17-2009, 04:26 AM   #1
fucinheira
LQ Newbie
 
Registered: Sep 2009
Posts: 3

Rep: Reputation: 0
sed delete lines from file one if regexp are listed in file two


Hello there,

I am trying to delete lines of a file if they contain text that is present in another file. For example:

> cat one.txt
a
b
c
d
e
f
g

> cat two.txt
c
d
e

If I run the following script
> cat test.sh
#!/bin/bash

while read LINE
do
sed -e "/$LINE/d" $1
done < $2

I get the following output:
> ./test.sh one.txt two.txt
a
b
d
e
f
g
a
b
c
e
f
g
a
b
c
d
f
g

instead of the "expected":
a
b
f
g

Obviously I am doing something wrong. I would appreciate any help.

Thanks, Jose
 
Old 09-17-2009, 04:50 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

The (unexpected) output is correct.

The while read loop takes the first line from two.txt, and sed deletes the matching lines from one.txt and prints the rest.
Result: a b d e f g (c is removed)

Then the while read loop takes the second line from two.txt, and sed again deletes the matching lines from the original one.txt and prints the rest.
Result: a b c e f g (d is removed)

Same for the third line in two.txt.

one.txt is processed three times (once for every line in two.txt), and each pass starts from the unmodified file.

You need to read all the entries in two.txt and give them to sed in one go.

Something like this (oneliner from command line) will do what you want:

sed -e '/c/d' -e '/d/d' -e '/e/d' one.txt

I'm not sure if you know how, but I'll let you play with this first.

Anyway, hope this helps.
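
For readers who want to see the "in one go" idea spelled out, here is a minimal sketch. It assumes the patterns in two.txt contain no "/" characters, and the names workdir and delete.sed are made up for the illustration. It turns every non-empty line of two.txt into a sed delete command, then runs sed once with all of them.

```shell
#!/bin/sh
# Sketch only: build one sed script from two.txt, then run sed a single time.
workdir=$(mktemp -d)
printf '%s\n' a b c d e f g > "$workdir/one.txt"
printf '%s\n' c d e > "$workdir/two.txt"

# Drop empty lines, then wrap every remaining line X as the sed command /X/d.
# Assumption: the patterns contain no "/" characters.
sed -e '/^$/d' -e 's|.*|/&/d|' "$workdir/two.txt" > "$workdir/delete.sed"

# One pass over one.txt with all delete commands applied together.
sed_result=$(sed -f "$workdir/delete.sed" "$workdir/one.txt")
printf '%s\n' "$sed_result"

rm -rf "$workdir"
```

With the sample files from the first post this prints a, b, f and g, one per line. Each line of two.txt is still treated as a regular expression, just as in the original loop.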
 
Old 09-17-2009, 05:00 AM   #3
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Look at the comm command. The input files need to be sorted. You can output lines unique to the second file:

comm -13 <(sort file1) <(sort file2)

See "man comm" for full details on this command.
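
Applied to the files from the first post, the direction that matters is "lines only in the first file", which is what -23 selects. The sketch below uses sorted temporary copies instead of process substitution so that it also runs under plain sh; workdir and the .sorted file names are made up for the illustration. Note that comm compares whole lines only, so it fits the one-entry-per-line sample but not substring matching.

```shell
#!/bin/sh
# Sketch only: keep the lines of one.txt that do not appear in two.txt.
workdir=$(mktemp -d)
printf '%s\n' a b c d e f g > "$workdir/one.txt"
printf '%s\n' c d e > "$workdir/two.txt"

# comm requires sorted input.
sort "$workdir/one.txt" > "$workdir/one.sorted"
sort "$workdir/two.txt" > "$workdir/two.sorted"

# -2 suppresses lines found only in the second file,
# -3 suppresses lines common to both, leaving lines unique to the first file.
comm_result=$(comm -23 "$workdir/one.sorted" "$workdir/two.sorted")
printf '%s\n' "$comm_result"

rm -rf "$workdir"
```

This prints a, b, f and g, one per line.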
 
Old 09-17-2009, 06:22 AM   #4
fucinheira
LQ Newbie
 
Registered: Sep 2009
Posts: 3

Original Poster
Rep: Reputation: 0
sed delete lines from file one if regexp are listed in file two

Thanks for both answers, they provide useful hints. Actually the situation is more complicated. Sorry if my previous message was a bit misleading. Imagine that file one.txt contains several hundred lines of text with several words in each line, while file two.txt contains a list of a couple of hundred words. What I would like is to delete every line in one.txt that contains at least one word listed in two.txt.

Thanks again, Jose
 
Old 09-17-2009, 06:40 AM   #5
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2374
Hi,

Do take a look at what jschiwal mentioned.

I do believe that the comm command can do what you want (and sorting the files is crucial!).
 
Old 09-17-2009, 07:01 AM   #6
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Also look at the options for grep. You can use a file for the source of the patterns. You can also use an option that returns lines not matching the patterns. These combined would have the effect of deleting lines in one file that contain words in a list.
 
Old 09-17-2009, 08:28 AM   #7
fucinheira
LQ Newbie
 
Registered: Sep 2009
Posts: 3

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jschiwal
Also look at the options for grep. You can use a file for the source of the patterns. You can also use an option that returns lines not matching the patterns. These combined would have the effect of deleting lines in one file that contain words in a list.
Thanks again to both! It works, provided that I remove empty lines first.

> sed -i '/^$/d' one.txt
> sed -i '/^$/d' two.txt
> grep -v -f two.txt one.txt > output.txt
> cat one.txt
a
b
c
d
e
f
g
> cat two.txt
c
d
e
> cat output.txt
a
b
f
g

Great! Just what I need!

Jose
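
For the multi-word case described in post #4, a hedged variant of the same grep idea may be useful. The sample sentences, workdir and words.txt are made up for the illustration. An empty line in the pattern file is an empty pattern that matches every line, so grep -v would print nothing; the sketch filters empty lines into a copy instead of editing two.txt. It also adds -F (treat entries as fixed strings, not regular expressions) and -w (match whole words only), which are assumptions about what is wanted here.

```shell
#!/bin/sh
# Sketch only: drop every line of one.txt that contains a word listed in two.txt.
workdir=$(mktemp -d)
printf '%s\n' 'the quick brown fox' 'jumps over the lazy dog' \
    'hello world' 'category theory' > "$workdir/one.txt"
# The word list deliberately contains an empty line.
printf '%s\n' fox '' cat > "$workdir/two.txt"

# Copy the word list without its empty lines; two.txt itself stays untouched.
grep -v '^$' "$workdir/two.txt" > "$workdir/words.txt"

# -v inverts the match, -w matches whole words, -F disables regex syntax,
# -f reads the patterns from a file.
grep_result=$(grep -v -w -F -f "$workdir/words.txt" "$workdir/one.txt")
printf '%s\n' "$grep_result"

rm -rf "$workdir"
```

The line containing "fox" is dropped, while "category theory" is kept because -w stops "cat" from matching inside "category". Without -w, that line would be removed as well.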
 
  

