12-08-2011, 10:20 AM   #1
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,342
Reputation: 367
Linux command(s) to eliminate redundant words in a line

The input file is text of this form:
meat beef
flavor vanilla flavor chocolate
color blue color brown color green color red color yellow
music classical music jazz
wrench socket
vegetable potato vegetable broccoli vegetable carrot
automobile mercedes benz automobile toyota automobile rolls royce

The objective is to eliminate the redundant words (repeats of the key word), if any.
The key word is always the first word in the record.
However, the redundant words are not always in positions 3, 5, 7, etc.

Desired output file:
meat beef
flavor vanilla chocolate
color blue brown green red yellow
music classical jazz
wrench socket
vegetable potato broccoli carrot
automobile mercedes benz toyota rolls royce

Intuition points to sed, but the syntax baffles me.

Please advise.
12-08-2011, 10:40 AM   #2
Senior Member
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,632
Blog Entries: 3
Reputation: 345
Something like
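while read -r line; do
    first=${line%% *}    # the key word is the first space-separated word
    # print the key once, then the rest of the line with every later " key" removed (\b needs GNU sed)
    echo "$first$(printf '%s\n' "${line#"$first"}" | sed "s/ $first\b//g")"
done < text_file
Completely untested, I'm afraid; I'm at work on Windows, but I can check it later this evening.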

Last edited by Snark1994; 12-08-2011 at 10:41 AM.
12-08-2011, 10:48 AM   #3
Senior Member
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140
Reputation: 242
A Perl one-liner:
perl -pae '%s = (); $_ = join(" ", grep { !$s{$_}++ } @F) . "\n"' input_file
Surely there is a simpler way.
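The `grep { !$s{$_}++ } @F` idiom keeps the first occurrence of every word on the line, not only the key word. On the sample data the result is the same, but a line such as `meat beef beef` would become `meat beef`. Below is a minimal sketch of the same seen-table technique in awk; the function name `dedup_all_words` is hypothetical.

```shell
#!/bin/sh
# dedup_all_words: hypothetical helper; keeps the first occurrence of every
# word on each line (the seen-table idea behind the Perl grep), reading stdin.
dedup_all_words() {
    awk '{
        split("", seen)              # portable way to empty the per-line table
        out = $1
        seen[$1] = 1
        for (i = 2; i <= NF; i++)
            if (!seen[$i]++)         # true only the first time a word is seen
                out = out " " $i
        print out
    }'
}

printf '%s\n' 'color blue color brown' 'meat beef beef' | dedup_all_words
# color blue brown
# meat beef
```

If only repeats of the key word should go, compare each field against `$1` instead, as the awk answers in this thread do.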
12-08-2011, 11:01 AM   #4
Nominal Animal
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3
Reputation: 943
Cedrik's Perl one-liner is certainly more compact, but I think an awk one-liner would be easier to grok:
awk '{ printf("%s", $1) ; for (i = 2; i <= NF; i++) if ($i != $1) printf(" %s", $i); printf("\n") }' input-file
Print the first field. Then print each following field, preceded by a space, if it does not match the first field. (Use if (tolower($i) != tolower($1)) instead if you want a case-insensitive comparison.) End the record with a newline.
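The case-insensitive comparison mentioned above, wrapped in a shell function as a sketch (the name `drop_key_repeats_ci` is hypothetical). It lowercases the key word once per line and compares each later field against it, while still printing the key as written.

```shell
#!/bin/sh
# drop_key_repeats_ci: hypothetical helper; the awk one-liner from this post with
# a tolower() comparison, so "Color", "color" and "COLOR" count as the same word.
drop_key_repeats_ci() {
    awk '{
        key = tolower($1)            # lowercase the key word once per line
        printf("%s", $1)             # print the key word as written
        for (i = 2; i <= NF; i++)
            if (tolower($i) != key)  # skip fields equal to the key, ignoring case
                printf(" %s", $i)
        printf("\n")
    }'
}

printf '%s\n' 'Color blue color brown COLOR red' | drop_key_repeats_ci
# Color blue brown red
```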
12-08-2011, 02:07 PM   #5
Registered: Mar 2010
Posts: 202
Reputation: 84
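while read -r first rest; do
    # delete every copy of the key word from the rest of the line;
    # echo is left unquoted so the leftover double spaces collapse
    echo $first ${rest//"$first"/}
done
Use it as a filter, reading from stdin: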

me@localhost:~$ script < text.txt
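One caveat: bash's `${parameter//pattern/string}` substitution deletes the key word wherever it occurs, even inside a longer word, so with the key `car` the line `car carrot car van` comes out as `car rot van`. A minimal pure-bash sketch that removes only whole words, assuming space-separated input (the function name `drop_key_repeats_words` is hypothetical):

```shell
#!/usr/bin/env bash
# drop_key_repeats_words: hypothetical helper; reads stdin and removes later
# copies of each line's first word, matching whole words only.
drop_key_repeats_words() {
    local first rest padded out
    local -a words
    while read -r first rest; do
        padded=" $rest "
        # " key " can only match a whole word; repeat because adjacent copies
        # share a space and one pass of ${//} leaves one of them behind
        while [[ $padded == *" $first "* ]]; do
            padded=${padded//" $first "/ }
        done
        read -r -a words <<< "$padded"   # re-split to drop the leftover spaces
        out=$first
        if (( ${#words[@]} > 0 )); then
            out+=" ${words[*]}"
        fi
        printf '%s\n' "$out"
    done
}

printf '%s\n' 'car carrot car van' 'meat meat' | drop_key_repeats_words
# car carrot van
# meat
```

Padding the rest of the line with spaces turns the whole-word test into a plain substring test, and quoting `"$first"` inside the patterns keeps characters such as `*` from being treated as globs.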

Last edited by Juako; 12-08-2011 at 07:17 PM. Reason: sed expr
12-08-2011, 04:45 PM   #6
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8
Reputation: 147

Just for fun, I composed this without looking at any other responses. Surely someone else already did it better, but let's see.

test$ awk '{keyword = $1; record = keyword; position = 1; while (position++ < NF) {if ($position != keyword) {record = record FS $position}} print record}' input_file.txt
meat beef
flavor vanilla chocolate
color blue brown green red yellow
music classical jazz
wrench socket
vegetable potato broccoli carrot
automobile mercedes benz toyota rolls royce
It looks like Nominal Animal's solution (#4) resembles mine, though it prefers for over while and prints the new record one field at a time. The bit about tolower(...) applies to my solution as well, of course.

Last edited by Telengard; 12-08-2011 at 04:55 PM.
12-08-2011, 09:01 PM   #7
Registered: Mar 2010
Posts: 202
Reputation: 84
This sed also seems to work:

sed -r ':a;s/^([^ ]+) (.*) \1(.*)$/\1 \2\3/g;ta' txt
It loops on each line until every occurrence of the first word except the first one has been removed, then moves on to the next line.

Edit: the above will fail on lines consisting only of two or more identical words. To cope with that situation, use:

sed -r ':a;s/^([^ ]+) (.*)\1(.*)$/\1 \2\3/g;ta;s/[ ]+/ /g' txt
Rather than assuming that a duplicate of the first word is preceded by non-duplicate content plus a space, this version simply groups everything that may exist before the duplicate (including a possible extra space). Any extra spaces left in the edited line are then squeezed by the second s expression.
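Both sed commands compare the key word as a plain string, so it can also match the beginning of a longer word; for example, the second command turns `car carrot` into `car rot`. Below is a sketch of a whole-word variant that sticks to POSIX basic regular expressions (back-references included). It appends a space to each line so every word ends in one, deletes ` key ` copies in a loop, and finally strips the added space. The function name `drop_key_repeats_sed` is hypothetical.

```shell
#!/bin/sh
# drop_key_repeats_sed: hypothetical helper; a whole-word take on the sed loop.
# Expression 1 appends a space; 3 removes a copy directly after the key word;
# 4 removes a copy further along; 'ta' loops back while anything changed;
# the last expression strips the space added at the start.
drop_key_repeats_sed() {
    sed -e 's/$/ /' \
        -e ':a' \
        -e 's/^\([^ ][^ ]*\) \1 /\1 /' \
        -e 's/^\([^ ][^ ]*\)\( .*\) \1 /\1\2 /' \
        -e 'ta' \
        -e 's/ $//'
}

printf '%s\n' 'car carrot car van' 'meat meat' 'flavor vanilla flavor chocolate' |
    drop_key_repeats_sed
# car carrot van
# meat
# flavor vanilla chocolate
```

Requiring a space on both sides of `\1` is what keeps `carrot` intact, and the separate "directly after the key" substitution covers lines made only of repeated words.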

Last edited by Juako; 12-08-2011 at 09:18 PM.


All times are GMT -5.
