LinuxQuestions.org
Old 12-08-2011, 10:20 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Ubuntu
Posts: 1,065

Rep: Reputation: 284
Linux command(s) to eliminate redundant words in a line


The input file is text of this form:
meat beef
flavor vanilla flavor chocolate
color blue color brown color green color red color yellow
music classical music jazz
wrench socket
vegetable potato vegetable broccoli vegetable carrot
automobile mercedes benz automobile toyota automobile rolls royce

The objective is to eliminate the redundant words (if any).
The key word is always the first word in the record.
However, the redundant words are not always in positions 3, 5, 7, etc.

Desired output file:
meat beef
flavor vanilla chocolate
color blue brown green red yellow
music classical jazz
wrench socket
vegetable potato broccoli carrot
automobile mercedes benz toyota rolls royce

Intuition points to sed but the syntax baffles me.

Please advise.
 
Old 12-08-2011, 10:40 AM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,630
Blog Entries: 3

Rep: Reputation: 345
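Something like
Code:
while read -r line; do
    # The keyword is the first word on the line
    key=$(echo "$line" | grep -o '^[a-zA-Z]*')
    # Delete every later " keyword"; the first copy has no leading space (\> is GNU sed)
    echo "$line" | sed "s/ $key\>//g"
done < text_file
Completely untested, I'm afraid; I'm at work on Windows. I can check it later this evening.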

Last edited by Snark1994; 12-08-2011 at 10:41 AM.
 
Old 12-08-2011, 10:48 AM   #3
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Rep: Reputation: 242
A Perl one-liner:
Code:
perl -lane '%s=(); print join " ", grep { !$s{$_}++ } @F' input_file
Surely there is a simpler way.
 
Old 12-08-2011, 11:01 AM   #4
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Cedrik's Perl one-liner is certainly more compact, but I think an awk one-liner would be easier to grok:
Code:
awk '{ printf("%s", $1) ; for (i = 2; i <= NF; i++) if ($i != $1) printf(" %s", $i); printf("\n") }' input-file
Print the first field. Then, print each following field (preceded by the field separator) if it does not match the first field. (You can also use if (tolower($i) != tolower($1)) if you want a case-insensitive comparison.) End the record with a newline.
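
The case-insensitive variant mentioned above could look roughly like this. It is only a sketch wrapped in a shell function so it can be used as a filter; the name strip_keyword_ci is made up for illustration and is not part of the thread.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): reads records on stdin and
# drops later copies of the first word, comparing case-insensitively.
strip_keyword_ci() {
    awk '{
        printf("%s", $1)                      # always keep the keyword itself
        for (i = 2; i <= NF; i++)
            if (tolower($i) != tolower($1))   # skip repeats regardless of case
                printf(" %s", $i)
        printf("\n")
    }'
}

# Example run on one made-up record
printf '%s\n' 'Color blue color brown COLOR green' | strip_keyword_ci
```

The first word is printed as written, so its original capitalisation is kept.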
 
Old 12-08-2011, 02:07 PM   #5
Juako
Member
 
Registered: Mar 2010
Posts: 202

Rep: Reputation: 84
Code:
#!/bin/bash
while read first rest; do
    echo $first ${rest//$first/}
done< <(sed -r 's/^([^ ]+) (.*)$/\1 \2/')
use as a filter, with stdin:

me@localhost:~$ script < text.txt
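
One caveat with ${rest//$first/}: it removes the keyword wherever it appears, even inside a longer word, so a record like "color colorful" would come out as "color ful". If that matters for your data, a whole-word comparison might be safer. The following is a rough sketch under that assumption; strip_keyword is a hypothetical name, not something from the script above.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): compares whole words instead of
# substrings. The body runs in a subshell so that set -f stays local to it.
strip_keyword() (
    set -f                              # no filename expansion while splitting
    while read -r first rest; do
        out=$first
        for word in $rest; do           # unquoted on purpose: split on whitespace
            if [ "$word" != "$first" ]; then
                out="$out $word"
            fi
        done
        printf '%s\n' "$out"
    done
)

# Example run on one made-up record
printf '%s\n' 'color blue color colorful' | strip_keyword
```

It is used the same way, as a filter on stdin.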

Last edited by Juako; 12-08-2011 at 07:17 PM. Reason: sed expr
 
Old 12-08-2011, 04:45 PM   #6
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147

Just for fun, I composed this without looking at any other responses. Surely someone else already did it better, but let's see.


Code:
test$ awk '{keyword = $1; record = keyword; position = 1; while (position++ < NF) {if ($position != keyword) {record = record FS $position}} print record}' input_file.txt
meat beef
flavor vanilla chocolate
color blue brown green red yellow
music classical jazz
wrench socket
vegetable potato broccoli carrot
automobile mercedes benz toyota rolls royce
test$
EDIT
Looks like Nominal Animal's solution in post #4 resembles my own, though it prefers for over while and prints the new record one field at a time. The bit about tolower(...) applies to my solution as well, of course.

Last edited by Telengard; 12-08-2011 at 04:55 PM.
 
Old 12-08-2011, 09:01 PM   #7
Juako
Member
 
Registered: Mar 2010
Posts: 202

Rep: Reputation: 84
This sed also seems to work:

Code:
sed -r ':a;s/^([^ ]+) (.*) \1(.*)$/\1 \2\3/g;ta' txt
It will loop on the line until all occurrences of the first word, except the first one, are removed, then continue to the next line.

edit: the above will fail if you have lines composed only of two or more equal words. To cope with that situation use:

Code:
sed -r ':a;s/^([^ ]+) (.*)\1(.*)$/\1 \2\3/g;ta;s/[ ]+/ /g' txt
What this does is "not assume that a duplicate of the first word will be preceded by non-duplicate content plus a space"; it simply groups everything that may exist before the duplicate (including a possible extra space). Any extra spaces left in the replaced line are squeezed into one by the second 's' expression.
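
The same label-and-branch loop can also be written so that it only removes whole words, which avoids both the leftover spaces and accidental hits inside longer words. This is just a sketch and not the command from this post: it assumes words are separated by single spaces (as in the sample input) and a sed that accepts -E with back-references, such as GNU sed. The function name strip_keyword_sed is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): repeat the substitution until
# no later whole-word copy of the first word is left on the line.
# Assumes single-space separators, as in the sample input.
strip_keyword_sed() {
    sed -E -e ':a' \
           -e 's/^([^ ]+)(( [^ ]+)*) \1( |$)/\1\2\4/' \
           -e 'ta'
}

# Example run: a line of equal words, and a line with a longer look-alike word
printf '%s\n' 'color color' 'color blue colorful color red' | strip_keyword_sed
```

Group 2 holds any complete words between the keyword and its duplicate, and group 4 keeps the separator (or the end of line) that followed the duplicate, so no second pass to squeeze spaces is needed.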

Last edited by Juako; 12-08-2011 at 09:18 PM.
 
  


Tags
awk, bash, perl, sed

