LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

05-26-2020, 01:26 AM   #1
Tathastu
LQ Newbie
 
Registered: May 2020
Posts: 5

Rep: Reputation: Disabled
Delete consecutive words from strings


I have multiple strings like:
Input:
linux-unix-linux-unix-manjaro
I want to remove consecutive words so that the output will be:

Output:
linux-unix-manjaro

Could someone please provide an awk or sed based solution, or any other?
 
05-26-2020, 01:33 AM   #2
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.8.2003
Posts: 4,828

Rep: Reputation: 1763
Quote:
Originally Posted by Tathastu
I have multiple strings like:
Input:
linux-unix-linux-unix-manjaro
I want to remove consecutive words so that the output will be:

Output:
linux-unix-manjaro

Could someone please provide an awk or sed based solution, or any other?
Your sample doesn’t contain consecutive words, although I understand what you want to do.

What have you tried? What environment are you working in? What languages do you want to use?
We’re happy to help, but we won’t “provide...solution”.
Please review the Welcome to LQ link in my sig.

Last edited by scasey; 05-26-2020 at 01:36 AM.
 
05-26-2020, 01:36 AM   #3
Tathastu
LQ Newbie
 
Registered: May 2020
Posts: 5

Original Poster
Rep: Reputation: Disabled
I was working in bash.
 
05-26-2020, 02:09 AM   #4
JJJCR
Senior Member
 
Registered: Apr 2010
Posts: 1,754

Rep: Reputation: 311
Check out this link: https://superuser.com/questions/5131...in-a-bash-list
or this one: https://shapeshed.com/unix-uniq/

Last edited by JJJCR; 05-26-2020 at 02:22 AM. Reason: edit
 
05-26-2020, 02:13 AM   #5
shruggy
Member
 
Registered: Mar 2020
Posts: 895

Rep: Reputation: Disabled
@OP. Look up how to use backreferences in a regular expression. This is probably what you're looking for.
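One way to apply that hint, as a minimal sketch assuming GNU sed (POSIX does not guarantee backreferences in `-E` extended regexes, although GNU sed supports them): the expression below collapses *adjacent* repeated hyphen-separated words. The function name `collapse_adjacent` is hypothetical. Note that it leaves the top-post example unchanged, because no two identical words there are adjacent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse adjacent duplicate hyphen-separated words.
# Assumes a sed whose -E mode supports backreferences (GNU sed does).
collapse_adjacent() {
  # (^|-)    the word must start the string or follow a hyphen
  # ([^-]+)  capture one whole word as \2
  # -\2      the same word again, right after a hyphen
  # (-|$)    ...ending at a hyphen or end of string (so it is a whole word)
  # The :a / ta loop repeats the substitution until nothing changes,
  # so a run such as unix-unix-unix shrinks to a single unix.
  printf '%s\n' "$1" | sed -E -e ':a' -e 's/(^|-)([^-]+)-\2(-|$)/\1\2\3/' -e 'ta'
}

collapse_adjacent 'linux-unix-unix-manjaro'        # linux-unix-manjaro
collapse_adjacent 'linux-unix-linux-unix-manjaro'  # unchanged: no adjacent duplicates
```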

@JJJCR. The problem with the sorting approach is that it will affect the whole input stream, even the unique parts. To take OP's data from the top post as an example, this would yield
Code:
linux-manjaro-unix
instead of
Code:
linux-unix-manjaro
Depending on what the OP wants, this may be tolerable. Or not.

And applying uniq without sort would only remove consecutive duplicate words, which, as scasey observed above, is not what the provided sample shows.
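A quick way to see both effects, as a sketch that assumes the words are split onto separate lines with `tr` and re-joined with `paste` (these tool choices are assumptions, not something posted in the thread):

```shell
#!/usr/bin/env bash
s='linux-unix-linux-unix-manjaro'

# One word per line, sort and drop duplicates, then join the lines back with hyphens.
# The duplicates are gone, but the words come out in sorted order.
sorted_unique=$(printf '%s\n' "$s" | tr '-' '\n' | sort -u | paste -sd- -)
echo "$sorted_unique"   # linux-manjaro-unix

# uniq without sort only drops *adjacent* duplicate lines; none are adjacent here.
uniq_only=$(printf '%s\n' "$s" | tr '-' '\n' | uniq | paste -sd- -)
echo "$uniq_only"       # linux-unix-linux-unix-manjaro
```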

Last edited by shruggy; 05-26-2020 at 06:48 AM.
 
05-26-2020, 06:00 AM   #6
Tathastu
LQ Newbie
 
Registered: May 2020
Posts: 5

Original Poster
Rep: Reputation: Disabled
This method did not work.
 
05-26-2020, 06:09 AM   #7
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,885

Rep: Reputation: 3205
Perhaps you should show us your method, and what results it gave.
 
05-26-2020, 09:40 AM   #8
scasey
Senior Member
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.8.2003
Posts: 4,828

Rep: Reputation: 1763
^^Yes. "Did not work" tells us nothing that will help us help you.
Again:
What have you tried? What environment are you working in? What languages do you want to use?
Please review the Welcome to LQ link in my sig.

Last edited by scasey; 05-26-2020 at 09:41 AM.
 
05-26-2020, 09:54 AM   #9
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 22,809

Rep: Reputation: 6314
Quote:
Originally Posted by Tathastu
I have multiple strings like:
Input:
linux-unix-linux-unix-manjaro

I want to remove consecutive words so that the output will be:

Output:
linux-unix-manjaro

Could someone please provide an awk or sed based solution, or any other?
Your question is confusing. You say you want to remove consecutive words... but given your example, it seems you want to output the UNIQUE words in that string. What output would you want for an input string of:
Code:
linux-unix-unix-unix-centos-linux-fedora-manjaro-unix-linux
???

And again: you need to show us what YOU have written/done/tried... we are not going to do your homework for you.
 
05-26-2020, 10:27 AM   #10
Tathastu
LQ Newbie
 
Registered: May 2020
Posts: 5

Original Poster
Rep: Reputation: Disabled
In this string the output would be unchanged, because along with consecutive words it should remove grouped consecutive words, including the hyphen or any other punctuation marks.
E.g.:

Input: linux-unix-manjaro-linux-unix-unix

Grouping similar combinations: in this case linux-unix repeats, so

Output: linux-unix-manjaro-unix

This means the second occurrence of linux-unix should be deleted. I think using backreferences with sed or awk may work, but I don't know the syntax.
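A minimal sketch of one reading of this rule, written in awk rather than with backreferences: scanning left to right, whenever the next two words form a pair that already appears side by side in the output kept so far, both words are dropped. The two-word group size and the name `dedupe_pairs` are assumptions. This reading reproduces the outputs asked for above and also leaves TB0ne's longer string unchanged, as requested.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Assumed rule: scanning left to right, if the next two
# words form a pair already present (adjacent) in the kept output, drop both.
dedupe_pairs() {
  printf '%s\n' "$1" | awk -F- '
  {
    split("", pairs)        # portable way to empty the array for each line
    out = ""; n = 0; i = 1
    while (i <= NF) {
      key = $i "-" $(i + 1)
      if (i < NF && (key in pairs)) {
        i += 2              # repeated two-word group: skip both words
        continue
      }
      if (n > 0) {
        pairs[last "-" $i] = 1   # remember the pair this word forms in the output
        out = out "-" $i
      } else {
        out = $i
      }
      last = $i; n++; i++
    }
    print out
  }'
}

dedupe_pairs 'linux-unix-linux-unix-manjaro'       # linux-unix-manjaro
dedupe_pairs 'linux-unix-manjaro-linux-unix-unix'  # linux-unix-manjaro-unix
```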
 
05-26-2020, 10:35 AM   #11
shruggy
Member
 
Registered: Mar 2020
Posts: 895

Rep: Reputation: Disabled
Quote:
Originally Posted by Tathastu
Input: linux-unix-manjaro-linux-unix-unix
Wait, now the occurrences are not adjacent at all.

Last edited by shruggy; 05-26-2020 at 10:45 AM.
 
05-26-2020, 11:32 AM   #12
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 22,809

Rep: Reputation: 6314
Quote:
Originally Posted by Tathastu
In this string the output would be unchanged, because along with consecutive words it should remove grouped consecutive words, including the hyphen or any other punctuation marks.
E.g.:

Input: linux-unix-manjaro-linux-unix-unix

Grouping similar combinations: in this case linux-unix repeats, so

Output: linux-unix-manjaro-unix

This means the second occurrence of linux-unix should be deleted. I think using backreferences with sed or awk may work, but I don't know the syntax.
I understand what you want now. So, back to what you've been asked several times now... post what YOU have written/done/tried on your own. AGAIN, we are not going to do your homework for you.
 
05-26-2020, 11:45 AM   #13
Tathastu
LQ Newbie
 
Registered: May 2020
Posts: 5

Original Poster
Rep: Reputation: Disabled
I have used this awk command:

Code:
echo linux-unix-manjaro-linux-unix-unix | awk 'BEGIN{RS="[ \n[:punct:]]"}{ORS = RT}!_[$0]++'

Obtained output:
linux-unix-manjaro

Desired output:
linux-unix-manjaro-unix
 
05-26-2020, 07:33 PM   #14
JJJCR
Senior Member
 
Registered: Apr 2010
Posts: 1,754

Rep: Reputation: 311

Quote:
Originally Posted by shruggy
@OP. Look up how to use backreferences in a regular expression. This is probably what you're looking for.

@JJJCR. The problem with the sorting approach is that it will affect the whole input stream, even the unique parts. To take OP's data from the top post as an example, this would yield
Code:
linux-manjaro-unix
instead of
Code:
linux-unix-manjaro
Depending on what the OP wants, this may be tolerable. Or not.

And applying uniq without sort would only remove consecutive duplicate words, which, as scasey observed above, is not what the provided sample shows.
Hi shruggy, indeed sort will treat it as one literal string, since the words are not on separate lines.

I think the OP can do this as a 3-step process:
a. Take the dashes out of the string (sed's s command can do this).
b. Sort and run uniq to remove duplicates, using the output of step a as input.
c. Take the output of step b and put the dashes back.

I guess that should be the way. My 2 cents.
 
05-26-2020, 08:17 PM   #15
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,885

Rep: Reputation: 3205
Quote:
Originally Posted by Tathastu
I have used this awk command:
If you wrote that, you would know what it is doing and why. If you just cut-and-pasted it from somewhere, you should read the awk documentation. The code as-is will not process multiple records sensibly.
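For comparison, a portable sketch of what the posted command does at the word level (keep the first occurrence of each word, in order). The posted version keeps a single `_` array for the whole input, so a word seen on one line is also dropped from later lines; here the `seen` array is emptied at the start of every line instead. It assumes hyphens are the only separators (the posted RS also split on spaces and other punctuation), and `unique_words` is a hypothetical name. Note that this still gives linux-unix-manjaro, not the desired output, because it removes repeated single words rather than repeated groups.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the first occurrence of each hyphen-separated word,
# preserving order, and treat every input line as a separate string.
unique_words() {
  awk -F- '
  {
    split("", seen)          # forget the words from the previous line
    out = ""; n = 0
    for (i = 1; i <= NF; i++) {
      if (seen[$i]++) continue          # word already used on this line
      out = (n++ ? out "-" $i : $i)     # join kept words with hyphens
    }
    print out
  }'
}

printf '%s\n' 'linux-unix-manjaro-linux-unix-unix' 'unix-centos-unix' | unique_words
# linux-unix-manjaro
# unix-centos
```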
 
  


All times are GMT -5.