LinuxQuestions.org
Old 11-15-2022, 07:09 AM   #1
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,953

Rep: Reputation: 270
Is there a program that removes non-adjacent duplicate lines?


I have a file of lines that I want to keep in a non-sorted (as far as any Unix app can tell) order but remove duplicates.
 
Old 11-15-2022, 07:25 AM   #2
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,597

Rep: Reputation: 2545

Yes, there's lots of ways: https://duckduckgo.com/?q=linux+remove+duplicate+lines+without+sorting

 
3 members found this post helpful.
Old 11-15-2022, 07:33 AM   #3
slacker_et
Member
 
Registered: Dec 2009
Distribution: Slackware
Posts: 138

Rep: Reputation: 27
Sounds like you are looking for a "uniq" command that does not require the source file to be sorted and is not interactive.
I do not think there is such a command.
However, in the past I dabbled with using this Windows-based program running under Wine: WinMerge

--ET
 
Old 11-15-2022, 07:35 AM   #4
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,367

Rep: Reputation: 2748
Code:
awk '++dups[$0] == 1' <filename>
although the reverse logic in @boughtonp's link is cute
 
1 members found this post helpful.
Old 11-15-2022, 07:36 AM   #5
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,294
Blog Entries: 3

Rep: Reputation: 3719
You could do it with AWK and an associative array keyed on the contents of each line. I'm not sure how well that would scale though. How large a text file are you considering?
 
Old 11-15-2022, 07:38 AM   #6
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,780

Rep: Reputation: 1198
Most resources suggest the ultra-short
Code:
awk '!x[$0]++'
More explicit is
Code:
awk '!($0 in x){x[$0]; print}'
You can append a file name, otherwise it reads stdin.
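As a rough illustration of how these one-liners behave, here is a minimal sketch. The function name dedup_keep_order and the sample input are made up for the example; only the awk program itself comes from the posts above.

```shell
# Hypothetical helper (name is an assumption): print each distinct line once,
# in the order it was first seen.
dedup_keep_order() {
    # seen[$0]++ evaluates to 0 (false) the first time a line appears, so
    # !seen[$0]++ is true only for first occurrences; awk's default action
    # then prints the line. With no file arguments awk reads stdin.
    awk '!seen[$0]++' "$@"
}

# Sample input with non-adjacent duplicates.
printf 'pear\napple\npear\nfig\napple\n' | dedup_keep_order
# prints pear, apple, fig (one per line)
```

Since awk cannot safely write to the file it is reading, one way to update a file in place is to go through a temporary file, e.g. `awk '!seen[$0]++' file > file.tmp && mv file.tmp file`.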

Last edited by MadeInGermany; 11-16-2022 at 05:59 AM.
 
1 members found this post helpful.
Old 11-15-2022, 11:46 AM   #7
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,642
Blog Entries: 4

Rep: Reputation: 3933
While I will not now write the "necessary one-liner" for you, the algorithm essentially is this:

• Produce a version of the input file which contains a "record number" field to the left of the record's contents.
• Sort the resulting file by the second field: the actual content.
• Now that the "duplicate values" are adjacent, remove all but the first occurrence, considering only the second field. This is trivial now, because you need only compare the current value against its immediate predecessor.
• Re-sort the resulting file by the first ("record number") field.
• Remove the "record number" field to produce the final result.
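The steps above can be sketched as a pipeline of standard tools. This is only one possible reading of the algorithm, not the poster's own one-liner: the function name dedup_via_sort is hypothetical, a tab is assumed as the separator between the record number and the content, and LC_ALL=C is an added assumption so that lines are compared byte by byte.

```shell
# Hypothetical helper (name is an assumption): number, sort, dedupe,
# re-sort, strip. Reads the named files, or stdin if none are given.
dedup_via_sort() {
    # 1. Prefix every line with its record number and a tab.
    awk '{ print NR "\t" $0 }' "$@" |
    # 2. Sort by content (field 2 to end of line); ties are broken by
    #    record number so the earliest occurrence comes first.
    LC_ALL=C sort -t "$(printf '\t')" -k2 -k1,1n |
    # 3. Compare each line's content with its immediate predecessor and
    #    keep only the first line of each run.
    LC_ALL=C awk '{ content = substr($0, index($0, "\t") + 1) }
                  NR == 1 || content != prev { print }
                  { prev = content }' |
    # 4. Restore the original order by record number.
    sort -t "$(printf '\t')" -k1,1n |
    # 5. Remove the record-number field.
    cut -f2-
}

printf 'pear\napple\npear\nfig\napple\n' | dedup_via_sort
# prints pear, apple, fig (one per line)
```

Unlike an awk array of every distinct line, this approach does not need to hold the whole set of unique lines in memory at once, since sort implementations such as GNU sort can spill to temporary files. That may matter for very large inputs, at the cost of two sorts.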

Fifty years ago, they did this with magnetic tapes. It may well be that they did it earlier using punched cards.

Last edited by sundialsvcs; 11-15-2022 at 11:48 AM.
 
Old 11-15-2022, 02:38 PM   #8
Keith Hedger
Senior Member
 
Registered: Jun 2010
Location: Wiltshire, UK
Distribution: Void, Linux From Scratch, Slackware64
Posts: 3,150

Rep: Reputation: 856
If you don't mind the file being sorted, use
Code:
sort -u /path/to/file

Last edited by Keith Hedger; 11-15-2022 at 02:39 PM. Reason: whoops!
 
All times are GMT -5. The time now is 05:08 AM.