Old 09-09-2009, 03:47 PM   #1
caponewgp
LQ Newbie
 
Registered: Aug 2009
Posts: 14

Find Duplicate Files


I was wondering what command I should use to find duplicate names in a file. This is a decent-sized file, and I need to make sure that there are no duplicate names anywhere in it. Thanks again for any help. Here is what I have so far, and I'm going to try using the sort command.

cut -f5 -d: /etc/passwd |

Last edited by caponewgp; 09-09-2009 at 04:11 PM.
 
Old 09-09-2009, 04:01 PM   #2
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

You'll need to sort the data in the file first, either with sort's --unique flag, or by sorting it with no options and piping it into uniq. Note that uniq's -u flag does something different: it prints only the lines that are not repeated at all.

It'd help if you gave a sample of the input file as it might need to be sanitised before either solution will work.
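
As a rough illustration of how these options differ, here is a minimal sketch. The sample names and the temporary file are made up for the example; they are not the poster's data.

```shell
# Hypothetical sample data; "bob" appears twice.
list=$(mktemp)
printf '%s\n' billy bob john bob ralph > "$list"

# One copy of every name (duplicates collapsed).
unique_sorted=$(sort -u "$list")

# Same result: uniq only collapses adjacent lines, so sort first.
unique_piped=$(sort "$list" | uniq)

# Different result: only names that occur exactly once, so "bob" is dropped entirely.
never_repeated=$(sort "$list" | uniq -u)

printf 'sort -u:\n%s\n\nsort | uniq -u:\n%s\n' "$unique_sorted" "$never_repeated"
rm -f "$list"
```

None of these three reports which names are duplicated; they only differ in what they leave out.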
 
Old 09-09-2009, 04:03 PM   #3
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,012
Blog Entries: 5

You can use the sort command with -u option to get unique records from a file.

Say you have a file called "list" with the following contents:
billy
bob
john
bob
ralph

You can see bob is in there twice. If you run "sort -u list" it will show only:
billy
bob
john
ralph

You could redirect that into a new file and then move it over the original.
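
A minimal sketch of that redirect-and-move step, assuming a file named "list" in a scratch directory (the directory and the "list.new" name are only for illustration):

```shell
# Hypothetical file "list" in a scratch directory, so nothing real is overwritten.
workdir=$(mktemp -d)
printf '%s\n' billy bob john bob ralph > "$workdir/list"

# Write the de-duplicated output to a new file first, then move it into place
# only if sort succeeded. Redirecting straight onto "list" would truncate it
# before sort had read it.
sort -u "$workdir/list" > "$workdir/list.new" && mv "$workdir/list.new" "$workdir/list"

deduped=$(cat "$workdir/list")
printf '%s\n' "$deduped"
rm -rf "$workdir"
```

sort also has a -o option that names the output file, and that file may be one of the inputs, so "sort -u -o list list" should achieve the same thing in one step.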

Of course, sort -u only drops a line when the entire line is identical to another one. So if your list had:
bob 10
bob 20
Both lines would be output because the second field is different.
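
To make that concrete, here is a small sketch. The -k1,1 key is not something mentioned above; it is shown only as one possible way to make sort compare the first field alone, and the sample file is hypothetical.

```shell
# Hypothetical two-line sample matching the example above.
pairs=$(mktemp)
printf '%s\n' 'bob 10' 'bob 20' > "$pairs"

# Whole-line comparison: the lines differ, so both survive.
whole_line=$(sort -u "$pairs")

# Key restricted to the first whitespace-separated field: the two lines now
# compare equal, so only one of them is kept.
first_field=$(sort -u -k1,1 "$pairs")
first_field_count=$(printf '%s\n' "$first_field" | grep -c .)

printf 'whole line:\n%s\n\nfirst field only:\n%s\n' "$whole_line" "$first_field"
rm -f "$pairs"
```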
 
Old 09-09-2009, 04:15 PM   #4
caponewgp
LQ Newbie
 
Registered: Aug 2009
Posts: 14

Original Poster
I'm using this command:
cut -f5 -d: /etc/passwd |

Well, the file contains only names, and I just want to make sure there are no duplicates. I don't think the sort command would work, because the duplicates would still be in the file; sort just wouldn't show them. I want to see whether there are any so I can delete them later.
 
Old 09-09-2009, 04:24 PM   #5
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Please don't edit your initial post after people have responded; it breaks the flow of the thread and makes it hard to decipher what's happened.

I don't think sort can do this on its own, but if you use uniq with the -d flag, you'll get the output you desire:
Code:
cut -f5 -d: /etc/passwd | sort | uniq -d
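
For a quick check of what uniq -d reports, here is the same pipeline run on a small made-up list instead of /etc/passwd (the names and the temporary file are assumptions for the example):

```shell
# Hypothetical list of names; "bob" occurs three times.
names=$(mktemp)
printf '%s\n' billy bob john bob ralph bob > "$names"

# uniq -d prints one copy of each line that occurs more than once.
duplicates=$(sort "$names" | uniq -d)

if [ -n "$duplicates" ]; then
    printf 'Duplicate names:\n%s\n' "$duplicates"
else
    echo 'No duplicates found.'
fi
rm -f "$names"
```

An empty result means every name in the list is unique.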
 
Old 09-09-2009, 04:30 PM   #6
tredegar
LQ 5k Club
 
Registered: May 2003
Location: London, UK
Distribution: Debian "Jessie"
Posts: 6,017

Quote:
Im using this command
cut -f5 -d: /etc/passwd |
Yes, you told us this in your first post.

Did you read jlightner's post at #3, or pwc101's reply at #2?

I don't think you have explained your question properly.

Please try again.
 
Old 09-09-2009, 04:31 PM   #7
pwc101
Senior Member
 
Registered: Oct 2005
Location: UK
Distribution: Slackware
Posts: 1,847

Quote:
Originally Posted by tredegar
Yes, you told us this in your first post.
That information was added in the edit; it wasn't there when I replied.
 
Old 09-09-2009, 10:20 PM   #8
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,986
Blog Entries: 11

Or that little awk-proggie (shamelessly snaffled from awk's documentation):
Code:
# remove duplicate lines from unsorted data, e.g. history files,
# firewall rules, that kind of stuff
{
  if (data[$0]++ == 0)
    lines[++count] = $0
}
END {
  for (i = 1; i <= count; i++)
    print lines[i]
}
Its nicest feature is that it doesn't change the order in which
the records originally appear.
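
A script like this is normally saved to a file and run with "awk -f" (for example "awk -f dedupe.awk list", where both names are hypothetical). The sketch below uses a common one-line form of the same idea, plus a variant that reports the duplicates instead of removing them; the variable names and sample data are assumptions, not part of the awk documentation.

```shell
# Hypothetical unsorted sample; "bob" and "john" each appear twice.
history_file=$(mktemp)
printf '%s\n' john bob billy bob john ralph > "$history_file"

# Keep only the first occurrence of each line, in the original order.
first_seen=$(awk '!seen[$0]++' "$history_file")

# Print each duplicated line once, at the moment its second occurrence is read.
repeated=$(awk 'seen[$0]++ == 1' "$history_file")

printf 'de-duplicated:\n%s\n\nduplicated lines:\n%s\n' "$first_seen" "$repeated"
rm -f "$history_file"
```

Neither command needs the input to be sorted, at the cost of holding every distinct line in memory.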
 
Old 09-09-2009, 10:54 PM   #9
estabroo
Senior Member
 
Registered: Jun 2008
Distribution: debian, ubuntu, sidux
Posts: 1,094
Blog Entries: 2

A better question is whether the /etc/passwd file can actually contain duplicate login names. I don't think it can unless they are added manually. Duplicate uid/gid, sure, but I don't think the useradd/adduser commands will let you dup the name.
 
Old 09-10-2009, 12:20 AM   #10
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

It does allow duplicate login names. Remember that names are only for humans; under the skin it's all done with numbers (uid/gid).
Duplicate uids aren't recommended, but it can be done...
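
One way to check for any of these is to run the cut, sort and uniq -d pipeline from earlier in the thread against a different field. The sketch below is only an illustration: dup_field is a hypothetical helper, and the sample file is made up so the example does not depend on a real /etc/passwd.

```shell
# Print values that occur more than once in one colon-separated field of a
# passwd-format file. Usage: dup_field FIELD_NUMBER FILE
dup_field() {
    cut -d: -f"$1" "$2" | sort | uniq -d
}

# Hypothetical sample in passwd format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alice:x:1000:1000:Alice Smith:/home/alice:/bin/sh
bob:x:1001:1001:Bob Jones:/home/bob:/bin/sh
toor:x:1000:1000:Alice Smith:/root:/bin/sh
bob:x:1002:1002:Robert Jones:/home/bob2:/bin/sh
EOF

dup_logins=$(dup_field 1 "$sample")   # field 1: login name
dup_uids=$(dup_field 3 "$sample")     # field 3: numeric UID
dup_gecos=$(dup_field 5 "$sample")    # field 5: comment (GECOS), often the full name

printf 'logins: %s\nuids: %s\ncomment field: %s\n' "$dup_logins" "$dup_uids" "$dup_gecos"
rm -f "$sample"
```

Note that field 5, the one used in the original command, is the comment field rather than the login name. On a real /etc/passwd, several accounts with an empty field 5 would show up as a blank line in the output.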
 
  


All times are GMT -5. The time now is 11:06 AM.
