Find Duplicate Files

caponewgp · 09-09-2009, 03:47 PM

I was wondering what command I should use to find duplicate names in a file. This is a decent-sized file and I need to make sure that there are no duplicate names anywhere in it. Thanks again for any help. Here is what I have so far, and I'm going to try using the sort command.

cut -f5 -d: /etc/passwd |

pwc101 · 09-09-2009, 04:01 PM

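You'll need to sort the data in the file first, either with the --unique (-u) flag for sort, or by sorting it with no options and piping it into uniq, which collapses repeated adjacent lines. (Note that uniq's own -u flag does something different: it prints only the lines that are never repeated.)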

It'd help if you gave a sample of the input file as it might need to be sanitised before either solution will work.
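A minimal sketch of the two sorting approaches, applied to the pipeline from the first post. The function names are hypothetical, and both assume a colon-separated file laid out like /etc/passwd, with the name in field 5. The second version pipes into plain uniq, which is the form that matches sort -u.

```shell
# Hypothetical helpers: each prints every distinct value of field 5 exactly once.

# Approach 1: let sort drop repeated lines itself.
unique_names_sort() {
    cut -d: -f5 "$1" | sort -u
}

# Approach 2: sort first, then let uniq collapse repeated lines.
# uniq only compares neighbouring lines, which is why the sort is needed.
unique_names_uniq() {
    cut -d: -f5 "$1" | sort | uniq
}

# Usage: unique_names_sort /etc/passwd
```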

MensaWater · 09-09-2009, 04:03 PM

You can use the sort command with the -u option to get unique records from a file.

Say you have a file called "list" with the following contents:
billy
bob
john
bob
ralph

You can see bob is in there twice. If you run "sort -u list" it will show only:
billy
bob
john
ralph

You could redirect that into a new file and then move it over the original.
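A minimal sketch of the redirect-and-move step, wrapped in a hypothetical helper. Redirecting straight back onto the same file (sort -u list > list) would not work, because the shell truncates list before sort ever reads it; writing to a temporary file first avoids that.

```shell
# Hypothetical helper: replace a file with its sorted, de-duplicated contents.
# The original is only overwritten if sort succeeds.
dedupe_file() {
    sort -u "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# POSIX sort also allows the -o output file to be one of its inputs,
# so this one-step form is safe too:
#   sort -u -o list list
```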

Of course, if the entire line is not identical on every occurrence, sort -u won't remove it. So if your list had:
bob 10
bob 20
Both lines would be output because the second field is different.
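If only the first field (the name) matters, sort can be told to compare just that field, so -u keeps one line per name. A minimal sketch assuming whitespace-separated fields as in the example above; which of the two bob lines survives is not something to rely on.

```shell
# -k1,1 restricts the comparison key to the first field only, so
# "bob 10" and "bob 20" compare as equal and -u keeps just one of them.
printf 'bob 10\nbob 20\nbilly 5\n' | sort -u -k1,1
```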

caponewgp · 09-09-2009, 04:15 PM

I'm using this command
cut -f5 -d: /etc/passwd |

Well, the file contains only names, and I just want to make sure there are no duplicates. I don't think the sort command would work, because the duplicates would still be there; it just wouldn't show them. I want to see whether there are any so I can delete the duplicates later.

pwc101 · 09-09-2009, 04:24 PM

Please don't edit your initial post after people have responded; it breaks the flow of the thread and makes it hard to decipher what's happened.

I don't think sort can do this on its own, but if you use uniq with the -d flag, you'll get the output you desire:

Code:

cut -f5 -d: /etc/passwd | sort | uniq -d
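A variant of that pipeline, sketched under the assumption that you also want to know how often each name repeats. On many systems several system accounts leave field 5 empty, in which case uniq -d can print a blank line; the grep step drops those. uniq -c prefixes each distinct line with its count. The function name is hypothetical.

```shell
# Hypothetical helper: list each non-empty field-5 value that occurs more
# than once, prefixed by its count. Defaults to /etc/passwd.
show_dup_names() {
    cut -d: -f5 "${1:-/etc/passwd}" |
        grep -v '^$' |     # drop accounts with an empty field 5
        sort |
        uniq -c |          # prefix each distinct line with its count
        awk '$1 > 1'       # keep only counts greater than one
}

# Usage: show_dup_names
```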

tredegar · 09-09-2009, 04:30 PM

Quote:

Im using this command
cut -f5 -d: /etc/passwd |

Yes, you told us this in your first post.

Did you read jlightner's (MensaWater's) post at #3 or pwc101's reply at #2?

I don't think you have explained your question properly.

Please try again.

pwc101 · 09-09-2009, 04:31 PM

Quote:

Originally Posted by tredegar

Yes, you told us this in your first post.

That information was added in the edit; it wasn't there when I replied.

Tinkster · 09-09-2009, 10:20 PM

Or that little awk-proggie (shamelessly snaffled from awk's documentation):

Code:

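# remove duplicate lines from unsorted data, e.g. history files,
# firewall rules, that kind of stuff
{
  # data[$0] counts how often this line has been seen; it is 0 on the
  # first sighting, so only first occurrences are stored, in input order
  if (data[$0]++ == 0)
    lines[++count] = $0
}
END {
  # print the stored lines in the order they first appeared
  for (i = 1; i <= count; i++)
    print lines[i]
}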

Its nicest feature is that it doesn't change the order in which
the records originally appear.
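The same idea is often written as an awk one-liner: seen[$0]++ evaluates to 0 (false) the first time a line appears, so !seen[$0]++ is true only for first occurrences. Changing the test to == 1 gives what the original question asked for: each duplicated line, listed once, without sorting. A minimal sketch; the helper names are hypothetical.

```shell
# Print each line the first time it appears, preserving input order
# (the same effect as the program above).
first_occurrences() {
    awk '!seen[$0]++' "$@"
}

# Print each repeated line once, at the moment its second copy is seen,
# again without sorting the input.
repeated_lines() {
    awk 'seen[$0]++ == 1' "$@"
}

# Usage: cut -d: -f5 /etc/passwd | repeated_lines
```

To run the longer program as posted, save it to a file (dedupe.awk is just an example name) and run awk -f dedupe.awk yourfile.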

estabroo · 09-09-2009, 10:54 PM

A better question is: can the /etc/passwd file actually contain duplicate login names? I don't think it can unless they're added manually. Duplicate UIDs/GIDs, sure, but I don't think the useradd/adduser command will let you duplicate a name.
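To check that directly, the same pipeline can be pointed at the login name, which is field 1. In the usual passwd layout, field 5 (used earlier in the thread) is the GECOS/comment field, not the login name. A minimal sketch; the function name is hypothetical and it defaults to /etc/passwd.

```shell
# Hypothetical helper: print login names (field 1) that occur more than once.
dup_logins() {
    cut -d: -f1 "${1:-/etc/passwd}" | sort | uniq -d
}

# Usage: dup_logins               # checks /etc/passwd
#        dup_logins ./passwd.copy
```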

chrism01 · 09-10-2009, 12:20 AM

It does allow duplicate login names. Remember that names are only for humans; under the skin it's all done with numbers (UID/GID).
Duplicate uids aren't recommended, but it can be done...
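Since the system works with the numeric UID, it can be worth listing accounts that share one. With the shadow-utils useradd, a non-unique UID has to be requested explicitly with -o together with -u. A minimal sketch that reports every account whose UID (field 3) was already used by an earlier line; the function name is hypothetical.

```shell
# Hypothetical helper: for each line whose UID (field 3) was already used by
# an earlier line, print the UID, this account and the first account with it.
dup_uids() {
    awk -F: '
        ($3 in owner) { print "UID " $3 ": " $1 " (first used by " owner[$3] ")"; next }
        { owner[$3] = $1 }
    ' "${1:-/etc/passwd}"
}

# Usage: dup_uids
```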