[SOLVED] sort and uniq

mikudo · 09-19-2018, 05:13 AM

I'm piping a list of words around that looks something like this

do
done
every
done
do
very-do
done-every
done
every
do

I want to remove the duplicates.

Both sort -u and uniq are supposed to do this, but on this column of words they neither sort alphabetically or by string value, nor count properly.

uniq -c gives me the number 1 next to each of these words even though they are clearly identical. I've done a ton of sed and tr to remove any leading spaces, and it is really only these words and the invisible newline.

What am I missing? (bash shell, debian)

syg00 · 09-19-2018, 05:22 AM

Seems just like your other "sort and uniq -c" thread.

As you were told, uniq does not work on unsorted data. If sort isn't working the data are different.
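
A minimal sketch of why the sort matters, using a made-up sample modelled on the list in the first post (the variable names are illustrative only). uniq compares each line only with the one directly before it, so duplicates that are not adjacent survive:

```shell
# Hypothetical sample data, modelled on the word list in the first post.
words='do
done
every
done
do'

# No two identical lines are adjacent here, so uniq removes nothing (5 lines remain).
unsorted_count=$(printf '%s\n' "$words" | uniq | wc -l)

# sort puts the duplicates next to each other, so uniq can collapse them (3 lines remain).
sorted_count=$(printf '%s\n' "$words" | sort | uniq | wc -l)

echo "without sort: $unsorted_count lines, with sort: $sorted_count lines"
```

If the count stays the same even after sorting, the lines probably differ in some byte that is not visible on screen.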

wpeckham · 09-19-2018, 05:26 AM

I presume you have these words in a file.
Have you examined the file looking for hidden characters?

Code:

cat -vte words.txt
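
As a rough illustration of what that command can reveal, the sketch below feeds it two lines that look the same on a terminal but differ by one carriage return. The sample input is invented for the example and is not taken from the poster's data:

```shell
# Two lines that look identical on screen; the first ends in a carriage return (\r).
visible=$(printf 'done\r\ndone\n' | cat -vte)

# cat -vte shows the carriage return as ^M and marks each line end with $.
printf '%s\n' "$visible"

# Because the bytes differ, sort | uniq keeps both lines.
distinct=$(printf 'done\r\ndone\n' | sort | uniq | wc -l)
echo "distinct lines: $distinct"
```

Tabs would show up as ^I in the same way, and a trailing space would appear as a gap before the $.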

Have you tried sorting them first and then piping that through uniq, like this?

Code:

sort words.txt | uniq

What have you done to examine the issue yourself before posting here?

mikudo · 09-19-2018, 05:57 AM

Thank you for responding.

I have been using sed and tr to try to remove leading and trailing spaces, and I have looked at all of the options for sort and uniq, trying to find other keys to sort on that might reveal what is not visible, such as translating all the words into numbers. The effect is the same. I have also read a lot of sort and uniq howtos, and what they suggest does not work here. I'm honestly perplexed by this, and by the whole 'hidden characters' thing. I want to understand it better.

These words are in a variable in a chain of pipes, with sed removing leading spaces and tr replacing newlines with spaces, and I tried sort | uniq -u and vice versa numerous ways before posting.

cat -vte shows a $ at the end of every line, but nothing that should make the duplicate lines differ from each other, right? If I remove the $, it puts them all back into a single flat row. Every other time I've used sort and uniq, it's been on a column with newlines.

Why does sort not put these in alphabetical order?

Why does uniq find no duplicates?

mikudo · 09-19-2018, 06:08 AM

....| cat -vte | sort | uniq -c

output:

1 do$
1 done$
1 every$
1 done$
1 do$
1 very-do$
1 done-every$
1 done$
1 every$
1 do$

When the most basic thing doesn't work, that is the world telling you that you don't know something that you should know. So what is it that I'm missing?

cat -vte is useful and cool thank you for teaching me this.

(btw I'd like to use the code tags but when I click on the buttons in my browser, nothing happens, no tooltips either so I'm not sure those are the right buttons even, javascript allowed, blocker off also, FF)

mikudo · 09-19-2018, 06:40 AM

OK, it has to be that I put it into a variable instead of a file, right? sort sees the variable input $x as a single thing; that must be it.

testing...

mikudo · 09-19-2018, 06:56 AM

Confirmed, this was it, closing.

Today I learned something that I will be able to use in the future.

Thanks guys!

MadeInGermany · 09-22-2018, 03:56 AM

Always "quote" variables in command arguments: "$x"
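
A small sketch of the difference quoting makes, assuming a variable x that holds one word per line (the contents are invented for the example). Unquoted, the shell splits $x on whitespace and echo joins the pieces with spaces, so sort receives a single line:

```shell
# Hypothetical variable holding one word per line.
x='do
done
do'

# Unquoted: word splitting turns the newlines into separators, echo prints "do done do",
# and sort -u sees exactly one line.
unquoted_lines=$(echo $x | sort -u | wc -l)

# Quoted: the newlines survive, sort -u sees three lines and keeps the two distinct ones.
quoted_lines=$(printf '%s\n' "$x" | sort -u | wc -l)

echo "unquoted: $unquoted_lines line(s), quoted: $quoted_lines line(s)"
```

This describes bash and other POSIX-style shells. Whether it matches the exact pipeline in this thread is a guess, since the full command was never posted.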