LinuxQuestions.org
Old 09-19-2018, 06:13 AM   #1
mikudo
Member
 
Registered: Aug 2018
Posts: 77

Rep: Reputation: Disabled
sort and uniq


I'm piping a list of words around that looks something like this

do
done
every
done
do
very-do
done-every
done
every
do

I want to remove the duplicates.

Both sort -u and uniq are supposed to do this, but on this column of words they neither sort alphabetically or by string value, nor count properly.

uniq -c gives me a count of 1 next to each of these words even though they are clearly identical. I've done a ton of sed and tr to remove any leading spaces, and it is really only these words and the invisible newline.

What am I missing? (bash shell, debian)
 
Old 09-19-2018, 06:22 AM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,473

Rep: Reputation: 3086
Seems just like your other "sort and uniq -c" thread.

As you were told, uniq only collapses adjacent duplicate lines, so it does not work on unsorted data. If sort isn't merging them either, the lines are not actually identical.
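
For illustration, here is a minimal sketch of the point about unsorted input, using the word list from the first post. The variable names `words`, `unsorted_counts`, and `sorted_counts` are made up for this example and are not from the thread. Because uniq compares each line only with its neighbour, the duplicates are counted together only after sort has made them adjacent.

```shell
#!/bin/sh
# Sample data copied from the first post; the variable names are hypothetical.
words='do
done
every
done
do
very-do
done-every
done
every
do'

# uniq compares each line only with the one before it, so on this
# unsorted input the scattered duplicates are never merged.
unsorted_counts=$(printf '%s\n' "$words" | uniq -c)

# Sorting first makes identical lines adjacent, so uniq -c can count them.
sorted_counts=$(printf '%s\n' "$words" | sort | uniq -c)

printf 'unsorted:\n%s\n\nsorted:\n%s\n' "$unsorted_counts" "$sorted_counts"
```

On this list the unsorted run keeps all ten lines, while the sorted run reduces them to five distinct words.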
 
Old 09-19-2018, 06:26 AM   #3
wpeckham
Senior Member
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, Fedora, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, Vsido, tinycore, Q4OS
Posts: 3,189

Rep: Reputation: 1380
I presume you have these words in a file.
Have you examined the file looking for hidden characters?
Code:
cat -vte words.txt
Have you tried sorting them first, then piping that through uniq?
à la
Code:
sort words.txt | uniq
What have you done to examine the issue yourself before posting here?
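
As a rough sketch of what "hidden characters" can look like, the example below builds three lines that all display as "done" but differ in their bytes. The sample data and the variable names (`sample`, `visible`, `before`, `after`) are assumptions for illustration only. `LC_ALL=C` is used so that sort compares bytes and the result does not depend on the locale.

```shell
#!/bin/sh
# Three lines that all look like "done" on screen: one clean, one with a
# trailing space, one ending in a DOS-style carriage return (made-up data).
sample=$(printf 'done\ndone \ndone\r\n')

# cat -vte marks each line end with $ and shows the carriage return as ^M,
# which makes the differences visible.
visible=$(printf '%s\n' "$sample" | cat -vte)

# Byte-wise, the three lines are distinct, so sort -u keeps all of them.
before=$(printf '%s\n' "$sample" | LC_ALL=C sort -u | wc -l | tr -d ' ')

# After deleting carriage returns and trailing whitespace they are identical.
after=$(printf '%s\n' "$sample" | tr -d '\r' | sed 's/[[:space:]]*$//' \
        | LC_ALL=C sort -u | wc -l | tr -d ' ')

printf '%s\n' "$visible"
printf 'distinct lines before cleanup: %s, after: %s\n' "$before" "$after"
```

If the output of cat -vte shows nothing but the word and a `$` on every line, trailing characters of this kind can be ruled out.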
 
Old 09-19-2018, 06:57 AM   #4
mikudo
Member
 
Registered: Aug 2018
Posts: 77

Original Poster
Rep: Reputation: Disabled
Thank you for responding.

I have been using sed and tr to try to remove leading and trailing spaces, and I have looked at all of the options for sort and uniq, trying to find other keys to sort on that might reveal what is not visible, for example by translating all words into numbers. The effect is the same. I have also read a lot of sort and uniq howtos, and what is suggested there does not work on this. I'm honestly perplexed by why this doesn't work, as well as by the whole 'hidden characters' thing. I want to understand it better.

These words are in a variable in a chain of pipes, with sed removing leading spaces and tr replacing newlines with spaces, and I tried sort | uniq -u and vice versa in numerous ways before posting.

cat -vte shows a $ at the end of every line, but nothing that should make the duplicate lines differ from each other, right? If I remove the $, it puts them all back into one flat row. Every other time I've used sort and uniq, it's been on a column with newlines.

Why does sort not put these in alphabetical order?

Why does uniq find no duplicates?
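
One possible explanation, sketched below under the assumption that the pipeline really does turn newlines into spaces with tr before sorting: sort and uniq work line by line, so a single flattened line leaves them nothing to compare. The data and the names `lines`, `flat`, and `restored` are hypothetical.

```shell
#!/bin/sh
# Hypothetical one-word-per-line input.
lines=$(printf 'do\ndone\ndo\n')

# Once tr has turned the newlines into spaces, everything sits on a single
# line, and sort -u has no other line to compare it with.
flat=$(printf '%s\n' "$lines" | tr '\n' ' ' | sort -u)

# Turning the spaces back into newlines restores one word per line, and
# sort -u can then drop the duplicate.
restored=$(printf '%s\n' "$flat" | tr -s ' ' '\n' | sort -u)

printf 'flat:     [%s]\n' "$flat"
printf 'restored:\n%s\n' "$restored"
```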
 
Old 09-19-2018, 07:08 AM   #5
mikudo
Member
 
Registered: Aug 2018
Posts: 77

Original Poster
Rep: Reputation: Disabled
....| cat -vte | sort | uniq -c

output:

1 do$
1 done$
1 every$
1 done$
1 do$
1 very-do$
1 done-every$
1 done$
1 every$
1 do$

When the most basic thing doesn't work, that is the world telling you that you don't know something that you should know. So what is it that I'm missing?

cat -vte is useful and cool, thank you for teaching me this.

(BTW, I'd like to use the code tags, but when I click on the buttons in my browser nothing happens, and there are no tooltips either, so I'm not even sure those are the right buttons. JavaScript is allowed and the blocker is off; the browser is Firefox.)
 
Old 09-19-2018, 07:40 AM   #6
mikudo
Member
 
Registered: Aug 2018
Posts: 77

Original Poster
Rep: Reputation: Disabled
OK, it has to be that I put it into a variable instead of a file, right? sort sees the variable input $x as a single thing. That must be it.

testing...
 
Old 09-19-2018, 07:56 AM   #7
mikudo
Member
 
Registered: Aug 2018
Posts: 77

Original Poster
Rep: Reputation: Disabled
Confirmed, this was it, closing.

Today I learned something that I will be able to use in the future.

Thanks guys!
 
Old 09-22-2018, 04:56 AM   #8
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 1,285

Rep: Reputation: 590
Always "quote" variables in command arguments: "$x"
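
A small sketch of why the quotes matter, assuming a variable `x` that holds one word per line (the contents and the names `unquoted` and `quoted` are invented for this example). Without quotes the shell splits the value on whitespace and echo joins the pieces with spaces, so the newlines are gone before sort ever sees the data.

```shell
#!/bin/sh
# Hypothetical variable holding one word per line, as in the first post.
x='do
done
every
done
do'

# Unquoted: the shell splits $x into words and echo rejoins them with
# single spaces, so sort receives one long line.
unquoted=$(echo $x | sort -u)

# Quoted: the embedded newlines survive, so sort sees one word per line.
quoted=$(printf '%s\n' "$x" | sort -u)

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

This assumes the default IFS and a POSIX-style shell such as bash or dash.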
 
  


All times are GMT -5. The time now is 12:25 PM.
