LinuxQuestions.org
02-13-2019, 12:22 PM   #1
perfection
Member
 
Registered: Nov 2015
Distribution: Slackware64-Current
Posts: 58

Rep: Reputation: Disabled
Sort with BUG?


I have used the command below many times to sort files and remove duplicates, but on a file of 192,300 lines it left many duplicate lines! On small files of around 1000 lines this never happened.
Code:
sort -u List -o List-End
I also tried these other commands, but the same thing always happens:
Code:
sort List | uniq > List2.txt

awk '!i[$0]++' < List > List2
Can anyone explain this?
Or does anyone have an idea how I can remove the duplicates from my file?
 
02-13-2019, 12:39 PM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Have you checked the duplicates for extraneous white space? That gets counted in both sort and uniq.

Edit:

Code:
sort list | sed -r 's/[[:space:]]+$//' | uniq -c > list2
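A minimal sketch of why trailing whitespace matters here, assuming a small made-up sample file (the contents and the temporary path are invented for illustration). It strips the whitespace before sorting, so lines that differ only in invisible trailing characters collapse into one. `-E` is used instead of `-r`; GNU sed treats the two the same. `LC_ALL=C` is only there so the demo compares raw bytes on any system.

```shell
# Build a small sample: three "apple" lines that differ only in trailing whitespace.
tmpdir=$(mktemp -d)
printf 'apple\napple \napple\t\nbanana\n' > "$tmpdir/List"

# Plain sort -u sees the three "apple" variants as different lines.
plain_count=$(LC_ALL=C sort -u "$tmpdir/List" | wc -l)

# Stripping trailing whitespace first lets sort -u collapse them.
clean_count=$(sed -E 's/[[:space:]]+$//' "$tmpdir/List" | LC_ALL=C sort -u | wc -l)

echo "unique lines without stripping: $plain_count"
echo "unique lines with stripping:    $clean_count"

rm -r "$tmpdir"
```

On this sample the first count is 4 and the second is 2, which is the same kind of gap the original poster is seeing on the big file if whitespace is the cause.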

Last edited by Turbocapitalist; 02-13-2019 at 12:42 PM.
 
02-13-2019, 12:47 PM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,838

Rep: Reputation: 7308
Yes. If you can't post the duplicated lines, you need to compare them yourself and find the differences. The cause is probably whitespace or some other tricky character.
You can remove almost 99% of that big file and keep just a few lines that look identical (after a sort, with or without -u).
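One way to do that comparison, sketched on an invented four-line sample: the `l` command of sed prints each line in an unambiguous form, so characters that look identical in a terminal become visible. The sample contents are an assumption, not taken from the original poster's file.

```shell
# Four lines that all look like "apple" in a terminal.
tmpdir=$(mktemp -d)
printf 'apple\napple \napple\t\napple\r\n' > "$tmpdir/List"

# "l" marks the true end of each line with "$" and shows
# tab and carriage return as \t and \r.
visible=$(sed -n 'l' "$tmpdir/List")
printf '%s\n' "$visible"

rm -r "$tmpdir"
```

A line ending in `\r$` would point to DOS/Windows line endings, and a space before the `$` to a trailing blank.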
 
02-13-2019, 04:14 PM   #4
perfection
Member
 
Registered: Nov 2015
Distribution: Slackware64-Current
Posts: 58

Original Poster
Rep: Reputation: Disabled

I noticed that sort orders the file differently than the Atom Editor does.
I also noticed that if I sort the file in the Atom Editor first, the command below works as well.
Code:
uniq List > List-uniq
Using the command below, suggested by @Turbocapitalist,
I verified that the final line count is the same as the result from the Atom Editor or from the previous command I reported.
Code:
sort List | sed -r 's/[[:space:]]+$//' | uniq > List2
However, when I compare these files in Kompare, it says they are different. It treats all the lines as a single line, which makes it hard to see what actually differs between the 2 generated files.

I did not understand this sed command. Could you explain what it does?
Code:
sed -r 's/[[:space:]]+$//'
Order produced by sort, example:
###A9AdsMiddleBoxTop
###A9AdsOutOfStockWidgetTop
###A9AdsServicesWidgetTop
###AD-300x250
###AD-300x250-1
###AD-300x250-2
###AD-300x250-3
###AD-HOME-LEFT

Order produced by the Atom Editor, example:
_.gif?ref=
_.gif?t=
_03100x07696.$image,domain=pcwelt.de
_100_ad.
_100x480_
_115x220.
_120_60.
_120_600_

Images to facilitate:
Attached Thumbnails: Kompare.png, Atom Editor Sort.png, Sort.png
 
02-14-2019, 12:19 AM   #5
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,307
Blog Entries: 3

Rep: Reputation: 3721
Quote:
Originally Posted by perfection
I did not understand this sed command. Could you explain what it does?
Code:
sed -r 's/[[:space:]]+$//'
The -r tells it to use Extended Regular Expressions, which are a bit easier to read and have more choices.

The s/ / / is a substitution command.

The [[:something:]] is a character class. The one above covers all white space, including tabs, and the plus means check for one or more of them. Since it is immediately followed by a dollar sign $, which stands for the end of the line, it means search for one or more white space characters at the end of the line. The replacement between the last two slashes is empty, so whatever matched is deleted. So it will zap spaces at the end of the line but leave them alone elsewhere in the line.

See the following for the authoritative answer.

Code:
man sed
man 7 regex
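A quick way to see that for yourself, as a small sketch on a made-up input string: the line below has leading spaces, an inner space, and then two spaces and a tab at the end. Only the trailing run should disappear. `-E` is the same as `-r` in GNU sed.

```shell
# Input: two leading spaces, "a b", then two spaces and a tab.
stripped=$(printf '  a b  \t\n' | sed -E 's/[[:space:]]+$//')

# The brackets make the remaining whitespace visible.
printf '[%s]\n' "$stripped"
```

This prints `[  a b]`: the leading and inner spaces are untouched, and only the whitespace just before the end of the line is gone.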
 
02-14-2019, 04:15 AM   #6
l0f4r0
Member
 
Registered: Jul 2018
Location: Paris
Distribution: Debian
Posts: 900

Rep: Reputation: 290
@perfection: so are the following your 2 remaining issues now?
  • Sorting in the Atom Editor gives different output than GNU sort.
    >> What command did you use exactly in the Atom Editor? Depending on the packages installed, it seems there can be different kinds of sorting (https://github.com/atom/sort-lines)
  • Kompare tells you the files are different.
    >> What about vimdiff or diff?
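On the first point, one possible cause (an assumption, not confirmed for this file) is locale collation: GNU sort follows the current locale's ordering rules unless told otherwise, while an editor may compare characters differently. A rough sketch with an invented sample, forcing byte order with `LC_ALL=C` and then checking the two results with cmp and diff:

```shell
tmpdir=$(mktemp -d)
printf '_100_ad.\n###AD-300x250\nabc\nABC\n' > "$tmpdir/List"

# LC_ALL=C makes sort compare raw bytes and ignore locale collation rules.
LC_ALL=C sort "$tmpdir/List" > "$tmpdir/bytes.txt"
# Plain sort uses whatever locale the shell is running in.
sort "$tmpdir/List" > "$tmpdir/locale.txt"

# cmp -s prints nothing; its exit status says whether the files are identical.
if cmp -s "$tmpdir/bytes.txt" "$tmpdir/locale.txt"; then
  echo "same order in this locale"
else
  # diff exits with status 1 when files differ, which is not an error here.
  diff "$tmpdir/bytes.txt" "$tmpdir/locale.txt" || true
fi

byte_order=$(cat "$tmpdir/bytes.txt")
rm -r "$tmpdir"
```

In byte order the sample comes out as `###AD-300x250`, `ABC`, `_100_ad.`, `abc`. Whether the plain sort matches that depends on the locale of the machine it runs on.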
 
  



Tags
awk, duplicate, sort


