Welcome to the most active Linux Forum on the web.
Go Back > Forums > Non-*NIX Forums > Programming
User Name
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.


  Search this Thread
Old 05-17-2008, 06:30 PM   #1
LQ Newbie
Registered: May 2008
Posts: 6

Rep: Reputation: 0
python list.sort() buggy?

this is an excerpt from a sorted list in python:

'504620', '507603', '508727', '51334', '514495', '515436', '519684', '520386'

so apparently it is a bit buggy? it is taken from a big dataset(10K+ elements).

i have converted it into a set and then back into a list to make all elements unique.
could that have something to do with it?
Old 05-17-2008, 08:26 PM   #2
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530

Rep: Reputation: 110Reputation: 110
Originally Posted by notnor View Post
i have converted it into a set and then back into a list to make all elements unique.
could that have something to do with it?
Yes. set's use the hash of the elements. Using hashes behind the scenes makes it fast. On the downside hashing implies messing up the order..

Call sort() after converting to a list. Or, if this will slow your program down too much because you would need to sort much more often, you could keep a set alongside to the list which you use only for checking uniqueness.

Note that dict's also use hashes, and also mess up the order of the dict-keys.

I tried to make a class for such a unique string list. I hardly used it myself and it may need improvement, but it works. It uses a dict to keep uniqueness to make ik compatible with earlier python's, but the idea is the same as using a set alongside the list. Here's the code:
## public domain ##

class StringItem(str):
    def __init__(self, s):
        str.__init__(self, s)
        self.index = None
        self.refcount = 1        

class UniqStringList:
    '''A indexed list of strings in which no duplicates are stored.
    When a string value is added that already exists in the list, it will be
    stored at the index of the first string that had this value.
    A reference count is maintained for the string values, so if a certain
    string value is added 3 times, it will be deleted from the list when it is
    removed 3 times.
    def __init__(self, sequence=None):
        '''Create new UniqStringList object 
        UniqStringList -> new empty UniqStringList.
        UniqStringList(sequence) -> new UniqStringList initialized from
        sequence's items.
        self._dict = {}
        self._list = []
        if sequence is not None:
            for item in sequence:

    def __delete(self, item):
        item.refcount -= 1
        idx = item.index
        if item.refcount <= 0:
            del self._dict[item]
            del self._list[idx]
            for s in self._list[idx:]:
                s.index -= 1

    def  __str__(self):
        result = ''
        for s in self:
            result = '%s%s\n' % (result, s)
        return result[:-1]    # remove last newline
    def refcount(self, s):
            if isinstance(s, str): return self._dict[s].refcount
            elif isinstance(s, int): return self._list[s].refcount
            else: raise TypeError
        except  (KeyError, IndexError): return 0
    def __getitem__(self, s):
         - The index where s is stored if s is a string.
         - The string with index s if s is an integer.
        if isinstance(s, str): return self._dict[s]
        elif isinstance(s, int): return self._list[s]
        else: raise TypeError
    def __delitem__(self, s):
    def __len__(self):
        return len(self._list)
    def __iter__(self):
        return self._list.__iter__()
    def add(self, s):
        '''Adds a string value to the UniqStringList.        
        Returns: index of the string value where it was stored in the list.
        item = self._dict.get(s)
        if item == None:  # not in list yet
            item = StringItem(s)
            item.index = len(self._list)
            self._dict[s] = item
            return item.index
        else:  # value already in list
            item.refcount += 1
            return item.index
    def remove(self, s):
        '''Removes a string for the list by decreasing the reference count.
        Only if the reference count reaches zero the string is really deleted.
        s can be:
            a integer -> the string with index s is deleted.
            a string -> the string s is deleted.
        if isinstance(s, str): self.__delete(self._dict[s])
        elif isinstance(s, int): self.__delete(self._list[s])
        else: raise TypeError
    def show(self):
        '''returns string containing the list with index and reference counts'''
        result = ''
        for s in self:
            item = self._dict[s]
            result = '%s%d %s %d\n' % (result, item.index, s, item.refcount)
        return result[:-1]    # remove last newline

Last edited by Hko; 05-17-2008 at 08:36 PM. Reason: added the code
Old 05-17-2008, 08:50 PM   #3
LQ Newbie
Registered: May 2008
Posts: 6

Original Poster
Rep: Reputation: 0
the algorithm is:
1. convert to set
2. convert back to list
3. sort

so it is not sorted and then to set and back to list.

so that should sort it perfectly no? is there any other wy to make a list unique?
Old 05-18-2008, 12:29 AM   #4
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
It's not buggy that's sorted in correct ASCII order. If you want a numeric sort you can either convert the strings to integers then sort them, or use "int" as the key when sorting.
>>> mylist=['504620', '507603', '508727', '51334', '514495', '515436', '519684', '520386']
>>> sorted(mylist)
['504620', '507603', '508727', '51334', '514495', '515436', '519684', '520386']
>>> sorted(mylist, key=int)
['51334', '504620', '507603', '508727', '514495', '515436', '519684', '520386']
There are a few approaches to making a list unique. If you want to preserve the original sort order.
That link has a few of them.
If you want a simple, easy to understand one.. maybe something like this?
def deldup(it):
    seen = set()
    out = []
    for x in it:
            if x not in seen:
    return out
This isn't the most efficient way of doing this but might be the easiest to understand.


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
Kopete - Sort contact list Wynd Linux - Software 2 06-19-2008 09:08 PM
Python waiting function... sort of? billyseth Programming 1 04-25-2007 05:36 PM
Removing a colon and all characters before it, in a python list tangle Programming 4 09-12-2006 09:48 PM
How can I sort out the first and last of a list of uniques?!?!! vous Programming 8 03-22-2005 09:05 AM
C++, using custom function to sort an std::list R00ts Programming 3 01-03-2005 09:41 AM

All times are GMT -5. The time now is 04:31 PM.

Main Menu
Write for LQ is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration