Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
| Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
 |
GNU/Linux Basic Guide
This 255-page guide will provide you with the keys to understand the philosophy of free software, teach you how to use and handle it, and give you the tools required to move easily in the world of GNU/Linux. Many users and administrators will be taking their first steps with this GNU/Linux Basic guide and it will show you how to approach and solve the problems you encounter.
Click Here to receive this Complete Guide absolutely free. |
|
 |
05-17-2008, 06:30 PM
|
#1
|
|
LQ Newbie
Registered: May 2008
Posts: 6
Rep:
|
python list.sort() buggy?
this is an excerpt from a sorted list in python:
'504620', '507603', '508727', '51334', '514495', '515436', '519684', '520386'
so apparently it is a bit buggy? it is taken from a big dataset(10K+ elements).
i have converted it into a set and then back into a list to make all elements unique.
could that have something to do with it?
|
|
|
|
05-17-2008, 08:26 PM
|
#2
|
|
Senior Member
Registered: Aug 2002
Location: Groningen, The Netherlands
Distribution: ubuntu
Posts: 2,530
Rep: 
|
Quote:
Originally Posted by notnor
i have converted it into a set and then back into a list to make all elements unique.
could that have something to do with it?
|
Yes. set's use the hash of the elements. Using hashes behind the scenes makes it fast. On the downside hashing implies messing up the order..
Call sort() after converting to a list. Or, if this will slow your program down too much because you would need to sort much more often, you could keep a set alongside to the list which you use only for checking uniqueness.
Note that dict's also use hashes, and also mess up the order of the dict-keys.
I tried to make a class for such a unique string list. I hardly used it myself and it may need improvement, but it works. It uses a dict to keep uniqueness to make ik compatible with earlier python's, but the idea is the same as using a set alongside the list. Here's the code:
Code:
## public domain ##
class StringItem(str):
def __init__(self, s):
str.__init__(self, s)
self.index = None
self.refcount = 1
class UniqStringList:
'''A indexed list of strings in which no duplicates are stored.
When a string value is added that already exists in the list, it will be
stored at the index of the first string that had this value.
A reference count is maintained for the string values, so if a certain
string value is added 3 times, it will be deleted from the list when it is
removed 3 times.
'''
def __init__(self, sequence=None):
'''Create new UniqStringList object
UniqStringList -> new empty UniqStringList.
UniqStringList(sequence) -> new UniqStringList initialized from
sequence's items.
'''
self._dict = {}
self._list = []
if sequence is not None:
for item in sequence:
self.add(item)
def __delete(self, item):
item.refcount -= 1
idx = item.index
if item.refcount <= 0:
del self._dict[item]
del self._list[idx]
for s in self._list[idx:]:
s.index -= 1
def __str__(self):
result = ''
for s in self:
result = '%s%s\n' % (result, s)
return result[:-1] # remove last newline
def refcount(self, s):
try:
if isinstance(s, str): return self._dict[s].refcount
elif isinstance(s, int): return self._list[s].refcount
else: raise TypeError
except (KeyError, IndexError): return 0
def __getitem__(self, s):
'''Returns:
- The index where s is stored if s is a string.
- The string with index s if s is an integer.
'''
if isinstance(s, str): return self._dict[s]
elif isinstance(s, int): return self._list[s]
else: raise TypeError
def __delitem__(self, s):
self.remove(s)
def __len__(self):
return len(self._list)
def __iter__(self):
return self._list.__iter__()
def add(self, s):
'''Adds a string value to the UniqStringList.
Returns: index of the string value where it was stored in the list.
'''
item = self._dict.get(s)
if item == None: # not in list yet
item = StringItem(s)
item.index = len(self._list)
self._list.append(item)
self._dict[s] = item
return item.index
else: # value already in list
item.refcount += 1
return item.index
def remove(self, s):
'''Removes a string for the list by decreasing the reference count.
Only if the reference count reaches zero the string is really deleted.
s can be:
a integer -> the string with index s is deleted.
a string -> the string s is deleted.
'''
if isinstance(s, str): self.__delete(self._dict[s])
elif isinstance(s, int): self.__delete(self._list[s])
else: raise TypeError
def show(self):
'''returns string containing the list with index and reference counts'''
result = ''
for s in self:
item = self._dict[s]
result = '%s%d %s %d\n' % (result, item.index, s, item.refcount)
return result[:-1] # remove last newline
Last edited by Hko; 05-17-2008 at 08:36 PM.
Reason: added the code
|
|
|
|
05-17-2008, 08:50 PM
|
#3
|
|
LQ Newbie
Registered: May 2008
Posts: 6
Original Poster
Rep:
|
the algorithm is:
1. convert to set
2. convert back to list
3. sort
so it is not sorted and then to set and back to list.
so that should sort it perfectly no? is there any other wy to make a list unique?
|
|
|
|
05-18-2008, 12:29 AM
|
#4
|
|
Member
Registered: Oct 2003
Distribution: Archlinux
Posts: 147
Rep:
|
It's not buggy that's sorted in correct ASCII order. If you want a numeric sort you can either convert the strings to integers then sort them, or use "int" as the key when sorting.
Code:
>>> mylist=['504620', '507603', '508727', '51334', '514495', '515436', '519684', '520386']
>>> sorted(mylist)
['504620', '507603', '508727', '51334', '514495', '515436', '519684', '520386']
>>> sorted(mylist, key=int)
['51334', '504620', '507603', '508727', '514495', '515436', '519684', '520386']
There are a few approaches to making a list unique. If you want to preserve the original sort order.
http://aspn.activestate.com/ASPN/Coo...n/Recipe/52560
That link has a few of them.
If you want a simple, easy to understand one.. maybe something like this?
Code:
def deldup(it):
seen = set()
out = []
for x in it:
if x not in seen:
out.append(x)
seen.add(x)
return out
This isn't the most efficient way of doing this but might be the easiest to understand.
|
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
All times are GMT -5. The time now is 05:45 PM.
|
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|