Python Forum
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Set order
#1
Hi,

I am trying to understand the behaviour for sets as I am learning Python atm.

My understanding is that sets are unordered but when converting the alphabet as a string to a set it always takes up the same order.

alpha = 'abcdefghijklmnopqrstuvwxyz'
print(set(alpha))

pangram = 'The quick brown fox jumps over the lazy dog'
pangram = pangram.replace(" ","")
pangram = pangram.lower()
print(set(pangram))
Output:
{'l', 'w', 'e', 'd', 'u', 's', 'r', 'i', 'f', 'z', 'a', 'o', 'c', 'k', 'x', 'j', 'n', 'p', 'q', 'y', 'h', 'b', 'g', 'm', 'v', 't'} {'l', 'w', 'e', 'd', 'u', 's', 'r', 'i', 'f', 'z', 'a', 'o', 'c', 'k', 'x', 'q', 'n', 'j', 'p', 'y', 'h', 'b', 'g', 'm', 'v', 't'}
Can anyone explain why the above output is always the same order.

Any info I search says not to rely on sets being in any particular order. The current tutorial I am following is using sets to compare the set(aplhabet) and set(pangram) to see if they are the same.

So I am confused as to why the set is used if we shouldn't expect the same order in sets.

Thanks.

Edit:
Just noticed right after posting that they are not the same order. So basically a set will compare in no particular order Blush
Reply
#2
As odd as this may sound, it is faster to compare sets BECAUSE they are unordered. Because sets are unordered, they can be optimized to access items by value (actually their hash value) instead of index. Kind of liked a lean dictionary without the value mapping. It is much faster to test that "u" is in set(alpha) than alpha.

You cannot use sets to compare collections when order is important. Obviously panagram != alpha, but they do have the same letters so set(alpha) == set(panagram).
Reply
#3
Quote:but when converting the alphabet as a string to a set it always takes up the same order.
Correct. Sets are hashed (look it up), so they are in hash table order. And you possibly want difference and not == for the two sets.
Reply
#4
(Jan-09-2023, 06:08 PM)woooee Wrote:
Quote:but when converting the alphabet as a string to a set it always takes up the same order.
Correct. Sets are hashed (look it up), so they are in hash table order. And you possibly want difference and not == for the two sets.
Actually incorrect. Sets are in hash table order, but hash table order is not fixed, or at least not predictable. I ran this code multiple times:
a = set("ABC")
b = set("abc".upper())
c = set("CBA")
print(a, b, c)
Output:
{'B', 'C', 'A'} {'B', 'C', 'A'} {'B', 'C', 'A'} {'A', 'B', 'C'} {'A', 'B', 'C'} {'A', 'B', 'C'} {'B', 'A', 'C'} {'B', 'A', 'C'} {'C', 'A', 'B'}
Reply
#5
Come on, those are not the same. The two sets that are the same are in the same order. The third set begins with "C" i.e. god and dog are not the same word. And how do you print 3 sets of 3 each with one pass through. Quit nitpicking.
Reply
#6
Sometimes the set "order" is the same, regardless of order that items are added to the set. Sometimes the set "order" is different. Running the same program different times results in different set orders. In other words, there is no predictable order. How is that nitpicking?
Reply
#7
(Jan-09-2023, 06:21 PM)deanhystad Wrote: Sets are in hash table order, but hash table order is not fixed, or at least not predictable. I ran this code multiple times:

Set elements in cpython are printed in hash table order unless there is a hash collision between the elements. Then for elements in the same bucket, it prints in insertion order (or perhaps reverse insertion order?)

For a small set like this, there's probably only 8 or 16 hash buckets. So sometimes the "C" and "A" objects are assigned IDs that hash to the same bucket. When that happens, the set("ABC") and set("CBA") will print in a different order.

This is more easily demonstrated with small integers, which have fixed hash IDs. When they don't collide, the print order is fixed. When they do collide, print order changes.

>>> print(set([11, 3]), set([3, 11])) # 3 and 11 share the same final 3 bits
{3, 11} {11, 3}
>>> print(set([12, 3]), set([3, 12])) # 3 and 12 differ in the final 3 bits.
{3, 12} {3, 12}
Reply
#8
I suggest to read documentation Set types:

Quote:A set object is an unordered collection of distinct hashable objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in dict, list, and tuple classes, and the collections module.)

Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply


Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020