Friday, October 23, 2009

Python Dictionary Generators

The dictionary type in Python is what other programming languages call an associative array. Rather than looking up values by position, values are retrieved by key. With either type of array, the values are indexed, meaning that each value may be referenced individually within the collection. Otherwise, we would have nothing but a collection of values that cannot be referenced in any meaningful way.
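
As a small illustration of the difference, the sketch below retrieves a value from a list by position and from a dictionary by key. The names by_position and by_key are made up for this example.

```python
# Positional lookup: a list is indexed by integer position.
by_position = ["alpha", "beta", "gamma"]
second = by_position[1]

# Keyed lookup: a dictionary is indexed by a hashable key.
by_key = {"first": "alpha", "second": "beta", "third": "gamma"}
also_second = by_key["second"]

print(second, also_second)
```

Both lookups reach the same value; only the kind of index differs.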

The dictionary type in Python offers higher-level functionality than the bare associative arrays found in some other languages. It does this by providing an API on top of the primitive operators that exist for traditional associative arrays. For instance, developers can retrieve all the keys in a given dictionary instance by invoking the keys() method. In Python 2, the value returned by this method is an array, or list in Python terminology, that may be iterated. The dictionary API also offers other iterative functionality, such as the values() and items() methods.
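
For example, the result of keys() can drive a loop directly, and this should behave the same way in Python 2 and Python 3. The dictionary name ages is invented for this sketch.

```python
ages = {"alice": 30, "bob": 25}

# Collect the keys by iterating over the result of keys().
names = []
for name in ages.keys():
    names.append(name)

# Sort before printing so the output does not depend on dictionary ordering.
print(sorted(names))
```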

With the introduction of Python 3, the dictionary API has seen some changes. Namely, the keys(), items(), and values() methods, which return lists in Python 2, now return dictionary view objects. Views are iterable and are loosely called generators in the title of this post, although strictly speaking they are not generator objects. This is quite different from invoking these methods and expecting a list. For one thing, the return value is not directly indexable, because views do not support indexing. The following example shows how the dictionary API behaves in Python 3.

#Example; Python 3 dictionary keys.

#Initialize dictionary object.
my_dict={"one":1, "two":2, "three":3}

if __name__=="__main__":

    #Invoke the dictionary API to obtain the view objects.
    my_keys=my_dict.keys()
    my_items=my_dict.items()
    my_values=my_dict.values()

    #Display the view objects.
    print("KEYS: %s"%(my_keys))
    print("ITEMS: %s"%(my_items))
    print("VALUES: %s"%(my_values))

    #This would work in Python 2.
    try:
        print(my_keys[0])
    except TypeError:
        print("my_keys does not support indexing...")

    #This would work in Python 2.
    try:
        print(my_items[0])
    except TypeError:
        print("my_items does not support indexing...")

    #This would work in Python 2.
    try:
        print(my_values[0])
    except TypeError:
        print("my_values does not support indexing...")

    #Display the contents of each view.
    print("\nIterating keys...")
    for i in my_keys:
        print(i)

    print("\nIterating items...")
    for i in my_items:
        print(i)

    print("\nIterating values...")
    for i in my_values:
        print(i)

In this example, we start by creating a simple dictionary instance, my_dict. The dictionary is deliberately simple because we aren't interested in its content. Next, we create three new variables, each of which stores some aspect of my_dict. The my_keys variable stores all the keys that reference the values in the dictionary. The my_items variable stores the key-value pairs that make up the dictionary, each item being a tuple. The my_values variable stores the actual values held in my_dict with no regard for which key references them. The important thing to keep in mind is that all three variables were created using the dictionary API.
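
Because each item is a tuple, it can be unpacked while iterating. The following is a minimal sketch that reuses my_dict from the example; the pairs list is a name introduced here for illustration.

```python
my_dict = {"one": 1, "two": 2, "three": 3}

# Each item is a (key, value) tuple, so it can be unpacked in the loop header.
pairs = []
for key, value in my_dict.items():
    pairs.append("%s=%s" % (key, value))

print(", ".join(pairs))
```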

Up to this point, we have my_dict, the main dictionary, and three variables, my_keys, my_items, and my_values, all created using the dictionary API. Next, we purposefully invoke behavior that isn't supported in Python 3. We do this by acting as if the values returned by the dictionary API are lists when they are in fact view objects. This produces a TypeError each time we try because the views stored in my_keys, my_items, and my_values do not support indexing.
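
It may help to see how the objects returned in Python 3 differ from true generators. They are dictionary views: they have a length, support membership tests, can be iterated repeatedly, and track later changes to the dictionary, yet they still reject indexing. The sketch below assumes Python 3; first_pass, second_pass, and indexable are names chosen for illustration.

```python
my_dict = {"one": 1, "two": 2, "three": 3}
my_keys = my_dict.keys()

# Unlike a generator, a view has a length and supports membership tests.
print(len(my_keys))
print("two" in my_keys)

# A view can be iterated more than once; a generator would be exhausted.
first_pass = list(my_keys)
second_pass = list(my_keys)

# A view is live: it reflects later changes to the dictionary.
my_dict["four"] = 4
print(len(my_keys))

# Indexing still fails with a TypeError.
try:
    my_keys[0]
    indexable = True
except TypeError:
    indexable = False
print(indexable)
```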

Finally, we simply iterate over each object derived from my_dict. This works just as expected and is in fact the main use of the data returned by the dictionary methods shown here. Sure, direct indexing doesn't work on the returned objects, but is that really a common use of this data? I would certainly think not. The key aspect of this API change is that the API now returns a structure that is good at iteration, which happens to be the intended use. And if indexing the values returned by the dictionary API is absolutely necessary, they can easily be turned into lists by passing them to list(). It is just one extra step for the rare use of the data.
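
A minimal sketch of that extra step, assuming Python 3 and the same my_dict as before. The names key_list, item_list, and ordered_keys are introduced only for this example.

```python
my_dict = {"one": 1, "two": 2, "three": 3}

# list() copies the current contents into a new list.
key_list = list(my_dict.keys())
item_list = list(my_dict.items())

# The lists are indexable, unlike the objects returned by keys() and items().
first_key = key_list[0]
first_item = item_list[0]
print(first_key, first_item)

# sorted() also returns a list, with the keys in sorted order.
ordered_keys = sorted(my_dict.keys())
print(ordered_keys)
```

Note that the resulting lists are snapshots: they do not change if my_dict is modified afterwards.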