I was recently watching this talk by Raymond Hettinger, which I recommend; he's a pretty entertaining presenter. Raymond mentions one of his (many) contributions to Python, specifically the namedtuple function from the collections module. In this post I'm going to break down exactly how namedtuple works.

What is a Named Tuple?

If you know what a named tuple is, and you've used them in the past, feel free to skip this part.

Let's say I have a tuple containing 3 floating point numbers:

# Our tuple
point = (100.14, 245.86, 132.17)

Now let's say we want to use these values to perform some operations. We could take the values out of this tuple in a couple of ways:

# Way 1 - Reference by Index

point = (100.14, -245.86, 132.17)

if 0 < point[0] < 100:
    print("You're very close")

else:
    print("You could be closer")

# Compare numbers by value (== / !=); "is" tests object identity
if -100 < point[1] < 100 and point[1] != 0:
    print("You're off balance")

elif point[1] == 0:
    print("Perfect")

else:
    print("you could do better")

if -100 < point[2] < 100:
    print("Close enough is good enough")
else:
    print("Maybe one day...")

This code works: it runs and it does what it is meant to do (that is, output a bunch of demotivational sentence fragments, but hey, made-up examples are surprisingly hard). But is it easy to read? Is it intuitive? Self-documenting? I would say not. Without knowing exactly what the tuple looks like, it's hard to tell for sure what we're dealing with. There is another way.

# Way 2 - Tuple Unpacking

point = (100.14, -245.86, 132.17)

x, y, z = point

if 0 < x < 100:
    print("You're very close: %.2f" % x)
else:
    print("You could be closer: %.2f" % x)


if -100 < y < 100 and y != 0:
    print("You're off balance: %.2f" % y)
elif y == 0:
    print("Perfect")
else:
    print("you could do better: %.2f" % y)


if -100 < z < 100:
    print("Close enough is good enough: %.2f" % z)
else:
    print("Maybe one day... %.2f" % z)

Now this is a bit better. It's simple and it's readable. I can tell exactly what x, y, and z are, I can see where they come from. Of course if I had two or more points, things would start to get a little bit more complex - I've seen variable names like point1_x, point2_x pop up in the past. Enter the named tuple.

# The Nice Way

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])

point = Point(100.14, -245.86, 132.17)

if 0 < point.x < 100:
    print("You're very close: %.2f" % point.x)
else:
    print("You could be closer: %.2f" % point.x)


if -100 < point.y < 100 and point.y != 0:
    print("You're off balance: %.2f" % point.y)
elif point.y == 0:
    print("Perfect")
else:
    print("you could do better: %.2f" % point.y)


if -100 < point.z < 100:
    print("Close enough is good enough: %.2f" % point.z)
else:
    print("Maybe one day... %.2f" % point.z)

Now this is a better way. We're defining what our tuple is, then populating it. We're avoiding possible variable name confusion by accessing members rather than regular variables. It's just nicer. It reads better. It is more maintainable. This is exactly what a namedtuple is for; its entire purpose in life is to make our lives easier in the future. Now, on to how the magic happens.
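
To make the "two or more points" case concrete, here is a small sketch. The names start and end are made up for this example; the point is that fields read by name while every ordinary tuple operation keeps working.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])

# Two points, with no point1_x / point2_x style variable names needed
start = Point(100.14, -245.86, 132.17)
end = Point(12.5, 0.0, 99.9)

# Fields read by name...
dx = end.x - start.x

# ...while everything a plain tuple can do still works
x, y, z = start          # unpacking
first = start[0]         # indexing
as_plain = tuple(start)  # conversion back to a regular tuple

print(start)             # Point(x=100.14, y=-245.86, z=132.17)
print(dx, first, as_plain)
```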

The Magic of the NamedTuple

A namedtuple is a pretty straightforward concept. Simply put, it is concerned with mapping a positional, indexed value in a tuple to a class property. In order to do this, it uses a form of witchcraft most programmers wisely avoid: it calls exec to dynamically create a brand new class. You can see the class it creates by using the verbose argument to the namedtuple function (this works on the Python 3 releases this post was written against; the verbose argument was removed in Python 3.7):

>>> from collections import namedtuple
>>> nt = namedtuple('Point', ['x', 'y', 'z'], verbose=True)

Which will output this nifty class definition:

from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class Point(tuple):
    'Point(x, y, z)'

    __slots__ = ()

    _fields = ('x', 'y', 'z')

    def __new__(_cls, x, y, z):
        'Create new instance of Point(x, y, z)'
        return _tuple.__new__(_cls, (x, y, z))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new Point object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != 3:
            raise TypeError('Expected 3 arguments, got %d' % len(result))
        return result

    def _replace(_self, **kwds):
        'Return a new Point object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, ('x', 'y', 'z'), _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + '(x=%r, y=%r, z=%r)' % self

    @property
    def __dict__(self):
        'A new OrderedDict mapping field names to their values'
        return OrderedDict(zip(self._fields, self))

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values.'
        return self.__dict__

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return tuple(self)

    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        return None

    x = _property(_itemgetter(0), doc='Alias for field number 0')

    y = _property(_itemgetter(1), doc='Alias for field number 1')

    z = _property(_itemgetter(2), doc='Alias for field number 2')

Looking at that class definition, we can see that a lot of effort has been put into making sure the created class is documented. We've got a nice __repr__. We've got nice docstrings attached to our x, y, and z properties. We've even got a docstring on the class. Aside from a few strange naming conventions, e.g. imports prefixed with _, this class might just fit right into our application. It certainly looks like a normal class, and calling help on it confirms that. From the examples above, we can see that the entire point of the namedtuple function is to make our code more readable and better documented, so it makes absolute sense that namedtuple takes care to generate nice documentation.
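
If you'd like to poke at that documentation yourself, a quick check along these lines should do it. It only uses attributes that appear in the generated class above.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y', 'z'])
p = Point(1, 2, 3)

print(repr(p))            # Point(x=1, y=2, z=3)
print(Point.__doc__)      # Point(x, y, z)
print(Point.x.__doc__)    # Alias for field number 0
print(Point._fields)      # ('x', 'y', 'z')
```

Calling help(Point) in the interpreter pulls all of this together into one page.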

There is some strangeness going on with the constructor. You'll notice there is no __init__ method; instead there is a __new__ method. The __new__ method has to be used when we're inheriting from an immutable class such as tuple. This is due to the design of the Python class system and how immutable classes are implemented. The __new__ method is a static method that receives the class as its first argument, but it is usually undecorated as Python special-cases it. For a normal class you would set the values for the instance in the initialisation stage (i.e. by overriding the __init__ method). However, by the time __init__ runs, an immutable instance already exists and can no longer be changed, so for immutable classes you must instead return a constructed instance with the intended values from the parent class's __new__. That's the short version; to learn more about the uses of the __new__ method, check out the original docs.
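
Here is a minimal sketch of the same idea outside of namedtuple. Pair is a hypothetical class invented for this example; it only shows why the values have to go through __new__ when the parent is a tuple.

```python
class Pair(tuple):
    """Hypothetical two-item tuple subclass, for illustration only."""

    __slots__ = ()

    def __new__(cls, left, right):
        # A tuple's contents must be supplied at creation time,
        # so they are handed to tuple.__new__ here rather than set in __init__.
        return tuple.__new__(cls, (left, right))


pair = Pair('a', 'b')
print(pair)                  # ('a', 'b')
print(isinstance(pair, Pair))  # True

# Once created, the instance cannot be modified
try:
    pair[0] = 'z'
except TypeError as error:
    print("immutable:", error)
```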

That's all well and good, but what does the source look like? How does this class come into existence?

We'll start with the function call itself. The namedtuple function needs just two arguments: a name for our to-be-created class and the ordered field names for our source tuple.

def namedtuple(typename, field_names, verbose=False, rename=False):

The verbose and rename keyword arguments are rarely seen in the wild. The rename argument relaxes the field name validation: instead of raising an error, an invalid field name (a keyword, a duplicate, or a name starting with an underscore, for example) is replaced with a positional name such as _1. The verbose argument simply tells the function to print out the created class definition.
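
A quick way to see what rename does is to feed it some deliberately bad field names. The Row type and raw_names list below are made up for the example.

```python
from collections import namedtuple

# 'class' is a keyword and 'id' appears twice, so neither is acceptable as-is
raw_names = ['id', 'class', 'id']

try:
    namedtuple('Row', raw_names)
except ValueError as error:
    print("rejected:", error)

# With rename=True the offending names are replaced by _<position>
Row = namedtuple('Row', raw_names, rename=True)
print(Row._fields)   # ('id', '_1', '_2')
```

This tends to be handy when the field names come from somewhere you don't control, such as the header row of a CSV file.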

The next most important part of the code is a private module variable, _class_template, which forms the base of all namedtuples.

_class_template = """\
from builtins import property as _property, tuple as _tuple
from operator import itemgetter as _itemgetter
from collections import OrderedDict

class {typename}(tuple):
    '{typename}({arg_list})'

    __slots__ = ()

    _fields = {field_names!r}

    def __new__(_cls, {arg_list}):
        'Create new instance of {typename}({arg_list})'
        return _tuple.__new__(_cls, ({arg_list}))

    @classmethod
    def _make(cls, iterable, new=tuple.__new__, len=len):
        'Make a new {typename} object from a sequence or iterable'
        result = new(cls, iterable)
        if len(result) != {num_fields:d}:
            raise TypeError('Expected {num_fields:d} arguments, got %d' % len(result))
        return result

    def _replace(_self, **kwds):
        'Return a new {typename} object replacing specified fields with new values'
        result = _self._make(map(kwds.pop, {field_names!r}, _self))
        if kwds:
            raise ValueError('Got unexpected field names: %r' % list(kwds))
        return result

    def __repr__(self):
        'Return a nicely formatted representation string'
        return self.__class__.__name__ + '({repr_fmt})' % self

    @property
    def __dict__(self):
        'A new OrderedDict mapping field names to their values'
        return OrderedDict(zip(self._fields, self))

    def _asdict(self):
        'Return a new OrderedDict which maps field names to their values.'
        return self.__dict__

    def __getnewargs__(self):
        'Return self as a plain tuple.  Used by copy and pickle.'
        return tuple(self)

    def __getstate__(self):
        'Exclude the OrderedDict from pickling'
        return None

{field_defs}
"""

The _class_template is just a string. There are no inheritance tricks. No special magic. Just a template. The interesting stuff comes in when we take a look at the data we use to fill in our class.

# Fill-in the class template
class_definition = _class_template.format(
    typename = typename,
    field_names = tuple(field_names),
    num_fields = len(field_names),
    arg_list = repr(tuple(field_names)).replace("'", "")[1:-1],
    repr_fmt = ', '.join(_repr_template.format(name=name)
                         for name in field_names),
    field_defs = '\n'.join(_field_template.format(index=index, name=name)
                           for index, name in enumerate(field_names))
)

Here, we're formatting our field definitions, field names, and constructor arguments, and building up the class representation. These are then used to populate our template, and out comes something that looks like a valid Python class, the kind of class you'd be likely to find in a well-named, organised .py file. Only, we don't have a file; all we've got is an arbitrary piece of string in our class_definition variable.
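
To see what those expressions actually produce, we can run them on their own for our Point fields. Note that repr_template and field_template below are assumptions: they are reconstructed so that the output matches the verbose class definition shown earlier, not copied from the module.

```python
field_names = ['x', 'y', 'z']

# Assumed helper templates, reconstructed to reproduce the verbose output above
repr_template = '{name}=%r'
field_template = ("    {name} = _property(_itemgetter({index:d}), "
                  "doc='Alias for field number {index:d}')")

# "('x', 'y', 'z')" -> "(x, y, z)" -> "x, y, z"
arg_list = repr(tuple(field_names)).replace("'", "")[1:-1]
repr_fmt = ', '.join(repr_template.format(name=name) for name in field_names)
field_defs = '\n'.join(field_template.format(index=index, name=name)
                       for index, name in enumerate(field_names))

print(arg_list)    # x, y, z
print(repr_fmt)    # x=%r, y=%r, z=%r
print(field_defs)
```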

Enter the Python exec builtin. The exec function is a lot like your classic eval, and both let you optionally set up the namespace that the code executes in. The differences are that eval only evaluates a single expression and returns its value, whereas exec can run any number of statements and, instead of returning a result, applies their effects to the given namespace (or, by default, the current namespace). The usage in the namedtuple function is the following:
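
As a rough side-by-side of the two builtins (nothing here is specific to namedtuple, and the variable names are just for the example):

```python
# eval handles one expression and hands back its value
value = eval("1 + 2")
print(value)                            # 3

# exec runs whole statements, returns None, and leaves results in the namespace
namespace = {}
result = exec("a = 1\nb = a + 2", namespace)
print(result)                           # None
print(namespace['a'], namespace['b'])   # 1 3

# An assignment is a statement, not an expression, so eval rejects it
try:
    eval("c = 3")
except SyntaxError:
    print("eval cannot run an assignment")
```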

namespace = dict(__name__='namedtuple_%s' % typename)
exec(class_definition, namespace)
result = namespace[typename]
result._source = class_definition

So, to create our custom class, we first create a new namespace. A namespace just needs to be a dictionary. We could provide just an empty dictionary, but according to the comments, many tracing utilities use the __name__ entry to determine what module they're currently in. Now that we've got a namespace, we simply exec our class definition. Because the class is created in the given namespace rather than returned, we have to pull the created class out of our temporary namespace. You can play with this behaviour by firing up the Python interpreter and trying out the snippet below:

>>> namespace = {}
>>> exec("x = ('a', 'b')", namespace)
>>> print(namespace['x'])
('a', 'b')
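
Putting the pieces together, here is a heavily trimmed sketch of the whole trick. Everything in it (toy_namedtuple, _toy_template) is hypothetical and written for this post; it skips the field name validation the real function performs, and it attaches the properties with setattr rather than through the template, so treat it as an illustration only.

```python
from operator import itemgetter

# Hypothetical, cut-down template: not the real _class_template
_toy_template = """\
class {typename}(tuple):
    __slots__ = ()
    _fields = {field_names!r}

    def __new__(cls, {arg_list}):
        return tuple.__new__(cls, ({arg_list},))
"""


def toy_namedtuple(typename, field_names):
    """Build a tuple subclass with named fields by exec-ing a filled-in template."""
    source = _toy_template.format(
        typename=typename,
        field_names=tuple(field_names),
        arg_list=', '.join(field_names),
    )
    # Throwaway namespace; the class definition lands in here
    namespace = {'__name__': 'toy_namedtuple_%s' % typename}
    exec(source, namespace)
    cls = namespace[typename]
    # One read-only property per position (the real code does this in the template)
    for index, name in enumerate(field_names):
        setattr(cls, name, property(itemgetter(index)))
    return cls


Point = toy_namedtuple('Point', ['x', 'y', 'z'])
p = Point(1.0, 2.0, 3.0)
print(p.x, p.y, p.z)           # 1.0 2.0 3.0
print(p == (1.0, 2.0, 3.0))    # True
```

Because the names are pasted straight into source code, passing untrusted strings to a function like this would be a bad idea; validating the names first is what keeps the real thing safe.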

There we have it, named tuples broken down into byte sized pieces. Named tuples are an incredibly useful and handy piece of code. They're self-documenting and they're 100% compatible with ordinary tuples. As a side note, hopefully I've shown that using things that are often thought of as unsafe, dangerous, and downright wrong sometimes isn't so bad. As long as care is taken to use them safely, the eval and exec family of functions is particularly useful when there is a well defined scope of possible inputs and a known format to the output.

The Complete Code:

The complete code for the collections module is available on the cpython github mirror. The whole collections package is worth taking a look through; it's pretty wonderful. The namedtuple source is here.