• programming in python • a course for the curious •

heapq module

In this post I we will explain the algorithm which is implemented in heapq module. This is a very useful code if you want to have a list which has to be sorted, even if you add or pop out objects from it. Let us brifly ilustrate how it works. For this we will use the following function which prints a list in a tree form:

import math
from cStringIO import StringIO

def show_tree(tree, total_width=50, fill=' '):
    """Pretty-print a tree."""
    output = StringIO()
    last_row = -1
    for i, n in enumerate(tree):
        if i:
            row = int(math.floor(math.log(i+1, 2)))
        else:
            row = 0
        if row != last_row:
            output.write('\n')
        columns = 2**row
        col_width = int(math.floor((total_width * 1.0) / columns))
        output.write(str(n).center(col_width, fill))
        last_row = row
    print output.getvalue()
    print '-' * total_width
    print
    return

I took this function from heapq description at Module of the week by Doug Hellmann. Let us ilustrate how to to use heapq module and how it works. We start with the show_tree function:

>>>show_tree(range(1,16))

                        1
            2                        3
    4           5           6           7
8     9     10     11    12    13    14    15
--------------------------------------------------

Let us now see how the heapq module works. It has a function heapq.heapify() which takes a list as an input and returns it but the elements are now in the heap order.

>>> import heapq
>>> h=[6,5,4,3,2,1,0]
>>> heapq.heapify(h)
>>> h
[0, 2, 1, 3, 5, 6, 4]

The first element is the smallest element of the list. To see how the order looks like let us draw the tree representation:

>>> show_tree(h)

                0
        2                 1
    3        5        6        4
------------------------------------

The clue is that every parent is smaller than its childrens. When you pop the smallest element with the heapq.heappop() function it will also keep the order of the list. The most important thing is that the sorting algorithm is fast, and does not require to sort whole list as the function sort() would do. So let’s try:

>>> heapq.heappop(h)
0
>>> show_tree(h)

                1
        2                 4
    3        5        6
------------------------------------

Now at the tree root there is 1 and still every parent is smaller than its children. How the tree was changed: after 0 was poped out the two elements in next row will want to jump to the root, the smaller one wins so 1 lands in the root, now the place of 1 is free and 6 and 4 play for its place, since 4 is smaller it takes the place of 1.

You can also insert elements into the heap data structure. The function heapq.heappush() will push an element onto the heap and sort it so that the heap order will be preserved:

>>> heapq.peahpush(1)
>>> show_tree(h)

                1
        2                 1
    3        5        6        4
------------------------------------

The algorithm adds the element to the end of the list and then promotes it untill it finds its place. There is also heapq.heappushpop() function which is a fast version of heapq.heappush() followed by heapq.heappop(), and heapq.heapreplace() which pops and returns the current smallest value and then adds a new item to the heap (it is also more efficient that heapq.heappop() followed by heapq.heappush()). Other functions that are included in the module:

  1. heapq.nlargest() - find the n-th largest element in an iterable object,
  2. heapq.nsmallest() - find the n-th smallest element in an iterable object,
  3. heapq.merge() - merge multiple sorted iterables into one sorted iterable object: it returns a generator, does not pull all the data into memory at once, and assumes that all the input interators are already sorted (an expert from its __doc__ string).

In this post we will explain how heapq.heappop(), heappush() and heapq.heapify() are implemented. We begin with some observations on binary trees and binary arithmetics.

Binary trees

Let us come back to the binary tree.

[level distance=1.2cm,
level 1/.style={sibling distance=6.5cm},
level 2/.style={sibling distance=3.25cm},
level 3/.style={sibling distance=1.5cm}
]
\node {\texttt{\small 1}}
    child { node{\texttt{\small 2}}
        child{ node{\texttt{\small 4}}
            child {node{\texttt{\small 8}}}
            child {node{\texttt{\small 9}}}
        }
        child{ node{\texttt{\small 5}}
            child {node{\texttt{\small 10}}}
            child {node{\texttt{\small 11}}}
        }
    }
    child { node{\texttt{\small 3}}
            child{ node{\texttt{\small 6}}
                child {node{\texttt{\small 12}}}
                child {node{\texttt{\small 13}}}
            }
            child{ node{\texttt{\small 7}}
                child {node{\texttt{\small 14}}}
                child {node{\texttt{\small 15}}}
            }
    };

As you will see in a moment binary trees have something to do with binary operations. First of all let us observe that at level 0 there is 2**0 elements, at level 1 – 2**1, at level n there are 2**n elements. All these numbers have a nice representation in binary form: 0b1, 0b10, 0b100, and so on ... . Let as draw the binary tree once again but this time let us label nodes with integers in the binary representation:

[level distance=1.2cm,
level 1/.style={sibling distance=6.5cm},
level 2/.style={sibling distance=3.25cm},
level 3/.style={sibling distance=1.5cm}
]
\node {\texttt{\small 0b1}}
    child { node{\texttt{\small 0b10}}
        child{ node{\texttt{\small 0b100}}
            child {node{\texttt{\small 0b1000}}}
            child {node{\texttt{\small 0b1001}}}
        }
        child{ node{\texttt{\small 0b101}}
            child {node{\texttt{\small 0b1010}}}
            child {node{\texttt{\small 0b1011}}}
        }
    }
    child { node{\texttt{\small 0b11}}
            child{ node{\texttt{\small 0b110}}
                child {node{\texttt{\small 0b1100}}}
                child {node{\texttt{\small 0b1101}}}
            }
            child{ node{\texttt{\small 0b111}}
                child {node{\texttt{\small 0b1110}}}
                child {node{\texttt{\small 0b1111}}}
            }
    };

Now you can see that the binary shift operation (or integer division by 2) finds a parent of a node. For example the parent of 12=0b1100 is 6=0b110 also it is a parent of 13=0b1101. To find a child of a node it is enough to shift in the other direction (multiply by 2) or shift and add 1. Since Python uses 0-base indexing this two operations will look a little bit different when we use it to index nodes of a tree. The parent of node with index n will be (n-1)>>1: when we indexed the nodes starting from 1 it was just n>>1 hence now it is (n+1)>>1-1 and this is equal to (n-1)>>1: (n+1)>>1-1=(n+1)//2-1=(n-1)//2=(n-1)>>1. Child nodes of the node n are now: 2*(n+1)-1=2*n+1 (the left one) and 2*(n+1)-1+1=2*n+2 for the right one.

There are two basic function which are used to implement the heap heapq._siftdown() and heapq._siftup(). These function are the two basic blocks to implement all the other functions in the heapq module.

heapq._siftdown()

def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if cmp_lt(newitem, parent):
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

The first argument of heapq._siftdown(): heap is a list. In first step it takes the element heap[pos] (which is equal to newitem) finds its parent (parentpos). Then the function :func:heapq.cmp_lt` is used to compare the value at ``pos with the value at its parent node parentpos. It returns True if the value at pos (which is newitem) is smaller than its parent.

def cmp_lt(x, y):
    # Use __lt__ if available; otherwise, try __le__.
    # In Py3.x, only __lt__ will be called.
    return (x < y) if hasattr(x, '__lt__') else (not y <= x)

If the value at pos is smaller than its parent the parent is moved down (this is why the function is called _siftdown). Now we move to the parent position and start over again: we check if the newitem is smaller than the parent if it is the parent is moved down. The process stops either if the parent is smaller than newitem in which case we put the newitem as its child. The while loop guards that we will not go beyond the startpos, which actually might be the ending position of the algorithm.

Let us ilustrate this with an example:

>>> h = [6, 2, 4, 3, 5, 1]
>>> show_tree(h)

                6
        2                 4
    3        5        1
------------------------------------
>>> heapq._siftdown(h,0,len(h)-1)

                1
        2                 4
    3        5        6
------------------------------------

Here the while loop will act twice, in slow motion we get:

>>> h = [6, 2, 4, 3, 5, 1]
>>> heapq._siftdown(h,2,len(h)-1)
>>> show_tree(h)

                6
        2                 1
    3        5        4
------------------------------------

Since 1 was smaller than its parent 4 the parent is moved down. Since now 1 (which is our newitem all the time) is smaller than its parent it will be moved up (or its parent will be shifted down):

>>> heapq._siftdown(h,0,2)
>>> show_tree(h)

                1
        2                 4
    3        5        6
------------------------------------

and we get the final result.

heapq._siftup()

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and not cmp_lt(heap[childpos], heap[rightpos]):
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

At the very begining of this function two objects are rememberd: startpos which is the position at which the algorithm will start and newitem the value at the starting position. The will be used at the very end. This function goes down to the children. The childpos variable is the position of a child node. Then we enter the while loop in which first we check if we keep the position of the smaller of the two children. Then we promote the smaller children to the parent pos and we start over again but not at the child position childpos (which we moved up). So this implements the winnning algorithm that we have noted before (smaller wins and it move up the tree, then we go to the free spot and look which children should win this place and so on ...). After the while loop completes, which is when we hit a leaf of the tree, we take the newitem and add it at as a leaf. Now we use the heapq._siftdown() to place it at the right spot below the startpos.

To ilustrate how it works let us take:

>>> h=[3,2,1]
>>> show_tree(h)

                3
        2                 1
------------------------------------

>>> heapq._siftup(h,0)
>>> show_tree(h)

                1
        2                 3
------------------------------------

So it sorted the heap, and if the list is sorted the order will be preserved:

>>> heapq._siftup(h,0)
>>> show_tree(h)

                1
        2                 3
------------------------------------

heapq.heapify()

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    for i in reversed(xrange(n//2)):
        _siftup(x, i)

The implementation of heapq.heapify() which sorts a list with respect to the heap order looks quite simple. It just runs the heapq._siftup() function for every inner node (a node which is not a leaf) starting from the last node up to the 0-th node. This will first sort every triangle with two (or one) leaves and its parent, then move one level up and sort the parent and grandparents of leaves, and so on. At the end we will get a list sorted by the heap order. For example in the following tree:

[level distance=1.2cm,
level 1/.style={sibling distance=6.5cm},
level 2/.style={sibling distance=3.25cm},
level 3/.style={sibling distance=1.5cm}
]
\node {\texttt{\small v1}}
    child { node{\texttt{\small v2}}
        child{ node{\texttt{\small v4}}
            child {node{\texttt{\small v8}}}
            child {node{\texttt{\small v9}}}
        }
        child{ node{\texttt{\small v5}}
            child {node{\texttt{\small v10}}}
            child {node{\texttt{\small v11}}}
        }
    }
    child { node{\texttt{\small v3}}
            child{ node{\texttt{\small v6}}
                child {node{\texttt{\small v12}}}
                child {node{\texttt{\small v13}}}
            }
            child{ node{\texttt{\small v7}}
                child {node{\texttt{\small v14}}}
                child {node{\texttt{\small v15}}}
            }
    };

The for loop will only go through the nodes: v7 to v1. First it will sort the node v7 and its children, and v6 and its children, ... . When it will come to v3 its children will be already sorted now it will take the value v3 out and the smaller of v6 and v7 will be promoted, the v3 will be added as a leaf and then moved to the right position. This is how the heapq._siftup() function works. And it will assure that the the sub tree v3 and its children will be sorted. In the same way v2 and its children will be sorted. To prove that any list will be sorted in this way it is enough to consider a tree t of the form:

[level distance=1.2cm,
level 1/.style={sibling distance=2cm},
]
\node {\texttt{\small v1}}
    child{ node{\texttt{\small t1}}}
    child{ node{\texttt{\small t2}}};

where t1 and t2 are already sorted tries. It should be clear (well, maybe after a minute or two) that when we run heapq._siftup(t,0) the whole tree t will get sorted.

heapq.heappush()

Now let us look at the heapq.heappush() function. Well it is even simpler:

def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap)-1)

Indeed it is enough to append the elemnt as the last leaf and then run heapq._siftdown(). It will promote the appended element to the right place: just under the first its parent which holds a smaller element.

heapq.heappop()

def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
    else:
        returnitem = lastelt
    return returnitem

As you see there are two cases: the heap has is empty after poping out an element, in which case there is nothing to do. If the heap is still not empty, we put the last element lastelt (from the very last leaf of the tree), and then we put it in the root of the tree and we run heapq._siftup(). Since both trees under the left and right root element are sorted, the tree will be sorted after just one application of heapq._siftup(). Finally the function returns the returnitem which is the smallest element of the heap that came as an input.

Final Comments

In all the examples we were using integers as values of heap lists, but actually you can use any objects which have comparisions methods defined.

You could think that just by changing the function heapq.cmp_lt() you could reverse the order in a heap. This is not the case, since actually, if the module _heapq is present it is imported and the functions heapq.heapify(), heapq.heappop(), heapq.heappush(), heapq.heappushpop(), heapq.heapreplace(), heapq.nlargest() and heapq.nsmallest() are overwritten. The _heapq module is written in C – hance it is faster. It implements the same algorithm as the one described above. If you want to reverse the order of a heap you can redefine the comparision methods of your objects.

Last update: 2013-11-06 11:02 (CET)
Contact us.
Created using , and
we are not associated with Python Software Foundation
© Copyright 2012, 2013 Accorda Institute.