• programming in python • a course for the curious •

In this post I we will explain the algorithm which is implemented in
`heapq` module. This is a very useful code if you want to have a list
which has to be sorted, even if you add or pop out objects from it. Let us
brifly ilustrate how it works. For this we will use the following function
which prints a list in a tree form:

```
import math
from cStringIO import StringIO
def show_tree(tree, total_width=50, fill=' '):
"""Pretty-print a tree."""
output = StringIO()
last_row = -1
for i, n in enumerate(tree):
if i:
row = int(math.floor(math.log(i+1, 2)))
else:
row = 0
if row != last_row:
output.write('\n')
columns = 2**row
col_width = int(math.floor((total_width * 1.0) / columns))
output.write(str(n).center(col_width, fill))
last_row = row
print output.getvalue()
print '-' * total_width
print
return
```

I took this function from heapq
description at *Module of the week* by Doug Hellmann. Let us ilustrate how to
to use `heapq` module and how it works. We start with the `show_tree`
function:

```
>>>show_tree(range(1,16))
1
2 3
4 5 6 7
8 9 10 11 12 13 14 15
--------------------------------------------------
```

Let us now see how the `heapq` module works. It has a function
`heapq.heapify()` which takes a list as an input and returns it but the
elements are now in the heap order.

```
>>> import heapq
>>> h=[6,5,4,3,2,1,0]
>>> heapq.heapify(h)
>>> h
[0, 2, 1, 3, 5, 6, 4]
```

The first element is the smallest element of the list. To see how the order looks like let us draw the tree representation:

```
>>> show_tree(h)
0
2 1
3 5 6 4
------------------------------------
```

The clue is that every parent is smaller than its childrens. When you pop the
smallest element with the `heapq.heappop()` function it will also keep the
order of the list. The most important thing is that the sorting algorithm is
fast, and does not require to sort whole list as the function `sort()`
would do. So let’s try:

```
>>> heapq.heappop(h)
0
>>> show_tree(h)
1
2 4
3 5 6
------------------------------------
```

Now at the tree root there is `1` and still every parent is smaller than its
children. How the tree was changed: after `0` was poped out the two
elements in next row will want to jump to the root, the smaller one wins so
`1` lands in the root, now the place of `1` is free and `6` and `4`
play for its place, since `4` is smaller it takes the place of `1`.

You can also insert elements into the heap data structure. The function
`heapq.heappush()` will push an element onto the heap and sort it so that
the heap order will be preserved:

```
>>> heapq.peahpush(1)
>>> show_tree(h)
1
2 1
3 5 6 4
------------------------------------
```

The algorithm adds the element to the end of the list and then promotes it
untill it finds its place. There is also `heapq.heappushpop()` function
which is a fast version of `heapq.heappush()` followed by
`heapq.heappop()`, and `heapq.heapreplace()` which pops and returns
the current smallest value and then adds a new item to the heap (it is also
more efficient that `heapq.heappop()` followed by `heapq.heappush()`).
Other functions that are included in the module:

heapq.nlargest()- find the n-th largest element in an iterable object,heapq.nsmallest()- find the n-th smallest element in an iterable object,heapq.merge()- merge multiple sorted iterables into one sorted iterable object: it returns a generator, does not pull all the data into memory at once, and assumes that all the input interators are already sorted (an expert from its__doc__string).

In this post we will explain how `heapq.heappop()`, `heappush()` and
`heapq.heapify()` are implemented. We begin with some observations on
binary trees and binary arithmetics.

Let us come back to the binary tree.

As you will see in a moment binary trees have something to do with binary
operations. First of all let us observe that at level 0 there is `2**0`
elements, at level 1 – `2**1`, at level n there are `2**n` elements. All
these numbers have a nice representation in binary form: `0b1`, `0b10`,
`0b100`, and so on ... . Let as draw the binary tree once again but this
time let us label nodes with integers in the binary representation:

Now you can see that the binary shift operation (or integer division by 2)
finds a parent of a node. For example the parent of `12=0b1100` is
`6=0b110` also it is a parent of `13=0b1101`. To find a child of a node
it is enough to shift in the other direction (multiply by 2) or shift and add
`1`. Since
*Python* uses 0-base indexing this two operations will look a little bit
different when we use it to index nodes of a tree. The parent of node with
index `n` will be `(n-1)>>1`: when we indexed the nodes starting from
`1` it was just `n>>1` hence now it is `(n+1)>>1-1` and this is equal
to `(n-1)>>1`: `(n+1)>>1-1=(n+1)//2-1=(n-1)//2=(n-1)>>1`. Child nodes of
the node `n` are now: `2*(n+1)-1=2*n+1` (the left one) and
`2*(n+1)-1+1=2*n+2` for the right one.

There are two basic function which are used to implement the heap
`heapq._siftdown()` and `heapq._siftup()`. These function are the two
basic blocks to implement all the other functions in the `heapq` module.

```
def _siftdown(heap, startpos, pos):
newitem = heap[pos]
# Follow the path to the root, moving parents down until finding a place
# newitem fits.
while pos > startpos:
parentpos = (pos - 1) >> 1
parent = heap[parentpos]
if cmp_lt(newitem, parent):
heap[pos] = parent
pos = parentpos
continue
break
heap[pos] = newitem
```

The first argument of `heapq._siftdown()`: `heap` is a list. In first
step it takes the element `heap[pos]` (which is equal to `newitem`) finds
its parent (`parentpos`). Then the function :func:`heapq.cmp_lt` is used
to compare the value at ``pos` with the value at its parent node
`parentpos`. It returns `True` if the value at `pos` (which is
`newitem`) is smaller than its parent.

```
def cmp_lt(x, y):
# Use __lt__ if available; otherwise, try __le__.
# In Py3.x, only __lt__ will be called.
return (x < y) if hasattr(x, '__lt__') else (not y <= x)
```

If the value at `pos` is smaller than its parent the parent is moved down
(this is why the function is called `_siftdown`). Now we move to the parent
position and start over again: we check if the `newitem` is smaller than the
parent if it is the parent is moved down. The process stops either if the
parent is smaller than `newitem` in which case we put the `newitem` as its
child. The `while` loop guards that we will not go beyond the `startpos`,
which actually might be the ending position of the algorithm.

Let us ilustrate this with an example:

```
>>> h = [6, 2, 4, 3, 5, 1]
>>> show_tree(h)
6
2 4
3 5 1
------------------------------------
>>> heapq._siftdown(h,0,len(h)-1)
1
2 4
3 5 6
------------------------------------
```

Here the `while` loop will act twice, in slow motion we get:

```
>>> h = [6, 2, 4, 3, 5, 1]
>>> heapq._siftdown(h,2,len(h)-1)
>>> show_tree(h)
6
2 1
3 5 4
------------------------------------
```

Since `1` was smaller than its parent `4` the parent is moved down. Since
now `1` (which is our `newitem` all the time) is smaller than its parent
it will be moved up (or its parent will be shifted down):

```
>>> heapq._siftdown(h,0,2)
>>> show_tree(h)
1
2 4
3 5 6
------------------------------------
```

and we get the final result.

```
def _siftup(heap, pos):
endpos = len(heap)
startpos = pos
newitem = heap[pos]
# Bubble up the smaller child until hitting a leaf.
childpos = 2*pos + 1 # leftmost child position
while childpos < endpos:
# Set childpos to index of smaller child.
rightpos = childpos + 1
if rightpos < endpos and not cmp_lt(heap[childpos], heap[rightpos]):
childpos = rightpos
# Move the smaller child up.
heap[pos] = heap[childpos]
pos = childpos
childpos = 2*pos + 1
# The leaf at pos is empty now. Put newitem there, and bubble it up
# to its final resting place (by sifting its parents down).
heap[pos] = newitem
_siftdown(heap, startpos, pos)
```

At the very begining of this function two objects are rememberd: `startpos`
which is the position at which the algorithm will start and `newitem` the
value at the starting position. The will be used at the very end. This
function goes down to the children. The `childpos` variable is the position
of a child node. Then we enter the `while` loop in which first we check if
we keep the position of the smaller of the two children. Then we promote the
smaller children to the parent pos and we start over again but not at the
child position `childpos` (which we moved up). So this implements the
winnning algorithm that we have noted *before* (smaller wins
and it move up the tree, then we go to the free spot and look which children
should win this place and so on ...). After the `while` loop completes,
which is when we hit a leaf of the tree, we take the `newitem` and add it at
as a leaf. Now we use the `heapq._siftdown()` to place it at the right
spot below the `startpos`.

To ilustrate how it works let us take:

```
>>> h=[3,2,1]
>>> show_tree(h)
3
2 1
------------------------------------
>>> heapq._siftup(h,0)
>>> show_tree(h)
1
2 3
------------------------------------
```

So it sorted the heap, and if the list is sorted the order will be preserved:

```
>>> heapq._siftup(h,0)
>>> show_tree(h)
1
2 3
------------------------------------
```

```
def heapify(x):
"""Transform list into a heap, in-place, in O(len(x)) time."""
n = len(x)
for i in reversed(xrange(n//2)):
_siftup(x, i)
```

The implementation of `heapq.heapify()` which sorts a list with respect to
the heap order looks quite simple. It just runs the `heapq._siftup()`
function for every inner node (a node which is not a leaf) starting from the
last node up to the 0-th node. This will first sort every triangle with two
(or one) leaves and its parent, then move one level up and sort the parent and
grandparents of leaves, and so on. At the end we will get a list sorted by
the heap order. For example in the following tree:

The `for` loop will only go through the nodes: `v7` to `v1`. First it
will sort the node `v7` and its children, and `v6` and its children, ... .
When it will come to `v3` its children will be already sorted now it will
take the value `v3` out and the smaller of `v6` and `v7` will be
promoted, the `v3` will be added as a leaf and then moved to the right
position. This is how the `heapq._siftup()` function works. And it will
assure that the the sub tree `v3` and its children will be sorted. In the
same way `v2` and its children will be sorted. To prove that any list will
be sorted in this way it is enough to consider a tree `t` of the form:

where `t1` and `t2` are already sorted tries. It should be clear (well,
maybe after a minute or two) that when we run `heapq._siftup(t,0)` the whole
tree `t` will get sorted.

Now let us look at the `heapq.heappush()` function. Well it is even
simpler:

```
def heappush(heap, item):
"""Push item onto heap, maintaining the heap invariant."""
heap.append(item)
_siftdown(heap, 0, len(heap)-1)
```

Indeed it is enough to append the elemnt as the last leaf and then run
`heapq._siftdown()`. It will promote the appended element to the right
place: just under the first its parent which holds a smaller element.

```
def heappop(heap):
"""Pop the smallest item off the heap, maintaining the heap invariant."""
lastelt = heap.pop() # raises appropriate IndexError if heap is empty
if heap:
returnitem = heap[0]
heap[0] = lastelt
_siftup(heap, 0)
else:
returnitem = lastelt
return returnitem
```

As you see there are two cases: the `heap` has is empty after poping out an
element, in which case there is nothing to do. If the `heap` is still not
empty, we put the last element `lastelt` (from the very last leaf of the
tree), and then we put it in the root of the tree and we run
`heapq._siftup()`. Since both trees under the left and right root element
are sorted, the tree will be sorted after just one application of
`heapq._siftup()`. Finally the function returns the `returnitem` which
is the smallest element of the `heap` that came as an input.

In all the examples we were using integers as values of heap lists, but actually you can use any objects which have comparisions methods defined.

You could think that just by changing the function `heapq.cmp_lt()` you
could reverse the order in a heap. This is not the case, since actually, if
the module `_heapq` is present it is imported and the functions
`heapq.heapify()`, `heapq.heappop()`, `heapq.heappush()`,
`heapq.heappushpop()`, `heapq.heapreplace()`, `heapq.nlargest()`
and `heapq.nsmallest()` are overwritten. The `_heapq` module is
written in *C* – hance it is faster. It implements the same algorithm as the
one described above. If you want to reverse the order of a heap you can
redefine the comparision methods of your objects.

we are not associated with Python Software
Foundation

© Copyright 2012, 2013 Accorda Institute.