• programming in python • a course for the curious •

Basic Python Data Types

Number types: int, long, float, complex

Python is a scripting language. That means you do not need to compile your code into an executable binary file, as you do in C, C++, Pascal and many other languages. Other examples of scripting languages are Perl and Ruby.

Python comes with an interactive shell, which provides an environment in which we can try simple commands and check snippets of code. On Linux or macOS you need to open a terminal emulator (Terminal.app on macOS, Konsole in KDE, or GNOME Terminal in GNOME). On Windows, run IDLE from the Python section of your Start Menu. macOS users can also run IDLE, which makes it a little easier to go through this tutorial. When you run Python from a terminal emulator, the python executable has to be on your PATH. If it is, you just have to type python and you should get something like:

Python 2.7.3 (default, Jun 12 2012, 11:27:33)
[GCC 4.5.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

The Python shell uses exactly the same syntax as Python scripts. Now we can write our first Python expressions.

>>> print("Hello")
>>> 2*2
>>> a=1
>>> b=2
>>> b-a


In Python you don’t need to declare the variable’s type. Names of variables are just labels. You might do

>>> a=1
>>> a="string"

and you will not get an error due to type mismatch (as you would in statically typed languages). Now a is a label for the object "string".

The interactive shell can execute Python statements like print, evaluate Python expressions, including any arithmetic ones, and assign values to variables (or, to be more precise, assign labels to values).

There are several types of numbers you can use: integers int, long integers long (only in Python2), float numbers float and complex numbers complex:

>>> i=2
>>> type(i)
<type 'int'>
>>> L=long(100000000) # only in python2.7
>>> type(L)
<type 'long'>
>>> f=3.4
>>> type(f)
<type 'float'>
>>> c=(2+1j)
>>> type(c)
<type 'complex'>


Python3 does not have the long class. Its int type automatically allocates enough memory for the value. This makes Python3 much nicer when you are writing code which operates on large integers. It is done in a way that is efficient (it does not use too much memory for small integers); see PEP 237 for more information.

The type() built-in function returns the type of its argument. There is a slogan that everything in Python is an object. We will return later, in greater detail, to what an object is; for now let us understand it as abstract data together with methods that can change its state (or the state of other objects, as functions do). Every object has a type. Even the type() function returns an object of some type, for example:

>>> complex_type=type((1+1j))
>>> type(complex_type)
<type 'type'>
>>> c=2+1j
>>> type(c)
<type 'complex'>

You can compare objects by identity as well as by value. The id() built-in function returns the unique id number of an object (in CPython, i.e. the C implementation of Python, this is the object’s memory address). Whether two numbers with equal values are one and the same object is an implementation detail, so you should check it rather than assume it:

>>> id(1200.0) == id(1.2e3)
>>> id(1200) == id(1.2e3)
>>> a=1.2e3
>>> a
>>> type(a)
>>> b=1200.0
>>> a is b
>>> id(a) == id(b)

The second comparison, id(1200) == id(1.2e3), returns False since the first object is of type int while the second is a float. The is operator in a is b is equivalent to comparing the values returned by id().
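The difference between comparing by value and comparing by identity can be sketched as follows. This is a minimal illustration; the names a, b and c are chosen here and are not part of the listing above.

```python
a = 1.2e3
b = a                 # a second label for the very same object
c = float('1200')     # an equal value, built separately at run time

print(a == c)                          # True: the values are equal
print(a is b)                          # True: both labels name one object
print((a is b) == (id(a) == id(b)))    # True: `is` compares identities
print(1200 == 1.2e3)                   # True: int and float compare by value
```

Whether a is c holds is deliberately not shown, since it depends on the implementation.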

Here is a short list of mathematical operations that you can use. You can mix number types: Python will do the job for you and convert integers into floats or complex numbers if needed. Not all of these operations are defined for complex numbers.

+ sum
- subtraction
* multiplication
** exponentiation
/ division
// integer division
% remainder
>>> 9//4
>>> 9%4

In Python3 the division / always returns a float, while in Python2 it works like integer division if both numbers are of an integer type. So it is good to use // whenever you want integer division. In this way your code will be compatible with both Python2 and Python3.

>>> 8/4 # in python3
>>> 8/4 # in python2 it will return an integer
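As a small sketch of the rules above, assuming Python3 semantics, the three related operators behave like this (the variable names are only illustrative):

```python
true_div = 9 / 4       # always a float in Python3
floor_div = 9 // 4     # integer (floor) division
remainder = 9 % 4      # remainder of the division

print(true_div, floor_div, remainder)   # 2.25 2 1

# divmod() returns the quotient and the remainder in one call.
print(divmod(9, 4))                     # (2, 1)

# Floor division rounds towards minus infinity, not towards zero.
negative = -9 // 4
print(negative)                         # -3
```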

You can also specify integers in binary form, using the syntax 0b1, 0b10, 0b11 (which have the values 1, 2 and 3 respectively):

>>> 0b1 == 1

And there are bitwise operators as well.

| bitwise or
^ bitwise xor
& bitwise and
>>, << shifts

Let us illustrate the similarities and differences between bitwise operations and the arithmetic operations:

>>> bin(0b10|0b1)
>>> 0b10+0b1
>>> bin(0b10|0b10)
>>> 0b10+0b10
>>> bin(0b1010|0b1100)
>>> bin(0b1010+0b1100)
>>> bin(0b1010&0b1100)

As you can see, the operations & and | are done bit by bit, hence the name bitwise. They are often met in Python modules. Suppose we have a function that depends on a set of options, each of which has only two values: set or unset. For simplicity, let us say that we have two such options, A and B. Then you could encode the value of A as the first bit and the value of B as the second one. Passing A, B or A|B would correspond to: only A is set, only B is set, both A and B are set. The null value 0 corresponds to the last possibility: neither A nor B is set. How do you get the value of the n-th bit of an integer? Here the shift operator >> comes in handy:

>>> a=0b11010
>>> for n in [0, 1, 2, 3, 4]:
...     print(bin(a>>n), (a>>n)%2)
0b11010 0
0b1101 1
0b110 0
0b11 1
0b1 1
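The option encoding described above can be sketched as follows. The names OPTION_A, OPTION_B, describe and bit are hypothetical and serve only as an illustration.

```python
# Hypothetical option flags: each option owns one bit.
OPTION_A = 0b01   # the first bit
OPTION_B = 0b10   # the second bit

def describe(flags):
    """Return the names of the options that are set in `flags`."""
    names = []
    if flags & OPTION_A:
        names.append('A')
    if flags & OPTION_B:
        names.append('B')
    return names

def bit(value, n):
    """Return the n-th bit (0 or 1) of a non-negative integer."""
    return (value >> n) & 1

print(describe(OPTION_A | OPTION_B))          # ['A', 'B']
print(describe(0))                            # []
print([bit(0b11010, n) for n in range(5)])    # [0, 1, 0, 1, 1]
```

Using & 1 instead of % 2 gives the same result for non-negative integers and keeps the whole computation bitwise.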

An equivalent form uses the integer division operator //:

>>> for n in [0, 1, 2, 3, 4]:
...     print(bin(a>>n), a//(2**n)%2)

The method using the shift operator >> should be faster for large integers. For example, on my computer accessing the last bit of the greatest integer (which has 63 bits) is 3 times faster using the shift operator. The method for computing the bit length of an integer (or a long integer) is int.bit_length() (it is new in Python 2.7):

>>> for n in range(10):
...     print(n.bit_length(), end=' ') # this is *Python3* syntax
0 1 2 2 3 3 3 3 4 4

In Python3 the print statement has changed: it is now a function. In Python 2.7 you could do the same with (note the trailing comma after the print statement):

>>> for n in range(10):
...     print n.bit_length(),
0 1 2 2 3 3 3 3 4 4

In Python 2.7 you do not need to use print() as a function, but you may, though the keyword argument end was added in Python 3.

What are methods: in the above example n loops over ten integer objects from 0 to 9 (produced by the function range()). Each integer object, besides its value (and its id() value, which we have already met), has a set of operations attached to it, which is common to all int objects. They are usually called methods and are used with the . syntax: object.method(). Methods might also take arguments.

If you want to delete an object you can use del statement:

>>> a=5
>>> del a
>>> a
NameError: name 'a' is not defined

The del actually deletes the reference to the object, not necessarily the object itself. That is why the error message says: “name ‘a’ is not defined”.
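A minimal sketch of this point: when a second label refers to the same object, del removes only one label and the object survives. The names b and a_is_defined are introduced here for illustration.

```python
a = [1, 2, 3]
b = a            # a second label for the same list object
del a            # removes only the label a

a_is_defined = True
try:
    a            # looking up the deleted name fails
except NameError:
    a_is_defined = False

print(b)             # [1, 2, 3]
print(a_is_defined)  # False
```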


Here are some useful notes:

  • Python2.7 automatically promotes plain integers to long integers if needed; in Python3 the arbitrary-precision integer is the only integer type.

  • long integers in Python2.7 and int in Python3 can encode arbitrarily large numbers.

  • Python does mixed arithmetic, e.g. when you add an integer to a long one the result will be a long integer, and when you divide an integer by a float the result will be a float.

  • Division / of two integers truncates in Python2.7 but not in Python3 (check what the result of 3/2 is). In Python3 use integer division // if an integer result is expected.

  • If you are looking for numerical computing (including displaying charts) in Python this site might help you.

  • The Python interactive prompt has a special name _ which holds the last evaluated value. If the last value was None, _ is not updated:

    >>> x=100
    >>> x+1
    >>> _

    It is very useful when you want to reuse a result in later computations. This feature is not available in a Python script.

Boolean type

The other set of very important operators are the boolean ones. There are two boolean objects: True and False. Every non-empty string has the boolean value True, as does any non-zero number and any non-empty list, tuple or dict (we will meet these types soon).

and boolean and
or boolean or
not boolean negation

Let us describe how and and or work. The and operator takes two arguments; if both of them are true it returns the last one, otherwise it returns the first one (from the left) which has a False value. Thus for example:

>>> a='first'
>>> b='second'
>>> a and b
>>> c=0
>>> d=''
>>> a and b and c and d
>>> a and b and d and c

Also a, b, ... might be Python expressions. The evaluation stops when the first False value is met. The or operator returns the first argument which has a True value, and if all arguments are False it returns the last one:

>>> a='first'
>>> b='second'
>>> a or b
>>> c=0
>>> d=''
>>> c or d
>>> d or c

As with and, or evaluates its operands from left to right, and it stops when it finds the first True value. This makes and and or efficient.
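The early stop described above can be made visible with a small sketch. The helper check and the list calls are hypothetical names used only to record which operands were actually evaluated.

```python
calls = []

def check(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value

# `and` stops at the first false operand, so 'b' is never evaluated.
result_and = check('a', 0) and check('b', 'second')

# `or` stops at the first true operand, so 'd' is never evaluated.
result_or = check('c', 'first') or check('d', 'second')

print(result_and, result_or, calls)   # 0 first ['a', 'c']
```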

The if statement

Let us introduce the if, elif, else Python statement. The syntax looks like this

>>> if <if_statement>:
...   <if_code>
... elif <elif_statement>:
...   <elif_code>
... else:
...   <else_code>

The <if_statement> and <elif_statement> are Python expressions which are evaluated. If <if_statement> evaluates to True then only the <if_code> will be executed. If it evaluates to False then <elif_statement> is evaluated, and if it turns out to be true then only the <elif_code> is executed. If both <if_statement> and <elif_statement> are false then the <else_code> is executed. An important part of Python syntax is the indentation. Each of <if_code>, <elif_code> and <else_code> should be indented more than if, elif and else, which should all have the same indentation. Otherwise the IndentationError exception (i.e. an error) is raised. Let us give a simple example:

>>> a=True
>>> b=1;c=3
>>> if a:
...   print('if_code')
...   x=c-b
... else:
...   print('else_code')
...   x=c+b
>>> x


When you write a script in your favourite editor, you might consider setting an option which translates all tabs to spaces. Mixing tabs and spaces usually leads to an IndentationError (Python3 raises its subclass TabError). For example, if the correct indentation is eight spaces, which visually is the same as one tab, using a single tab instead will raise the exception.


Python has no switch statement. You can use if and elif:

>>> if a == 1:
...  print(1)
... elif a == 2:
...  print(2)

See also

The <if_statement> and <elif_statement> can be any Python expression. Python expressions are documented here.

and-or trick

One can build an if-else expression with the and and or operators. Let us assume that b always has a True value; then:

>>> a=True
>>> b='the first'
>>> c='the second'
>>> a and b or c
'the first'
>>> a=False
>>> a and b or c
'the second'

This is often called the and-or trick. It is easy to understand how it works from the rules for the and and or operators. First let us note that and is evaluated before or: (a and b) or c. We assume that b has a True boolean value, which is the case here. If a is True then the and returns the value of b, and because this value is True the following or returns b as well. In this way, if a is True the value of b is returned. However, if a has a False value, then the and returns the value of a, and since it is False the or returns c (regardless of the boolean value of c). What goes wrong if b is False? If a has a True value then the and returns b, which is False, so the or returns c instead of b. There is a simple way to overcome this using lists. We will come back to this in a moment.

Another one-liner for the and-or construction is simply:

>>> 'True' if False else 'False'

Since the above one-liner has none of the limitations of the and-or trick it is much safer. So you’d better stay with it. In the previous notation it reads:

>>> b if a else c

so the condition comes just after the if.
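The limitation of the and-or trick can be checked with a short sketch. The function names and_or and conditional are chosen here for illustration.

```python
def and_or(a, b, c):
    """The and-or trick: intended to return b if a is true, else c."""
    return a and b or c

def conditional(a, b, c):
    """The conditional expression: returns b if a is true, else c."""
    return b if a else c

print(and_or(True, 'first', 'second'))         # first
print(and_or(False, 'first', 'second'))        # second
print(and_or(True, '', 'second'))              # second (b is false, so c leaks through)
print(repr(conditional(True, '', 'second')))   # '' (the intended result)
```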


There are also other types in Python which are very useful: lists and mapping types (dictionaries). For those who have used Java, JavaScript or PHP: lists resemble arrays and dictionaries are similar to objects. Let us start with an example:

>>> numbers=[0, 1, 2, 3, 4]
>>> type(numbers)
<type 'list'>


We could name the list l or list instead of numbers, but the former is discouraged since there are fonts in which l looks like 1, while the latter shadows the built-in list(). There is an excellent style guide for Python: PEP 8.

A list is a container which can store any kind of objects. The list in the above example stores integers. Lists have quite a few useful methods, for example for getting the value at a given index, adding and removing values, sorting, etc. With this example we will also learn how to use Python’s internal documentation and help functions to get information about objects. First let me introduce the dir() function. It returns a list of the names of all attributes (including methods) of a given object.

>>> dir([])
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
'__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__',
'__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__',
'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse',
'sort']

The [] in the first line is an empty list. You can get more information about each method by invoking the help() function on an object. For example try:

>>> help([])

Let us come back to the above listing. Lines 2–7 contain the Python special methods of the list type. All of them start and end with a double underscore: __. Some of them are bound to syntax, for example:

>>> a=list(range(5)) # Python2.7 does not require to call the list() constructor
>>> a
>>> a[1]
>>> a.__getitem__(1)

The range() built-in function returns a list of incrementing integers of a given length in Python2.7, while in Python3 it returns a lazy range object. You can turn it into a list using the list() constructor. In Python3 the range object uses memory more efficiently than a list: the latter stores all its elements in memory, while the range does not. Python lists are always indexed starting from 0, i.e. the first element in the list is the 0-th one. The code a[0] is the syntax for getting the first object in the list. This actually calls the __getitem__() method of the object a. The special method __len__(), which returns the length of the object, has a shorter form:

>>> len(a)
>>> a.__len__()

The powerful thing in Python is that these special methods can be overridden, or added to objects other than the predefined ones, and then you can take advantage of the shorter syntax for them. You can also delete elements of a list:

>>> a = list(range(5))
>>> del a[2]
>>> a
[0, 1, 3, 4]
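The remark above about adding special methods to your own objects can be sketched with a small class. Countdown is a hypothetical example; classes themselves are discussed later in the course.

```python
class Countdown:
    """Hypothetical sequence n, n-1, ..., 1 (for illustration only)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Called by len(obj).
        return self.n

    def __getitem__(self, index):
        # Called by obj[index].
        if not 0 <= index < self.n:
            raise IndexError(index)
        return self.n - index

c = Countdown(3)
print(len(c), c[0], c[2])   # 3 3 1
print(list(c))              # [3, 2, 1]
```

Because __getitem__() raises IndexError past the end, the object can also be turned into a list or used in a for loop.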

Another very useful special attribute is the __doc__ string:

>>> print(a.__doc__)
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

It stores the docstring of an object. It is the same text that help([]) shows just under the class statement:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items

 Methods defined here:

Before we introduce the ordinary methods listed at the end of the dir([]) output, let us describe slicing for lists. You already know that you can get the element of a list by its index using the syntax list[n] (where n is the index). If the index n is greater than the length of the list minus one (this is the index of the last element) an IndexError will be raised, since you are referring to a non-existing place in the list. But this syntax has more to offer.


Slicing is a very basic technique used with lists and some other types, like tuples and strings, or more generally with sequences.

>>> L = list(range(10))
>>> L[2:]
[2, 3, 4, 5, 6, 7, 8, 9]
>>> L[2:5]
[2, 3, 4]
>>> L[:3]
[0, 1, 2]

The first argument specifies the index of the first included element, the second the index of the first excluded element; whenever a colon is present a list object is returned. If no value is given the defaults are 0 and len(L) respectively:

>>> L[:5]
[0, 1, 2, 3, 4]
>>> L[5:]
[5, 6, 7, 8, 9]

You can use slicing to change the value of a part of a list:

>>> L[:2] = [ 'a', 'b' ]
>>> L
['a', 'b', 2, 3, 4, 5, 6, 7, 8, 9]

You can also use L[:], which returns a new list that refers to the same objects as L (you can check that id(L) and id(L[:]) differ). It is called a shallow copy and is a very useful concept. Slicing also has a third argument, which is the step of the slice:

>>> L = list(range(10))
>>> L[::2]
[0, 2, 4, 6, 8]

You can also refer to the elements counting them from the end:

>>> L[-1]
9
>>> L[-2:-1]
[8]
>>> L[-4:-2:2]
[6]

You can also set the step to a negative value. Note that in this case you need to specify the first and last elements in the opposite order.

>>> L[::-1]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
>>> L[5:0:-1]
[5, 4, 3, 2, 1]
>>> L[5::-1]
[5, 4, 3, 2, 1, 0]

You can delete whole slices of a list:

>>> a = list(range(10))
>>> del a[::2]
>>> a
[1, 3, 5, 7, 9]

List’s methods


list.append(x)

append the object x to the list. The added object is put at the end of the list:

>>> L=[]
>>> L.append('x')
>>> L

This is equivalent to: L[len(L):]=['x'], i.e.

>>> L[len(L):]=['y']
>>> L
['x', 'y']

Let us note that for the list with 2 elements L[2]='z' will raise the IndexError.


list.extend(L)

extend the list by joining it with another list L. The elements of L are put after the elements of the list. The equivalent slice syntax is old_list[len(old_list):]=L.

>>> List=list(range(1,3))
>>> print(List)
[1, 2]
>>> L=list(range(3,6))
>>> print(L)
[3, 4, 5]
>>> List.extend(L)
>>> print(List, L)
[1, 2, 3, 4, 5] [3, 4, 5]
>>> List[len(List):]=range(2)
>>> print(List)
[1, 2, 3, 4, 5, 0, 1]
>>> List=List+['x', 'y'] # builds a new list and rebinds the name
>>> print(List)
[1, 2, 3, 4, 5, 0, 1, 'x', 'y']

Of these three methods, extend() and slice assignment are the fastest, since they do not create a new list. You could also use List+=L, which does not create a new object either (though extend() and slicing can be considered faster).
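Which of these operations create a new list can be checked by keeping a second label on the original object. This is a minimal sketch; the names first and alias are chosen here.

```python
first = [1, 2]
alias = first                # a second label for the same list

first.extend([3])            # in place
first[len(first):] = [4]     # in place
first += [5]                 # in place as well

print(alias is first, alias)     # True [1, 2, 3, 4, 5]

first = first + [6]          # builds a new list and rebinds `first`

print(alias is first)            # False
print(alias)                     # [1, 2, 3, 4, 5]
print(first)                     # [1, 2, 3, 4, 5, 6]
```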


list.index(x)

returns the smallest index i such that list[i] == x. The __eq__() method is used to check whether the objects are equal. It raises ValueError if there is no such element.

list.insert(i, x)

insert the object x at index i.

>>> L=list(range(5))
>>> L.insert(1, 'x')
>>> print(L)
[0, 'x', 1, 2, 3, 4]

The expression L.insert(0,...) will insert at the front of the list, and L.insert(len(L),...) will append to the list.

list.remove(x)

removes the first item from the list whose value is equal to x. It raises a ValueError exception if there is no such element.

To find the item, the list.remove() method compares items using the __eq__() methods of the list items.


list.count(x)

returns the number of elements in the list with value x. If you just want to check whether an element is in the list you can check if the count is non-zero, or you can use the following syntax:

>>> L=['a', 'b', 'c']
>>> 'a' in L
>>> 'd' in L

This method counts objects by their value rather than identity:

>>> L=[[],[]]
>>> L[0] is L[1]
>>> L.count([])

list.count() uses the __eq__() method to check whether an item of the list equals x. The in keyword also uses the __eq__() special method.


list.pop([i])

If i is not given it removes and returns the last element of the list. If i is given the i-th element is removed from the list and returned.

There are also other methods: list.reverse() (which reverses the list in place) and list.sort(), which can sort it in various ways and which we will describe later.
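A short sketch of these two methods: both modify the list in place and return None. The built-in functions sorted() and reversed(), which leave the original list untouched, are shown for comparison.

```python
L = [3, 1, 2]

returned = L.reverse()      # reverses L in place
print(returned, L)          # None [2, 1, 3]

L.sort()                    # sorts L in place
print(L)                    # [1, 2, 3]

# Built-in alternatives that build new objects instead:
descending = sorted(L, reverse=True)
backwards = list(reversed(L))
print(descending, backwards, L)   # [3, 2, 1] [3, 2, 1] [1, 2, 3]
```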

Mutable and immutable objects

Lists are mutable objects; strings, integers and other number types are immutable. Every object in Python has an identity, which is returned by id(), a type, which is returned by the type() function, and a value. The identity, which is implemented in CPython as the memory address, and the type cannot be changed during the object’s lifetime. For a mutable object the value can change, while for an immutable object it cannot. How is it possible that integers are immutable? In CPython, for each small integer there is a unique integer object with this value:

>>> a=10
>>> b=10
>>> id(a) == id(b)

When you write a=10 you put the label a on this object. When you do the operation a+1 it returns another object of type int with value 11. For mutable objects it is possible to change their value: for example a list is a mutable object since you can add an element to it.

It might seem contrary to what we have just said, but you can check that when you add 1 to a its identity might change or not (strictly speaking it is not a’s identity but the object’s identity). For example, if you open a new Python shell and type:

>>> a=1
>>> id(a)
>>> a+=1
>>> id(a)

What happens is that the unreachable object 1 is garbage collected, i.e. removed from the address space, and the memory is freed. Now it can be taken by another object. If 1 still existed (for example if another variable referred to it) it would not be removed from memory, as in the following snippet:

>>> a=1
>>> b=1
>>> id(a) == id(b)
>>> a+=1
>>> id(a) == id(b)

Here the identity of a changes, though strictly speaking a is just a label and it is more correct to say that the object 2 (which is now labelled by a) has another identity (place in memory). The third line shows another interesting thing: for immutable objects different labels might correspond to the same object. Here a and b both have the same id() before a starts to refer to 2. For mutable objects this never happens:

>>> a=[]
>>> b=[]
>>> a is b
>>> a == b

We already noted that the is operator compares identities; let us add that the == operator compares by value (though this might be changed). How can a list be changed? For example by appending an element to it (which might be any Python object, regardless of whether it is mutable or not). For example:

>>> a=[]
>>> id_a=id(a)
>>> a.append(1)
>>> id(a) == id_a

As you can see this is a different behaviour from the one encountered in the example for immutable objects. Understanding this issue is very important and will pay off later, as it will recur in many places. Now it should be clear what happens here:

>>> a=[]
>>> b=[a]
>>> b
>>> b[0] is a
>>> a.append(1)
>>> b

But with immutable objects:

>>> a=1
>>> b=[a]
>>> b[0] is a
>>> a+=1
>>> b

This is so since a+=1 doesn’t change 1 into 2 but moves the label a from 1 to 2, while b[0] still refers to 1.


This should be quite obvious, but it might be useful to say it. Now you know that some Python objects are mutable. To explain this we have argued that Python variable names are just labels, but the assignment:

>>> a = []
>>> b = a

does not mean that the labels a and b are the same. They are independent. For example if you assign a new object to a, b still will point to the previous object:

>>> a = 1
>>> a is b
False
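The difference between changing an object and moving a label can be summarised in one short sketch (the variable names follow the examples above):

```python
a = []
b = a              # two labels, one list object

a.append(1)        # mutation: visible through both labels
print(b)           # [1]

a = [1]            # rebinding: a now labels a new list, b is untouched
print(a == b)      # True: the values are equal
print(a is b)      # False: they are two different objects
```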

Shallow and deep copies

Let us start with a simple example:

>>> a=[]
>>> b=a
>>> a.append(1)
>>> b

Sometimes this is not what one needs. There is a Python module which provides two functions: copy.copy() and copy.deepcopy(). Modules are libraries of Python functions and objects. To load a module one uses the import statement:

>>> import copy

Now we can access the functions defined in this module. You can list all of them with dir(copy), or view the docstrings with help(copy). After importing the copy module in this way the two functions are accessible as copy.copy() and copy.deepcopy(). The first one returns a shallow copy of an object, the second one a deep copy. A shallow copy of a list copies only the list itself and does not care about its elements, while a deep copy also makes copies of its elements, and if they are lists (or other objects, since these functions work with many Python objects) it copies their elements as well. copy.deepcopy() will not fall into an infinite loop (for example if a list refers to itself, directly or indirectly; in Python you can append a list to itself).
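The remark about self-referring lists can be checked with a small sketch. The names loop and clone are chosen here for illustration.

```python
import copy

loop = [1]
loop.append(loop)             # the list now contains itself

clone = copy.deepcopy(loop)   # terminates despite the cycle

print(clone is loop)          # False: a new list was created
print(clone[0])               # 1
print(clone[1] is clone)      # True: the cycle is reproduced in the copy
print(clone[1] is loop)       # False: the copy does not point back to the original
```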

>>> L=[]
>>> a=[1,2,3,L]
>>> b=a
>>> shallow_copy=a[:]
>>> deep_copy=copy.deepcopy(a)
>>> a is shallow_copy
>>> a is deep_copy
>>> a.append('+')
>>> b
[1, 2, 3, [], '+']
>>> shallow_copy
[1, 2, 3, []]
>>> deep_copy
[1, 2, 3, []]
>>> L.append('x')
>>> a
[1, 2, 3, ['x'], '+']
>>> shallow_copy
[1, 2, 3, ['x']]
>>> deep_copy
[1, 2, 3, []]

As in the above example, it is more common to make a shallow copy of a list using slicing a[:] than using the copy.copy() function. You can also use the * operator with lists, note the difference between mutable/immutable objects:

>>> L = []
>>> List = [L]*3
>>> List
[[], [], []]
>>> L.append(0)
>>> List
[[0], [0], [0]]
>>> List[0].remove(0)
>>> List
[[], [], []]
>>> L = 'immutable object'
>>> List = [L]*2
>>> List
['immutable object', 'immutable object']
>>> L += ' changed'
>>> List
['immutable object', 'immutable object']
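If independent inner lists are needed, the * operator is not the right tool, because it repeats references to one object. A minimal sketch of the alternative, using a list comprehension (a construct introduced later in this part):

```python
shared = [[]] * 3                      # three references to ONE inner list
independent = [[] for _ in range(3)]   # three separate inner lists

shared[0].append(0)
independent[0].append(0)

print(shared)        # [[0], [0], [0]]
print(independent)   # [[0], [], []]
```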

The for statement

Lists are very useful with the for statement, which we have already used in the listings above. The basic syntax is:

for <var> in <iterator>:
    <suite_A>
else:
    <suite_B>

The simplest case is:

>>> for x in "abcd":
...     print(x, ord(x))
a 97
b 98
c 99
d 100

You can iterate not only over lists but also over strings, or more generally over any iterable object: an object which has the __iter__() method (which returns an iterator) or which is itself an iterator with a __next__() method. For example any list has an __iter__() method:

>>> L = [1,2]
>>> i = L.__iter__()
>>> i
<list_iterator object at 0x877e50>
>>> next(i) # it calls i.__next__() (i.next() in Python2.7)
>>> next(i) # i.__next__() is called at every step of the `for` loop
>>> next(i) # will raise the StopIteration exception
>>> # this exception is caught internally at the end of the `for` loop
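What the for loop does behind the scenes can be sketched by hand with iter(), next() and StopIteration. This is only an illustration of the protocol, not how one would normally write a loop.

```python
L = [1, 2]
seen = []

iterator = iter(L)            # calls L.__iter__()
while True:
    try:
        item = next(iterator) # calls iterator.__next__()
    except StopIteration:
        break                 # the iterator is exhausted: leave the loop
    seen.append(item)         # this is the body of the loop

print(seen)                   # [1, 2]
```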

When you loop over a list (or any other class with an __iter__() method) the __iter__() method is called to get an iterator. You may ask why a list object needs to be translated into an iterator object: every iterator object remembers its position, so if you want to iterate over the elements of a list twice you need two independent iterators. In this way you can do:

>>> L=[1,2]
>>> for l in L:
...  for k in L:
...   print('(%d, %d)' %(l,k), end=' ')
(1, 1) (1, 2) (2, 1) (2, 2)

There are two keywords which can interact with the loop: continue and break. You can put them in the loop body suite_A. continue stops executing suite_A and resumes with the next item of the iterator. If break is used the iteration stops, the suite_B of the else clause is not executed, and the program resumes after the for statement. The code in suite_B is executed only if the iteration went through all the items of the iterator without stumbling upon a break keyword.
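The interplay of continue, break and the else clause can be sketched as follows. The function first_negative is a hypothetical example.

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for x in numbers:
        if x == 0:
            continue          # skip zeros and go on with the next item
        if x < 0:
            found = x
            break             # stop the loop; the else clause is skipped
    else:
        found = None          # runs only if the loop was never broken
    return found

print(first_negative([3, 0, -2, 5]))   # -2
print(first_negative([3, 0, 5]))       # None
```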


For mutable sequences, like lists, if the code in suite_A modifies the sequence, for example deletes the current or a previous element, then the next element in the iteration will be skipped (and if it adds an item before the current one, the current item will be iterated twice). To avoid this you should iterate over a shallow copy of the list:

Good:

>>> a=list(range(5))
>>> for x in a[:]:
...    print(x)
...    a.remove(x)
0
1
2
3
4
>>> a
[]

Bad:

>>> a=list(range(5))
>>> for x in a:
...    print(x)
...    a.remove(x)
0
2
4
>>> a
[1, 3]


Unpacking is a very common Python technique. Let us consider a list whose elements are lists of two elements:

>>> data = [ ['John', 47], ['Alice', 33]]
>>> for name, age in data:
...    print(name.ljust(10)+str(age))
John      47
Alice     33

If you need to enumerate elements of a list you can use the enumerate() function:

>>> tasks = [ 'check emails', 'proofreading', 'test new code' ]
>>> for idx, task in enumerate(tasks,1):
...   print(str(idx)+") "+task)
1) check emails
2) proofreading
3) test new code

In Python3 unpacking was extended and you can use it also in the following way:

>>> first, *elements, last = range(8)
>>> first, last
(0, 7)
>>> elements
[1, 2, 3, 4, 5, 6]

So it is very easy to separate the first element from the rest:

>>> first, *rest = range(4)
>>> first
>>> rest
[1, 2, 3]

Note that the star assignment always produces a list:

>>> first, *rest = (1,2,3)
>>> type(rest)
<class 'list'>

See also

Star assignment was introduced in PEP 3132.

The in statement

If you want to check if an element belongs to a list you can use the in:

>>> L = range(10)
>>> 0 in L
>>> 10 in L # the last element in L is 9!

The in operator compares values rather than the identities of objects:

>>> a = [0]
>>> L = [[], a ]
>>> a in L
>>> b = [0]
>>> b is a
>>> b in L

As with many syntax elements, the keyword in is tied to a special method: __contains__(). In our case this method belongs to the list. Dictionaries, which we will discuss at the end of this part, also use this method. It is good to understand from the very beginning that many syntax elements of the Python language are programmable through these special methods.
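To illustrate that in is programmable, here is a sketch of a class that defines __contains__() itself. EvenNumbers is a hypothetical container made up for this example.

```python
class EvenNumbers:
    """Hypothetical container that 'holds' every even integer."""

    def __contains__(self, x):
        # Called by the expression `x in obj`.
        return isinstance(x, int) and x % 2 == 0

evens = EvenNumbers()
print(4 in evens, 5 in evens)      # True False

# For a list, `in` calls the list's own __contains__() method:
print([1, 2].__contains__(2))      # True
```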

List comprehension

You can make new lists out of old ones using list comprehension. There are also other techniques, but this one is very common and easy to learn since it is quite intuitive. Let us give a simple example:

>>> [ 2*x for x in range(5) ]
[0, 2, 4, 6, 8]

But you can also filter elements:

>>> [ x for x in range(5) if x%2 ]
[1, 3]
>>> [ 2*x for x in range(5) if x%2 ]
[2, 6]
>>> files = [ 'README.txt', 'source.py', 'setup.py', 'INSTALL.txt']
>>> [ f for f in files if f.endswith('.txt') ]
['README.txt', 'INSTALL.txt']

You can nest two loops:

>>> L = [ ['a', 1], ['b', 2], ['c', 3] ]
>>> [ y for x in L for y in x ]
['a', 1, 'b', 2, 'c', 3]

This syntax is more easily understood when we expand it into two loops:

>>> K=[]
>>> for x in L:
...     for y in x:
...         K.append(y)
>>> K
['a', 1, 'b', 2, 'c', 3]

Here is another example:

>>> files = [ 'README.txt', 'source.py', 'setup.py', 'INSTALL.txt']
>>> [ (i, f) for f in files if f.endswith('.py') for i in range(2) ]
[(0, 'source.py'), (1, 'source.py'), (0, 'setup.py'), (1, 'setup.py')]

In this way you can filter the elements of the files list, while still looping over the second list. You could also add an if part for the second loop, or even add more for statements. In the previous examples we used tuple objects, which are very similar to list objects. We will describe them in the coming section, which also includes a note on how tuples have to be written in list comprehensions.

Here we give an example of how to add a row R to every row of a matrix M:

>>> R = [1, 0, 2]
>>> M = [[1, 0, 0], [0, 2, 2]]
>>> [[ row[i]+R[i] for i in range(3) ] for row in M]
[[2, 0, 2], [1, 2, 4]]


Add a column C= [1, 2] to every column of M.

Solution:
>>> [[x+C[i] for x in M[i]] for i in range(2)]


Add a column C=[0,1] to the matrix M. How to get a new list without modifying M?

Solution:
>>> [ M[i].append(C[i]) for i in range(2) ]
[None, None]
>>> M
[[1, 0, 0, 0], [0, 2, 2, 1]]

The reason why this method returns [None, None] is that the list.append() returns None after appending an element to a list. If you want to construct a new list without modifying the list M here is the solution:

>>> [ M[i]+[C[i]] for i in range(2) ]
[[1, 0, 0, 0], [0, 2, 2, 1]]
>>> M
[[1, 0, 0], [0, 2, 2]]


Let L be a matrix:

>>> L = [ [1, 2, 3], [5, 6, 7] ]

Can you transpose the matrix using list comprehension? If you have problems with how to do that, look in here.

See also

You should also review the examples here.


Tuples

Tuples are very similar to lists but have fewer methods and, being simpler, are somewhat faster.

>>> a=(0,1,2,3)
>>> a[0]
>>> a[1:-1]

Actually you don’t need the parentheses ( and ) to specify a tuple, but you always need a comma when you specify a one-element tuple:

>>> a = 0, 1
>>> b = 0,

Tuples are immutable. You cannot assign a value to an element of a tuple:

>>> a = (0, )
>>> a[0] = 1
TypeError: 'tuple' object does not support item assignment

As I mentioned tuples are immutable, but they can hold mutable objects inside:

>>> L = []
>>> t = (L,)
>>> L.append(1)
>>> t
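The following Python 3 sketch spells out what happens in the example above; the helper names same_object and rebinding_failed are only illustrative. The tuple keeps holding the very same list object, so a change to the list is visible through the tuple, while the slots of the tuple itself cannot be rebound.

```python
L = []
t = (L,)
L.append(1)               # mutates the list, not the tuple

same_object = t[0] is L   # the tuple still refers to the same list

try:
    t[0] = [2]            # rebinding a slot of a tuple is not allowed
    rebinding_failed = False
except TypeError:
    rebinding_failed = True

print(t, same_object, rebinding_failed)
```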

The empty tuple () has the boolean value False, and any non-empty tuple has the boolean value True. Thus you can also use the and-or trick with tuples:

>>> a = True
>>> b = ()
>>> c = ':)'
>>> (a and (b,) or (c,))[0]

You can make sequence unpacking using tuples:

>>> t=0, 1, 2
>>> a, b, c = t
>>> a, b, c
(0, 1, 2)

You can loop over elements of a tuple:

>>> t=(1,2,3,4)
>>> for e in t: print(e%2, end=' ') # or use ``print e%2,`` in python2.7
1 0 1 0

Let us come back to the and-or trick: a and b or c. If b is False the trick fails, but we can put b and c in tuples. Then (b,), as a non-empty tuple, has the boolean value True and the and-or trick will work. It now looks like:

>>> (a and (b,) or (c,))[0]

Don’t forget the comma when you specify a tuple of length 1 (since (b) is just b). You could also use lists, but note that tuples are simpler, and thus faster. Let us note (once more) that you can use inline if expressions without the and-or-trick bracket yoga:

>>> 'True Value' if False else 'False Value'
>>> 'True Value' if True else 'False Value'
'True Value'

When you use tuples in list comprehension it is necessary to use the round brackets ( and ) explicitly:

>>> files = [ 'README.txt', 'source.py', 'setup.py', 'INSTALL.txt']
>>> [ (i, f) for f in files if f.endswith('.txt') for i in range(2) ]
[(0, 'README.txt'), (1, 'README.txt'), (0, 'INSTALL.txt'), (1, 'INSTALL.txt')]
>>> [ i, f for f in files if f.endswith('.txt') for i in range(2) ]
SyntaxError: invalid syntax

Tuple methods

There are only two methods:

count(x)

Return the number of occurrences of x in the tuple:

>>> t=([],[],[1],[1])
>>> t.count([]) # counts by value rather than id()
>>> id(t[0]) == id(t[1])

index(x)

Return the smallest index where x can be found. Raises ValueError if x cannot be found:

>>> t=([], [1])
>>> e = [1]
>>> t.index(e) # by value rather than by id()
>>> id(e) == id(t[1])
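The two snippets above can be combined into one small Python 3 sketch (variable names are illustrative). It suggests that both methods compare elements with ==, that is by value, even when the elements are distinct objects.

```python
t = ([], [], [1], [1])

empty_count = t.count([])            # both empty lists are equal to []
first_index = t.index([1])           # smallest index of an element equal to [1]
distinct_objects = t[0] is not t[1]  # equal values, but two separate objects

try:
    t.index([2])                     # no element equals [2]
    missing_raises = False
except ValueError:
    missing_raises = True

print(empty_count, first_index, distinct_objects, missing_raises)
```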

You can also use the in operator with tuples:

>>> t = (1,2,3)
>>> 2 in t

Strings: str and unicode types in Python2.7

There are two types of string in the Python 2 series. In Python 3 things get simplified and a bit different. In this section we focus on strings in Python2. Even if your aim is to learn Python3 this section might be useful, either to refresh your knowledge about Python2.7 and its unicode support or because you will work on a project written in Python2. If you just want to learn Python3 you can safely skip this section: the next one is right for you. The two types in question are str and unicode. Let us give some examples:

>>> 'a'
>>> type('a')
<type 'str'>
>>> u'a'
>>> type(u'a')
<type 'unicode'>
>>> text=u'The \u03c0 number was known to Greeks.'
>>> text
u'The \u03c0 number was known to Greeks.'
>>> print(text)
The π number was known to Greeks.

The \u03c0 is an escape sequence which lets you write a unicode character by its hexadecimal number. In this case it is 03c0:

>>> 0x03c0

In the same way as number types, both str and unicode are immutable. Moreover, the interpreter often keeps a single object for equal short strings of the same type (this is an implementation detail you should not rely on):

>>> s='xxx'
>>> t='xxx'
>>> id(s) == id(t)
>>> u=u'xxx'
>>> id(s) == id(u)

Every text file on your hard drive is written in some encoding. These encoded strings are of type str in Python 2. The unicode type unifies them in the following sense: you can decode them from their encoding to get an instance of unicode, and vice versa you can encode a unicode string in whatever encoding you might think of. You have probably heard about the utf-8 and latin-1 encodings. These are examples of encodings. Note that utf-8 is not unicode itself. In our context, an encoding is a rule for translating unicode back and forth to an encoded str. The unicode literals are specified with a u or U prefix, while str literals are specified without any prefix (though in Python 2.7 you can use the b prefix, which has a true meaning in Python 3, where string objects changed significantly). Another way to construct an instance of unicode is to use the unicode() function:

>>> uni = unicode('abc')
>>> uni
>>> type(uni)
<type 'unicode'>
>>> s = 'abc'
>>> s
>>> type(s)
<type 'str'>
>>> s.decode()
>>> uni.encode()

The str.decode() and unicode.encode() methods can be used to translate between the str and unicode types: decode() goes from str to unicode, encode() from unicode to str. If no argument is specified the default encoding is used, which is:

>>> import sys
>>> sys.getdefaultencoding()

That means only ascii characters can be encoded/decoded by default. There are only 128 ascii characters (with numbers from 0 to 127), which is not much. For example the accented French letters like à, á, é cannot be encoded or decoded this way. Let us see this on another example using the Greek letter π:

>>> pi=u'\u03c0'
>>> pi_utf8 = pi.encode('utf-8')
>>> pi_utf8
>>> pi.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u03c0' in position 0: ordinal not in range(256)

Why do we get the UnicodeEncodeError? Simply because there is no rule for encoding π in latin-1, while we can encode it using utf-8. There is also UnicodeDecodeError, which is raised when a str string cannot be decoded:

>>> e_str = '\xc3\xa9'
>>> print(e_str)
>>> e_str.decode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

The str.decode() and unicode.encode() methods have an additional parameter which changes how errors are handled. One can ignore unicode errors, or replace the offending character. We will come back to this later in this section. The same unicode instance (i.e. element of the unicode type) can have different representations as a str. Simply, different encodings give different results:

>>> e_uni = e_str.decode('latin-1')
>>> e_uni.encode('utf-8')
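The error handling parameter mentioned above can be sketched as follows. This is a Python 3 sketch (so str plays the role of Python 2 unicode and bytes the role of Python 2 str); the variable names are illustrative. The values 'ignore' and 'replace' are standard error handlers of the codec machinery.

```python
pi = '\u03c0'

# Encoding: latin-1 has no rule for the Greek letter pi.
ignored = pi.encode('latin-1', errors='ignore')     # the character is dropped
replaced = pi.encode('latin-1', errors='replace')   # the character becomes '?'

# Decoding: bytes above 127 are not valid ascii.
decoded = b'\xc3\xa9'.decode('ascii', errors='replace')  # each bad byte becomes U+FFFD

print(ignored, replaced, repr(decoded))
```

Both handlers lose information silently, so they are best kept for situations where a damaged character is acceptable.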

This might happen even with ascii strings:

>>> a_uni = u'a'
>>> a_uni.encode('utf-8')
>>> a_uni.encode('utf16')
>>> a_uni.encode('utf32')

The utf-8 encoding uses one byte for every ASCII character, and this byte has the same value as in the ascii encoding, but it can use up to four bytes for other characters. You can read about the unicode standard at wikipedia. The unicode type follows this standard closely.
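To make the sizes concrete, here is a small Python 3 sketch (names are illustrative) which measures how many bytes the letter a and the letter π take in three encodings. Note that the generic utf-16 and utf-32 codecs prepend a byte order mark, which is counted here as well.

```python
a = 'a'
pi = '\u03c0'

# Map each encoding name to (bytes used for 'a', bytes used for pi).
lengths = {
    name: (len(a.encode(name)), len(pi.encode(name)))
    for name in ('utf-8', 'utf-16', 'utf-32')
}

print(lengths)
```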

[Figure: two sets, unicode and str, connected by arrows labelled encode() and encode('latin1') (from unicode to str) and decode() and decode('latin1') (from str to unicode).]

unicode.encode() and str.decode() methods in Python2

To see that it really works let us make a small experiment. We will write π encoded in utf-8 and in big5, the traditional Chinese encoding. Your task is to open the file in an editor, change the encoding to big5 and see if you get the π symbol as you should.

>>> pi=u'\u03c0'
>>> file_object = open('pi_encodings.txt', 'w')
>>> file_object.write('pi (utf-8): %s\npi (big5): %s' % (pi.encode('utf8'), pi.encode('big5')))
>>> file_object.close()

In the second line the built-in function open() returns a file object. The first argument is the file name (it might be str or unicode). The second argument is the mode. The default mode is r, which means open the file for reading. Here we use w, which allows us to write to the file. Note that this mode also truncates the file, i.e. removes all the previous contents. The mode a is for appending, which allows you to write to the file without removing its contents. In the third line we used the technique called string formatting in its simplest and most common form. The string is formed by substituting the escape sequences (here the two %s) with the strings from the tuple following the % sign. For example:

>>> 'email: %s@%s' % ('john', 'python_intro.edu')
'email: john@python_intro.edu'

The %s assumes that the corresponding tuple entry is a string. If not, the str() function is used (if the object has a __str__() method it will be used here, otherwise __repr__() will be used). That means that integers or floating point numbers will be translated into strings. But there are other, better escapes for integers and other types, for example %d for signed integers, or %x for a signed hexadecimal value. We will come back to this method in greater detail later.

>>> '%d=%x' % (255, 255)
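A few more formatting escapes, as a minimal sketch that runs in the same way in Python 2.7 and Python 3 (the variable names are illustrative):

```python
as_string = '%s and %s' % (3, 2.5)      # non-strings are converted with str()
as_numbers = '%d=%x' % (255, 255)       # decimal and hexadecimal
padded = '%05d|%8.3f' % (42, 3.14159)   # zero padding, field width and precision

print(as_string)
print(as_numbers)
print(padded)
```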

You can also use + which acts as the string concatenation operator, but all the arguments need to be manually transformed to strings:

>>> 'john'+'@'+'python_intro.edu'
>>> 'john'+3
TypeError: cannot concatenate 'str' and 'int' objects

String formatting as well as string concatenation also work for unicode. Python 2 allows you to mix unicode and str, but as you will see this leads to hard-to-trace errors and should be avoided.

Let us now look at the Greek letters:

>>> from __future__ import print_function # Python2.7
>>> for x in range(945, 969):
...     print(x, end=' ')
...     print(hex(x), end=' ')
...     print(unichr(x), end=' ')
...     print(repr(unichr(x)), end=' ')
...     print(repr(unichr(x).encode('utf-8')), end=' ')
...     print(repr(unichr(x).encode('cp737')))
945 0x3b1 α u'\u03b1' '\xce\xb1' '\x98'
946 0x3b2 β u'\u03b2' '\xce\xb2' '\x99'
947 0x3b3 γ u'\u03b3' '\xce\xb3' '\x9a'
948 0x3b4 δ u'\u03b4' '\xce\xb4' '\x9b'
949 0x3b5 ε u'\u03b5' '\xce\xb5' '\x9c'
950 0x3b6 ζ u'\u03b6' '\xce\xb6' '\x9d'
951 0x3b7 η u'\u03b7' '\xce\xb7' '\x9e'
952 0x3b8 θ u'\u03b8' '\xce\xb8' '\x9f'
953 0x3b9 ι u'\u03b9' '\xce\xb9' '\xa0'
954 0x3ba κ u'\u03ba' '\xce\xba' '\xa1'
955 0x3bb λ u'\u03bb' '\xce\xbb' '\xa2'
956 0x3bc μ u'\u03bc' '\xce\xbc' '\xa3'
957 0x3bd ν u'\u03bd' '\xce\xbd' '\xa4'
958 0x3be ξ u'\u03be' '\xce\xbe' '\xa5'
959 0x3bf ο u'\u03bf' '\xce\xbf' '\xa6'
960 0x3c0 π u'\u03c0' '\xcf\x80' '\xa7'
961 0x3c1 ρ u'\u03c1' '\xcf\x81' '\xa8'
962 0x3c2 ς u'\u03c2' '\xcf\x82' '\xaa'
963 0x3c3 σ u'\u03c3' '\xcf\x83' '\xa9'
964 0x3c4 τ u'\u03c4' '\xcf\x84' '\xab'
965 0x3c5 υ u'\u03c5' '\xcf\x85' '\xac'
966 0x3c6 φ u'\u03c6' '\xcf\x86' '\xad'
967 0x3c7 χ u'\u03c7' '\xcf\x87' '\xae'
968 0x3c8 ψ u'\u03c8' '\xcf\x88' '\xaf'

Note that we used the Python 3 syntax of the print() function in the above snippet. If you are using Python 2.7 you can convert the calls to print statements or just use:

>>> from __future__ import print_function

The above statement will load the print() function of Python 3 into your name space, i.e. you will be able to use it. After loading it, you cannot use print as a statement anymore:

>>> print 'Hello'
SyntaxError: invalid syntax

The unichr() function returns a unicode string for a given number (code point). There is also the chr() function, which returns a one-character str for a number in the range 0-255; only the values 0-127 are ASCII (hex(128) == '0x80'). The range of unichr() depends on how your Python was compiled, but it should be something like 0-1114111 (hex(1114112) == '0x110000'). The upper limit is also stored in sys:

>>> import sys
>>> sys.maxunicode

The repr() actually calls the method __repr__() and thus its result depends on the object:

>>> alpha=unichr(945)
>>> '__repr__' in dir(alpha)
>>> alpha.__repr__()

For unicode it returns the unicode literal. Check for yourself what repr() returns on a list. What is the type of the result? That’s right, __repr__() returns something of type str. Since in Python you can program your own classes, which might have their own __repr__() method, there is a gentlemen's agreement that it should always return a str string. The idea behind repr() is that it should return a string which can be evaluated to give an object of the same type and value.

Let us now explain what is in each column. In the first one we have the number of a unicode character in decimal system (which is called code point), in the second its hexadecimal representation. Third column prints these characters.


If your terminal is not using the utf8 encoding you will probably see some garbage. In this case you need to change the print statement to something like:

>>> sys.stdout.write(unichr(x).encode(encoding))

where encoding is the name of the encoding your system is using. If you don't know it you can use the following value:

>>> import locale
>>> encoding = locale.getpreferredencoding()

Don’t forget to import sys if you haven’t done so already. On Windows you may find that neither solution works at the DOS prompt, but both should work within the IDLE environment.

In the fourth column we have unicode literals, and as we can see the numbers in the escape sequences just after the \u correspond to the hex values of the code points. Then we have two columns with str values. In the first of them (utf-8) every Greek letter is encoded with two bytes: the number after each \x represents a hex number in the range 0-255 (indeed it has two hex digits and 16**2=256). This is exactly one byte, since a byte is equal to 8 bits and 2**8=256. The Greek encoding cp737 is more concise for these letters: every Greek letter is represented by a single byte. The cp737 encoding also agrees with ascii in the ascii range. Can you write a snippet which checks this by yourself? Check out my solution:

Click for a solution:

This code will check if cp737 and ascii encoding agree in the range(128):

>>> all([ unichr(x).encode('cp737') == chr(x) for x in range(128) ])

Here I used the built-in all() function which returns True if all elements of the list are True.
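The same check can be sketched for Python 3, where chr() returns a str and the encoded result is a bytes object, so we compare against bytes([x]) instead of chr(x). The names agree and same_as_ascii are illustrative.

```python
# Python 3 version of the check: compare each encoded character with the single byte x.
agree = all(chr(x).encode('cp737') == bytes([x]) for x in range(128))

# Equivalent formulation: compare cp737 directly with the ascii codec.
same_as_ascii = all(
    chr(x).encode('cp737') == chr(x).encode('ascii') for x in range(128)
)

print(agree, same_as_ascii)
```

Here a generator expression is passed to all() instead of a list, which avoids building the intermediate list.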

The first thing to observe is that in different encodings the representation of the same character might take a different number of bytes.

The utf-8 encoding can encode all unicode strings. Let us check this:

>>> import sys
>>> L = [ unichr(x).encode('utf-8') for x in range(sys.maxunicode+1) ]

Now since we didn’t get an UnicodeEncodeError the list L contains all utf-8 encoded unicode strings. Don’t try to print this list, if you did you should be able to break the listing by pressing CTRL-c combination (but the code might not break immediately).
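If you repeat this experiment in Python 3 (with chr() in place of unichr()), the outcome is slightly different, and the sketch below is meant only as an illustration of that difference. Python 3 refuses to encode the surrogate code points 0xD800-0xDFFF in utf-8, because they are reserved for the utf-16 encoding and are not characters on their own. The name failures is illustrative.

```python
import sys

failures = []
for x in range(sys.maxunicode + 1):
    try:
        chr(x).encode('utf-8')
    except UnicodeEncodeError:
        # Only the surrogate code points end up here.
        failures.append(x)

print(len(failures), hex(failures[0]), hex(failures[-1]))
```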


Print the data using string formatting and only one print statement.

Click for a solution (important):

The most common mistake is to forget to explicitly encode the unicode. Note that string concatenation (the + operation) of a str and a unicode will produce a unicode. This is done by decoding the str using decode() with the default encoding, which is ascii. Now, if the str has non-ascii characters a UnicodeDecodeError will be raised. The correct way in Python 2 is to do the encoding and decoding explicitly (in Python 3 things are simpler).

>>> for x in range(945, 969):
...     u = unichr(x)
...     print(("%d %x "+"%s "*3) %
...            (x, x, u.encode('utf-8'), repr(u.encode('utf-8')), repr(u.encode('cp737'))))

The brackets around "%d %x "+"%s "*3 are necessary so that this string concatenation is evaluated first. "%s "*3 results in "%s %s %s ".

In Python you can break a line without the line continuation character if the break is inside brackets. The line continuation character is a single backslash \ at the end of a line, and it must not be followed by trailing spaces. Thus it is better to surround an expression with an innocuous pair of brackets ( and ) and break the line for free than to use the line continuation character.
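Both ways of breaking a line are shown in the following minimal sketch (the names are illustrative). It also shows that adjacent string literals inside brackets are joined into one string.

```python
# Line continuation with a backslash: nothing may follow the backslash.
total_backslash = 1 + 2 + \
    3 + 4

# The same expression broken inside a pair of brackets.
total_brackets = (1 + 2 +
                  3 + 4)

# Adjacent string literals are concatenated by the parser.
message = ('a long message '
           'split over two lines')

print(total_backslash, total_brackets, message)
```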

See also

Unicode tutorial from the informal introduction to Python.

Unicode howto. It is a good document if you want to go deeper in to unicode support in Python and some problems with it.

The Unicode Consortium home page of unicode standard. You can also read the following clear introduction. If you do not know the difference between glyph and a code point you might find it helpful. It also includes short introduction to UTF-8, UTF-16 and UTF-32 encodings.

Wikipedia article on unicode.

PEP 100 Python technical specification of its unicode support.

Strings: bytes and str types in Python3

The string types have changed significantly in Python3. There are again two types of strings, but this time they are named bytes (which corresponds to str in Python2) and str (which corresponds to unicode in Python2). The most important change is that in Python3 there is no implicit decoding. In Python2 you could do:

>>> 'a'+u'b' # Python2.7

There is no problem when all characters are ASCII, but with:

>>> unichr(960).encode('utf-8')+u'b' # the first argument is Python2 str
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcf in position 0: ordinal not in range(128)

The possibility of mixing these two types in Python2 led to many hard-to-trace errors. Hence the Python developers decided to keep the two types separate. Moreover, since in Python2 you should always work with the unicode type and decode str data to unicode as early as possible, Python3 decodes bytes for you in some places, for example when you read a file in text mode. But let us start from the beginning:

>>> string = 'abc' # Python3
>>> string

As you can see Python3 str literals are not prefixed with u as was the case for Python2 unicode (though since Python3.3 you can use the prefix; this was added to make writing code compatible with both Python2 and Python3 easier). How do you get a bytes object? As in Python2.7, the Python3 str is a Python representation of unicode, which must be encoded in a given encoding to get a bytes.
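The lack of implicit decoding in Python 3 can be seen in a short sketch like the one below (variable names are illustrative). Mixing bytes and str raises a TypeError, and you have to convert one of the operands explicitly.

```python
try:
    b'a' + 'b'              # bytes + str is not allowed in Python 3
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# Explicit conversion in either direction works.
joined_str = b'a'.decode('ascii') + 'b'
joined_bytes = b'a' + 'b'.encode('ascii')

print(mixing_allowed, joined_str, joined_bytes)
```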


Encodings are ways of writing the same unicode. If you open a good text editor, you will find that there are many encodings you may use to represent characters. In most encodings the ASCII characters are represented in the same way; the difference comes when you look at other characters like é. Files on your computer are written in a given encoding, so when you read data from files your software has to know in which encoding they are written to get meaningful data. Examples of encodings are latin1 and utf-8 (the latter is often referred to as unicode, which might be a bit confusing, so we will avoid this). The encoded data is represented in Python3 by bytes and the decoded data (unicode) by str.

Encoded strings are represented by bytes (i.e. numbers from 0 to 255) in the computer's memory, while unicode is a sequence of numbers from 0 to 0x10ffff, which is equal to 1114111 (hex(1114111) == '0x10ffff'). This is a huge number of slots, which makes it possible to represent the characters of all languages and additional symbols used in typesetting around the world. Each unicode code point (number) represents a unique character. For example ‘a’ is represented by the number (code point) 97. The code point 945 corresponds to ‘α’.

An encoding (in Python) is a mapping which says how unicode code points (str in Python3 or unicode in Python2) are represented by bytes (bytes in Python3 or str in Python2).
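The relation between characters, code points and bytes can be sketched in Python 3 with the built-in functions ord() and chr() (the variable names are illustrative):

```python
# Characters to code points and back again.
code_points = [ord(ch) for ch in 'a\u03b1']   # 'a' and the Greek letter alpha
back = ''.join(chr(n) for n in code_points)

# One code point, two bytes in the utf-8 encoding.
utf8_bytes = list('\u03b1'.encode('utf-8'))

print(code_points, back, utf8_bytes)
```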

As in Python2 there are encode and decode methods. This time they work this way:

[Figure: two sets, str and bytes, connected by arrows labelled encode() and encode('latin1') (from str to bytes) and decode() and decode('latin1') (from bytes to str).]

str.encode() and bytes.decode() methods in Python3

Now let us see what the bytes type looks like in Python3:

>>> a = 'a'.encode('latin1') # Python3
>>> a # note the b in the literal:
>>> type(a)
<class 'bytes'>

And you can get back the str:

>>> s = a.decode('latin1')
>>> # we should use the same encoding in which a was represented
>>> s
>>> type(s)
<class 'str'>

In Python2.7 we used the unichr() and chr() functions. In Python3 there is only chr(), which produces a str (hence it corresponds to unichr() of Python2.7).

>>> s = 'Greeks knew the value of %s' % chr(960)
>>> s
'Greeks knew the value of π'
>>> b=s.encode('utf-8')
>>> b
b'Greeks knew the value of \xcf\x80'

In the first line we use basic string formatting, which works in the same way as in Python2: the ‘%s’ escape sequence is substituted with the value of chr(960).

In Python3 the str behaves mostly like you would expect from a string class, while the type bytes behaves a bit differently. For example, you can iterate over both str and bytes:

>>> for c in s: print(c, end=' ')
G r e e k s   k n e w   t h e   v a l u e   o f   π  >>>
>>> for c in b: print(c, end=' ')
71 114 101 101 107 115 32 107 110 101 119 32 116 104 101 32 118 97 108 117 101 32 111 102 32 207 128  >>>

This reveals the nature of bytes: this is a sequence of bytes, i.e. numbers from 0 to 255 (2**8 of them). Let us see how simple strings are represented as bytes:

>>> b'a'[0] # access the first byte

Yes, this agrees with the ASCII code of ‘a’.

>>> b = chr(960).encode('utf-8') # pi character in utf-8 encoding
>>> for c in b: print(c, end=' ')
207 128

As you can see, π is represented by two bytes in the utf-8 encoding, but by just one byte in the cp737 encoding (this is a Greek encoding):

>>> b = chr(960).encode('cp737')
>>> for c in b: print(c)

The str is a universal representation of strings in Python. If you have the same text encoded in two different encodings (and thus represented by two different bytes objects in Python3, or str objects in Python2) and you decode each of them to a Python3 str (with the respective encoding), you will get equal strings. Let us see this. We have seen that π is represented by a single byte 167 in cp737 and by two bytes 207, 128 in utf-8:

>>> a = bytes([167])
>>> b = bytes([207, 128])
>>> a.decode('cp737') == b.decode('utf-8')

There is something special about the ‘utf-8’ encoding though:

>>> b # π representation in utf-8
>>> chr(960).encode('utf-8')

Here 960 is the Unicode code point of π, and utf-8 is able to encode it. The same holds for any str character, since utf-8 is designed to cover all Unicode characters. In the same way as in Python2, the Python3 string types bytes and str are immutable.

Also, for both bytes and str, concatenation can be used in the same way as in Python2:

>>> b'a'+b' '+b'\xcf\x80' # in utf-8 encoding
b'a \xcf\x80'
>>> 'a'+' '+chr(960) # the same in str
'a π'

If you have skipped the previous section go back and see the last exercise and the see also box.

Working with strings

After this theoretical introduction to strings let us learn how to work with them and what operations and syntax quirks are allowed.

Strings in Python are in fact sequences. You can loop over them and you can also slice over them in the same way as with lists or tuples:

>>> string="this is an instance of str class"
>>> for x in string: print(x.upper(), end=' ')
T H I S   I S   A N   I N S T A N C E   O F   S T R   C L A S S

It is also possible to print this sentence without the extra spaces, letter by letter. For this we need to import the sys module and use the sys.stdout file-like object:

>>> import sys
>>> for x in string:
...   sys.stdout.write(x.upper())

This example shows the general rule for using object methods. A string (both str and unicode) has the upper() method. It returns an upper-case copy of the string. The sys.stdout object has a write() method (as all file-like objects do), which allows you to write to it. In our case it is connected to the character device which the Python shell is using. There are also two other such objects: sys.stdin and sys.stderr. You can write to both sys.stdout and sys.stderr (which is usually used for outputting script error messages) and you can read from sys.stdin. This is even possible from the Python shell:

>>> x=sys.stdin.read(10)
>>> x # will be truncated after 10th byte

Here you need to specify an argument to the read() method. It simply reads 10 bytes and returns them. Note that the new lines which are entered by pressing the Enter key after 4 and 8 are also recorded. The new line is represented by \n. On Windows a new line in a file is represented by two bytes, CR (carriage return) followed by LF (line feed), on classic Mac OS (before OS X) by CR, and on Linux and modern macOS by LF:

>>> LF=u"\u000A" # Python2.7 & 3.3
>>> CR=u"\u000D"
>>> sys.stdout.write(LF)

>>> sys.stdout.write(CR)
>>> x=sys.stdout.write(LF+CR)

>>> x

In Python3.3 you get the numbers 1, 1 and 2 above. These are the return values of the write() method, which say how many characters were written. You can catch the value as in the last line above. In Python2.7 the write() method does not return anything (i.e. it returns None).

On which system was this output produced? Here we did not use the print() function since it adds a new line at the end, while writing to sys.stdout outputs exactly what was given. In Python the LF can be written as '\n' and the CR as '\r' in a str (Python3), or as b'\n' and b'\r' in bytes.
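The three line ending conventions can be handled uniformly in Python. The following sketch (Python 3, illustrative names) builds a string which mixes all three and splits it with str.splitlines(), which recognises each of them.

```python
LF, CR = '\n', '\r'

# One text with Windows (CR LF), classic Mac OS (CR) and Linux (LF) line endings.
text = 'one' + CR + LF + 'two' + CR + 'three' + LF + 'four'

lines = text.splitlines()     # every convention ends a line
codes = (ord(LF), ord(CR))    # the code points of LF and CR

print(lines, codes)
```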


Since string types are sequences you can also slice them. As with the list and tuple types, the index of the first element is 0. The slicing syntax is common to strings, lists and tuples, and it can also be implemented for objects you define. Let us start with a Greek sentence:

>>> if sys.version_info.major < 3: # Python2.7
...   greek=u'\u1f00\u03b3\u03b5\u03c9\u03bc\u03ad\u03c4\u03c1\u03b7\u03c4\u03bf\u03c2 '
...   greek+=u'\u03bc\u03b7\u03b4\u03b5\u1f76\u03c2 \u03b5\u1f30\u03c3\u03af\u03c4\u03c9'
... else: # Python3
...   greek='ἀγεωμέτρητος μηδεὶς εἰσίτω'
>>> print(greek)
ἀγεωμέτρητος μηδεὶς εἰσίτω

This was the motto of Plato’s Academy: “Let no one untrained in geometry enter.” Note that in the Python2.7 branch we used + (in the form of +=) to join the two parts. It is possible to join strings this way. It will also work to join unicode and str, but this is not recommended since the str will be decoded using the ascii codec and, as we already mentioned, this might lead to UnicodeDecodeError errors. The first word has 12 letters:

>>> print(greek[:12])
>>> print(greek[13:19])
>>> print(greek[-1])
>>> print(greek[-2:])

The penultimate example returns the last element of the string, and the last example slices the last two elements. If you want to reverse a string (list, tuple) you can specify the third slicing argument: the step.

>>> x='0123456789'
>>> x[::-1]
>>> x[1:-1:2]
>>> x[-8:-2]

But if you want to reverse the last example, you have to be careful since:

>>> x[-8:-2:-1]
>>> x[-2:-8:-1]

It is easy to understand though. In the first example we start at position -8 and move with step -1, i.e. away from the stop position -2, so the result is empty. In the second example we start at position -2 and go down to -8 (which is excluded) with step -1. So the most complicated example is:

>>> x[-2:-8:-2]

Can you write down what it gives? Does it agree with what Python shell outputs for you? The rule to remember is that the first slicing argument is where we start and the sign of the third one gives the direction. The second argument should agree with it.
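The rule can be checked on a different string, so that the exercise above is not spoiled. This is a minimal sketch with illustrative names; slice.indices() is a standard method which shows the concrete start, stop and step that a slice uses for a sequence of a given length.

```python
s = 'abcdefghij'

empty = s[-8:-2:-1]        # start lies before stop, but the step goes backwards
backwards = s[-2:-8:-1]    # from 'i' down to, but not including, 'c'
every_other = s[-2:-8:-2]  # the same range, taking every second element

# Negative positions translated to ordinary indices for a sequence of length 10.
normalised = slice(-2, -8, -2).indices(len(s))

print(repr(empty), backwards, every_other, normalised)
```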

Slice objects

Everything in Python is an object. What about slices, you may ask. Well, there is also a <type 'slice'>. It can be created using the built-in slice() function. It takes three arguments, as a slice does:

slice([start], stop[, step])

The arguments in [...] are optional, since they have default values which act as start=0 and step=1. The only obligatory argument is stop. Here is how one can use a slice object:

>>> L=range(10)
>>> Slice = slice(2,8,2)
>>> L.__getitem__(Slice)
[2, 4, 6]

There is a more important slicing function for iterators. Iterators generalise lists: they don’t hold the data in memory but rather the recipe for getting the next element. They are very useful when dealing with large sets of data, or when it is easier to generate the elements step by step rather than define them all at once. You can check out the itertools module.
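The slicing function in question is presumably itertools.islice (the text above only names the module, so this is an assumption). The sketch below applies it to infinite iterators, which could never be stored in a list; the variable names are illustrative.

```python
from itertools import count, islice

# count() yields 0, 1, 2, ... forever; islice takes elements 2, 4, 6 lazily.
evens = list(islice(count(), 2, 8, 2))

# The first five squares, taken from an infinite generator expression.
first_squares = list(islice((n * n for n in count()), 5))

print(evens, first_squares)
```

Unlike ordinary slicing, islice does not accept negative arguments, because an iterator does not know its own length.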

Common string operations

A very common operation is to split a string (or a unicode: the methods described in this section work for both str and unicode objects). Since splitting words separated by white space is very common it is treated in a special way in Python. String objects have the str.split() method. By default it splits a string on any run of white space characters:

>>> x='This       is    a   simple    sentence.'
>>> x.split()
['This', 'is', 'a', 'simple', 'sentence.']

If you pass an argument to it, it will split at the given string:

>>> 'aaaYbbbYYccc'.split('Y')
['aaa', 'bbb', '', 'ccc']

If you have a long text you might want to split at the new lines. You could use the split('\n'), but there is a better way:

>>> text="""This is a text
... which is broken in several
... lines.
... """
>>> text.splitlines()
['This is a text', 'which is broken in several', 'lines.']
>>> text.splitlines(True)
['This is a text\n', 'which is broken in several\n', 'lines.\n']

The first thing to note is that we used the triple quotes """ (you can also use ''', it just has to match at the end). This allows you to write strings which contain new lines. You could do that also with a single " or ', but then you have to write each new line as \n. The str.splitlines() method has one argument, which by default is False. If it is True the newlines are included. It is more involved to write this using str.split('\n') so that the number of lines stays intact.
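The difference between the two methods can be seen in the following sketch (illustrative names, and it works in both Python 2.7 and Python 3). When the text ends with a new line, split('\n') produces an additional empty string at the end, while splitlines() does not.

```python
text = 'This is a text\nwhich is broken in several\nlines.\n'

with_split = text.split('\n')        # note the trailing '' element
with_splitlines = text.splitlines()  # exactly one element per line

# Keeping the line ends allows you to rebuild the original text.
rebuilt = ''.join(text.splitlines(True))

print(with_split)
print(with_splitlines)
```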

There is an inverse operation which takes a list and joins its elements using a given string. Surprisingly it looks like this:

>>> L=['Hello', 'world.']
>>> ' '.join(L)
'Hello world.'

That is, join is a method of the separator string. This is quite confusing at the beginning. Equally well, you can use the string.join() function (only in Python2.7) from the string module:

>>> from string import join # only in Python2.7
>>> join(['1', '2', '3'], '-')

Note that the list must contain str or unicode objects. The first line adds the join function to the name space. If we import the whole module with import string then one has to write string.join(...). It is also possible to alter the imported name:

>>> from string import join as sjoin # only in Python2.7
>>> sjoin(['a', 'b', 'c'], '_')
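In Python 3 only the method form is available. The sketch below (illustrative names) also shows what to do when the list contains objects which are not strings: they have to be converted first, otherwise a TypeError is raised.

```python
dashed = '-'.join(['1', '2', '3'])

numbers = [1, 2, 3]
try:
    '-'.join(numbers)                 # elements must be strings
    join_accepts_ints = True
except TypeError:
    join_accepts_ints = False

# Convert every element with str() before joining.
from_numbers = '-'.join(str(n) for n in numbers)

# split() with the same separator is the inverse operation.
round_trip = dashed.split('-')

print(dashed, join_accepts_ints, from_numbers, round_trip)
```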

Writing and reading files

Usually the data for your programs will be read from files, and after processing it might be written to another file. Reading from and writing to a file is quite easy. We will first use the built-in open() function and I will also show you codecs.open(), which has a syntax similar to the built-in open() of Python 3. First let us write some data to a file. I will use the location /tmp/python_intro.txt, which is fine for MacOs and Linux users. If you are on the Windows platform you should change the location. Note that if such a file already exists it will be overwritten.

In Python 2 the file objects returned by the built-in open() function can be used with both unicode and str objects. But if you write a unicode (which is the preferred type to work with, for the reasons we advertised), the string will be encoded using the default ‘ascii’ encoding, hence you should encode the unicode object by hand.

>>> if sys.version_info.major < 3: # Python2.7
...   motto=u'''The Plato's Academy motto was:
...   '\u1f00\u03b3\u03b5\u03c9\u03bc\u03ad\u03c4\u03c1\u03b7\u03c4\u03bf\u03c2 \u03bc\u03b7\u03b4\u03b5\u1f76\u03c2 \u03b5\u1f30\u03c3\u03af\u03c4\u03c9'
...   '''
... else: # Python3
...   motto="""The Plato's Academy motto was:
...   'ἀγεωμέτρητος μηδεὶς εἰσίτω'"""
>>> if sys.version_info.major < 3: # Python2.7
...   file_object = open('/tmp/python_intro.txt', 'w')
...   file_object.write(motto.encode('utf-8'))
... else: # Python3
...   file_object = open('/tmp/python_intro.txt', 'w', encoding='utf-8')
...   file_object.write(motto)
>>> file_object.close()

In this way we have written the motto to a file using the utf-8 encoding (you can use both utf-8 and utf8 as the encoding name). To use the file.write() method we need to open the file in a writable mode. There are two such modes: w, which writes to the file and truncates (i.e. deletes) what was there before, and a, which appends to the file. It is always good to close the file when we finish using it, so that other parts of your system can use it. As you can see, it is possible to use a single ' inside triple quotes '''. Note the difference: in Python 3 you can pass the encoding as a keyword argument; if it is not given, a default system encoding is used, which is platform dependent (whatever locale.getpreferredencoding() returns). In Python 2.7 (and earlier versions), however, an implicit ASCII encoding is used, so in general you have to encode the data with the encode() method. Without it we will get an error, since the Greek letters cannot be encoded in ASCII. Now we can read the file:

>>> file_object = open('/tmp/python_intro.txt', 'r')
>>> motto_str = file_object.read()
>>> motto_str == motto
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
>>> motto_str.decode('utf8') == motto
>>> file_object.tell()
>>> file_object.seek(0)
>>> motto_lines = file_object.readlines()
>>> file_object.close()
>>> for line in motto_lines:
...     print line,
The Plato's Academy motto was:
'ἀγεωμέτρητος μηδεὶς εἰσίτω'

To read from a file you need to open it in reading mode. This mode is specified by the r argument to the open() function. The file.read() method of a file object reads from the file and returns a str string. The warning which we get is again caused by the default 'ascii' encoding: to compare motto_str with the unicode object motto, Python tries to decode motto_str using 'ascii', and fails. After reading the whole file with file.read() the current file object position is at the end: in this case at byte 87, as returned by file.tell(). Then we set the file object position back to 0 (as it was when we opened the file) and read the lines with file.readlines(). You can also read parts of the file:

>>> file_object = open('/tmp/python_intro.txt', 'r')
>>> file_object.readline()
"The Plato's Academy motto was:\n"
>>> file_object.tell()
>>> file_object.seek(32) # skip over "'"
>>> file_object.tell()
>>> first_word = file_object.read(25)
>>> print(first_word)

The file.readline() method reads just one line from the file object. Then, with file_object.seek(32), we go past the quote ' which starts the second line, and read the next 25 bytes. Note that if you read only 24 bytes you will get just a part of the utf-8 representation of ς and you will not be able to decode it.
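
The byte arithmetic above can be checked with a small script. This is only a sketch under a few assumptions of mine: it uses io.open() (which accepts the encoding argument on both Python 2.7 and Python 3), it writes to a hypothetical file python_intro_sketch.txt in the system temporary directory rather than /tmp/python_intro.txt, and it reads the file back in binary mode so that tell() and seek() count plain bytes.

```python
import io
import os
import tempfile

motto = (u"The Plato's Academy motto was:\n"
         u"'\u1f00\u03b3\u03b5\u03c9\u03bc\u03ad\u03c4\u03c1\u03b7\u03c4\u03bf\u03c2 "
         u"\u03bc\u03b7\u03b4\u03b5\u1f76\u03c2 "
         u"\u03b5\u1f30\u03c3\u03af\u03c4\u03c9'\n")

# Hypothetical location, chosen so that the sketch also runs on Windows.
path = os.path.join(tempfile.gettempdir(), 'python_intro_sketch.txt')

# The with statement closes the file for us; newline='\n' keeps the
# byte layout identical on every platform.
with io.open(path, 'w', encoding='utf-8', newline='\n') as file_object:
    file_object.write(motto)

# In binary mode every position is a byte offset.
with io.open(path, 'rb') as file_object:
    file_object.readline()
    offset_after_first_line = file_object.tell()
    file_object.seek(32)                    # skip over the opening quote
    first_word_bytes = file_object.read(25)
    file_object.seek(0, os.SEEK_END)
    size = file_object.tell()

first_word = first_word_bytes.decode('utf-8')

# Cutting the last byte splits the two-byte encoding of the final letter.
try:
    first_word_bytes[:24].decode('utf-8')
    truncated_decodes = True
except UnicodeDecodeError:
    truncated_decodes = False
```

Here offset_after_first_line is 31, size is 87, and decoding only 24 bytes fails, in agreement with the numbers quoted in the text.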

In the codecs module you can find the codecs.open() function. It has two additional parameters: encoding and errors:

codecs.open(filename, mode[, encoding[, errors[, buffering]]])

Open the file in mode using encoding. The errors parameter defines how errors are handled. It has the following possible values:

  1. 'strict' (default) raise UnicodeDecodeError or UnicodeEncodeError on decoding/encoding errors,

  2. 'ignore' ignore encoding errors, which might lead to data loss,

  3. 'replace' replace bytes which could not be encoded/decoded with the u'\ufffd' replacement character (it represents: ‘�’).

If the file is opened in writing mode (mode=w or mode=a), the errors option can also take the following two additional values:

  4. 'xmlcharrefreplace' use XML’s character references,

  5. 'backslashreplace' replace the unicode character with its code point (as used by Python).

Let us use the /tmp/python_intro.txt file that we wrote before.

>>> import codecs
>>> file_object = codecs.open('/tmp/python_intro.txt', 'r', 'ascii', 'strict')
>>> file_object.read()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 32: ordinal not in range(128)

>>> file_object.close()
>>> file_object =  codecs.open('/tmp/python_intro.txt', 'r', 'ascii', 'replace')
>>> text = file_object.read()
>>> file_object.close()
>>> type(text)
<type 'unicode'>
>>> print(text)
The Plato's Academy motto was:
'������������������������� ������������� �������������'

>>> file_object = codecs.open('/tmp/python_intro.txt', 'r', 'ascii', 'ignore')
>>> print(file_object.read())
The Plato's Academy motto was:
'  '

>>> file_object.close()

The two additional error handling methods, used when writing to files, work in the following way:

>>> file_object = codecs.open('/tmp/python_intro.txt', 'r', 'utf8')
>>> text = file_object.read()
>>> file_object.close()
>>> file_object = codecs.open('/tmp/python_intro_xml.txt', 'w', 'ascii', 'xmlcharrefreplace')
>>> file_object.write(text)
>>> file_object.close()
>>> file_object = codecs.open('/tmp/python_intro_xml.txt', 'r', 'ascii')
>>> print(file_object.read())
The Plato's Academy motto was:
'&#7936;&#947;&#949;&#969;&#956;&#941;&#964;&#961;&#951;&#964;&#959;&#962; &#956;&#951;&#948;&#949;&#8054;&#962; &#949;&#7984;&#963;&#943;&#964;&#969;'

>>> file_object.close()

Writing with errors='backslashreplace' will give:

The Plato's Academy motto was:
'\u1f00\u03b3\u03b5\u03c9\u03bc\u03ad\u03c4\u03c1\u03b7\u03c4\u03bf\u03c2 \u03bc\u03b7\u03b4\u03b5\u1f76\u03c2 \u03b5\u1f30\u03c3\u03af\u03c4\u03c9'

Let us note that the errors parameter is also used in unicode.encode() (with values 1 to 5) and str.decode() (with values 1 to 3). It is also possible to define how to handle unicode errors with codecs.register_error().
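
To see the error handlers without touching any file, here is a small sketch for Python 3, where unicode.encode() is spelled str.encode() and str.decode() is spelled bytes.decode(). The two-letter sample text is my own choice, not part of the motto file handling above.

```python
text = u'\u03bc\u03b7'            # the two Greek letters "μη"
data = text.encode('utf-8')       # four bytes, none of them valid ASCII

# Encoding to ASCII with the handlers described above.
ascii_ignore = text.encode('ascii', 'ignore')
ascii_replace = text.encode('ascii', 'replace')          # '?' for each character
ascii_xml = text.encode('ascii', 'xmlcharrefreplace')
ascii_backslash = text.encode('ascii', 'backslashreplace')

# Decoding the utf-8 bytes as ASCII.
decoded_ignore = data.decode('ascii', 'ignore')
decoded_replace = data.decode('ascii', 'replace')        # one u'\ufffd' per bad byte

# 'strict' is the default and raises an exception.
try:
    data.decode('ascii')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(ascii_xml)        # b'&#956;&#951;'
print(ascii_backslash)  # b'\\u03bc\\u03b7'
```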


Dictionaries are mapping types. They map a set of keys to a set of values. A map has to be well defined, i.e. there can be only one value corresponding to a given key. This brings some restrictions on the objects that can serve as keys of a dictionary. We begin with simple examples:

>>> alphabet_dict={'a' : 1, 'b' : 2, 'c' : 3}
>>> type(alphabet_dict)
<type 'dict'>
>>> # Get a value corresponding to a given key
>>> alphabet_dict['a']
>>> # Assign value to a key
>>> alphabet_dict['d'] = 4
>>> alphabet_dict
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
>>> # Return a list of keys
>>> alphabet_dict.keys()
dict_keys(['a', 'b', 'c', 'd'])
>>> # Return a list of values:
>>> alphabet_dict.values()
dict_values([1, 2, 3, 4])
>>> # List dictionary items:
>>> alphabet_dict.items()
dict_items([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
>>> # Delete a key, value pair
>>> del alphabet_dict['d']
>>> alphabet_dict
{'a': 1, 'b': 2, 'c': 3}


As with list, the name dict is not a reserved keyword in Python, but if you define a variable named dict you will no longer be able to refer to the built-in dict() function.

To iterate over the keys of a dictionary you can use the keys() method as shown above, but you can also just write:

>>> for key in alphabet_dict:
...  print(key, end=' ')
a c b

The equivalent Python 2.7 statement is:

>>> for key in alphabet_dict.iterkeys(): print key,
a c b

Python 2.7 also has the keys() method, but there it returns a list, while iterkeys() returns an iterator and the Python 3 keys() method returns a view object. The difference is that neither of the latter two builds a list of all the keys in memory; only one key per iteration is produced.
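
A short sketch of how the Python 3 keys() result behaves (the names keys_view and keys_snapshot are mine; on Python 2.7 keys() returns a plain list, so the first check would give False there):

```python
alphabet_dict = {'a': 1, 'b': 2, 'c': 3}

# In Python 3 this object stays connected to the dictionary.
keys_view = alphabet_dict.keys()

alphabet_dict['d'] = 4
view_sees_new_key = 'd' in keys_view     # True on Python 3

# list() takes an independent snapshot of the current keys.
keys_snapshot = list(keys_view)

del alphabet_dict['d']
view_after_delete = 'd' in keys_view     # False: the view follows the dictionary
snapshot_after_delete = 'd' in keys_snapshot  # True: the list does not change
```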

If you try to get the value of a key which is not present in the dictionary, a KeyError exception is raised:

>>> alphabet_dict['e']
KeyError: 'e'

You can, however, specify a default value if the key is not present using the dict.get() method:

>>> alphabet_dict.get('e', 5)
>>> alphabet_dict.get('a', 0)
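
The same calls written as a script, as a small sketch; the variable names are only illustrative. Note that dict.get() never adds the missing key to the dictionary.

```python
alphabet_dict = {'a': 1, 'b': 2, 'c': 3}

missing = alphabet_dict.get('e', 5)     # key absent: the default is returned
present = alphabet_dict.get('a', 0)     # key present: the default is ignored
no_default = alphabet_dict.get('e')     # without a default, None is returned

# get() only looks the key up, it does not insert it.
still_missing = 'e' not in alphabet_dict

print(missing, present, no_default)     # 5 1 None
```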

You can also merge two dictionaries with the dict.update() method:

>>> K = { 'source.py' : 250, 'index.py' : 412 }
>>> L = { 'README.txt' : 100, 'INSTALL.txt' : 30 }
>>> K.update(L)
>>> K
{'source.py': 250, 'index.py': 412, 'README.txt': 100, 'INSTALL.txt': 30}

But this method will also overwrite the values of K with the values of L for the keys they share:

>>> K = { 'source.py' : 250, 'index.py' : 412, 'README.txt' : 0 }
>>> K.update(L)
>>> K['README.txt']
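
Since update() changes K in place, one way to keep the original untouched is to update a copy instead. A minimal sketch, where merged is a name I introduce only for illustration:

```python
K = {'source.py': 250, 'index.py': 412, 'README.txt': 0}
L = {'README.txt': 100, 'INSTALL.txt': 30}

# dict(K) builds a new dictionary with the same items as K.
merged = dict(K)
merged.update(L)

print(merged['README.txt'])  # 100, the value from L wins
print(K['README.txt'])       # 0, K itself was not modified
```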

There is also the built-in function dict() (actually it is not a function but a type ...) which produces a dictionary. It is quite convenient if you have two lists which you want to zip into a dictionary:

>>> dict(gibbons=200, parrots=150)
{'gibbons': 200, 'parrots': 150}
>>> dict({'parrots': 150, 'gibbons': 200})
{'parrots': 150, 'gibbons': 200}
>>> dict([('gibbons', 200), ('parrots', 150)])
{'parrots': 150, 'gibbons': 200}
>>> keys = ['elephants', 'zebras', 'lions']
>>> values = [50, 300, 10]
>>> dict(zip(keys, values))
{'lions': 10, 'zebras': 300, 'elephants': 50}

and the last convenient way of forming a dictionary:

>>> dict.fromkeys(('a', 'b'), 1)
{'a': 1, 'b': 1}

The zip() function produces a list of tuples (in Python 3, an iterator over tuples) zipped from the elements of the supplied lists. More than two lists may be given to zip(). The dictionary type does not take care of the order of keys (since Python 3.7, however, dictionaries preserve insertion order). If the order is relevant, we can use OrderedDict from the collections module.
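
Both remarks can be tried out in a few lines. This is just a sketch; the third list, continents, is a made-up example of mine.

```python
from collections import OrderedDict

keys = ['elephants', 'zebras', 'lions']
values = [50, 300, 10]
continents = ['Africa', 'Africa', 'Africa']   # hypothetical third list

# zip() accepts any number of sequences; list() is needed on Python 3,
# where zip() returns an iterator.
triples = list(zip(keys, values, continents))

# OrderedDict remembers the order in which the keys were inserted.
ordered = OrderedDict(zip(keys, values))
ordered_keys = list(ordered.keys())

print(triples[0])      # ('elephants', 50, 'Africa')
print(ordered_keys)    # ['elephants', 'zebras', 'lions']
```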

What kind of objects cannot serve as keys? For example, mutable objects like lists, since they might change during their lifetime. In general, any object that is hashable can be a key. Hashable objects have a __hash__() method which computes the hash value. This value should not change during the object’s life. Here is a very nice explanation of why hashable objects are needed and why a list cannot be used as a dictionary key while a tuple can.
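
A quick sketch of the difference between a tuple key and a list key (the dictionary sizes and its keys are hypothetical):

```python
sizes = {}

# A tuple of strings is hashable, so it can be used as a key.
sizes[('src', 'index.py')] = 412

# A list is mutable and not hashable: using it as a key raises TypeError.
try:
    sizes[['src', 'index.py']] = 412
    list_key_accepted = True
except TypeError:
    list_key_accepted = False

# Equal tuples have equal hash values, which is what the lookup relies on.
same_hash = hash(('src', 'index.py')) == hash(('src', 'index.py'))
```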

See also

If you have read the wiki article about how hash functions are used to implement dictionaries, you can also watch an excellent PyCon2010 lecture. It will explain to you why you cannot insert new keys into a dictionary while you iterate over it.
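
What happens in practice can be sketched as follows. The dictionary counts is a made-up example; the usual workaround shown here is to iterate over a snapshot of the keys made with list().

```python
counts = {'a': 1, 'b': 2}

# Inserting new keys while iterating over the dictionary itself fails.
try:
    for key in counts:
        counts[key.upper()] = counts[key]
    raised = False
except RuntimeError:
    raised = True

# Iterating over a list of the keys is safe, because the list is
# independent of the dictionary.
counts = {'a': 1, 'b': 2}
for key in list(counts):
    counts[key.upper()] = counts[key]
```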

Dictionary comprehension

It is very similar to list comprehension. I think you will grasp the idea after a few examples:

>>> {a:a for a in range(3)}
{0: 0, 1: 1, 2: 2}
>>> l=(('a',1), ('b',2), ('c', 3))
>>> {key: value for (key, value) in l}
{'a': 1, 'c': 3, 'b': 2}

As with lists, you can apply some functions to the keys and values before they are assigned to the dictionary:

>>> {'-%s-' % key: value**2 for (key, value) in l}
{'-b-': 4, '-c-': 9, '-a-': 1}


>>> {chr(97+i): i for i in range(5)}
{'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}

Dictionary comprehension appeared in Python 2.7; in earlier versions of Python you have to use the dict() constructor which, as you have seen, accepts an iterable of pairs which are then used as keys and values.
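
For comparison, here is a sketch of the same dictionary built both ways; pairs is just my name for the tuple called l above.

```python
pairs = (('a', 1), ('b', 2), ('c', 3))

# Dictionary comprehension (Python 2.7 and later).
via_comprehension = {'-%s-' % key: value ** 2 for (key, value) in pairs}

# The dict() constructor fed with a generator of (key, value) pairs.
via_constructor = dict(('-%s-' % key, value ** 2) for (key, value) in pairs)

print(via_comprehension == via_constructor)  # True
```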

Shallow and deep copies of dictionaries

Like lists, dictionaries are mutable objects: you can add new key, value pairs, or delete old ones:

>>> M=K
>>> M['todo.txt']=420
>>> 'todo.txt' in K

As you can see, the in operator can be used to test whether a key is in a dictionary or not. The behaviour in the above snippet becomes clear when you think in the Python way: M and K are labels for the same object, so if we add an entry to the dictionary, it doesn’t matter which label we use. Sometimes this is not what you want. For example, you may need to remember the state of the dictionary at a particular place in your code. The solution is to make a shallow or deep copy of the dictionary. As with lists, the copy module functions copy.copy() and copy.deepcopy() will do the job. Let us see what the difference between a shallow and a deep copy is:

>>> from copy import copy, deepcopy
>>> L=[]
>>> a = {'a': 1, 'b': 2, 'c' :L}
>>> # Make a shallow copy:
>>> a_sc = copy(a)
>>> # Make a deep copy:
>>> a_dc = deepcopy(a)
>>> L.append(0)
>>> a
{'a': 1, 'b': 2, 'c': [0]}
>>> a_sc
{'a': 1, 'b': 2, 'c': [0]}
>>> a_dc
{'a': 1, 'b': 2, 'c': []}

or another example:

>>> a = {}
>>> a['a']=a
>>> a
{'a': {...}}
>>> a['a'] is a
>>> a_sc = copy(a)
>>> a_dc = deepcopy(a)
>>> a['a']['x']=1
>>> a
{'a': {...}, 'x': 1}
>>> a['a']
{'a': {...}, 'x': 1}
>>> a_sc
{'a': {'a': {...}, 'x': 1}}
>>> # Check that the shallow copy is indeed shallow:
>>> a is a_sc['a']
>>> a is a_sc # but it is still a copy:
>>> a_dc
{'a': {...}}
>>> a_dc['a']
{'a': {...}}

What we did in this example is quite unusual. We added the dictionary a as its own entry (under the key 'a'). Note that the shallow copy a_sc does not itself contain the key 'x' (see the output of a_sc). The check a is a_sc['a'] shows that copy.copy() did not make a copy of the object a['a'] (which was a itself). Hence a_sc does not contain itself as an entry, but it still contains a (which contains a and 'x', ...). Adding 'x' to the dictionary a doesn’t add it to the shallow copy, but you will find it here:

>>> a_sc['a']
{'a': {...}, 'x': 1}

since a_sc['a'] is the same object as a. The copy.deepcopy() function can deal with circular dependencies without falling into an infinite loop (it simply keeps track of the objects that it has already copied). And here none of a_dc is a, a_dc['a'] is a, a_dc['a']['a'] is a, ... holds. However, the deep copy is clever enough to keep the relation:

>>> a_dc is a_dc['a']
>>> a_dc['a'] is a_dc['a']['a']

and so on (logically the second True follows from the first one).

What is important to remember is that a shallow copy makes a copy of the object but doesn’t descend into the elements that it contains, while a deep copy makes a copy of the object, its elements, the elements of its elements, and so on. It copies each object only once; that is why it keeps the relation a_dc is a_dc['a'], and this also prevents it from falling into infinite loops.
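
The summary can be turned into a handful of identity checks. This is a minimal sketch; all the names (inner, original, shallow, deep, loop, loop_deep) are mine.

```python
from copy import copy, deepcopy

inner = []
original = {'a': 1, 'c': inner}

shallow = copy(original)
deep = deepcopy(original)

# Both copies are new dictionary objects ...
shallow_is_new = shallow is not original
deep_is_new = deep is not original

# ... but only the shallow copy still shares the nested list.
shallow_shares_inner = shallow['c'] is inner
deep_shares_inner = deep['c'] is inner

# A dictionary which contains itself: deepcopy copies it once and
# keeps the self-reference inside the copy.
loop = {}
loop['self'] = loop
loop_deep = deepcopy(loop)
loop_relation_kept = loop_deep['self'] is loop_deep
loop_is_new = loop_deep is not loop
```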

Other resources

See also

Python tutorial: introduction: this is the part of the Python tutorial which corresponds to the above material. You can also see Python tutorial: data structures. This part is particularly useful when you start working with lists and dictionaries. You can keep it in your bookmarks; if you are just beginning, you will surely look for it.

Built-in Types: this Python doc extends what we have just learned here and describes a few other types like set (a mutable class) and frozenset (immutable and hashable).

Last update: 2013-11-06 11:02 (CET)
we are not associated with Python Software Foundation
© Copyright 2012, 2013 Accorda Institute.