
What's New in Python 2.3?

by Alex Martelli, author of Python in a Nutshell
03/27/2003

Editor's note: When Alex Martelli said he wanted to write an article about what he was "unable to include in Python in a Nutshell," my first thought was "why would we want to tell readers what they will not find when they purchase this book?" But our In a Nutshell books do not purport to include everything about a subject. As Tim O'Reilly says: "These books aren't tutorials. They take a topic and drill down, expand, and, we hope, delight the reader by providing useful information the reader didn't even expect to find." If you want to find out more about all of our In a Nutshell books, check out our recently launched nutshells.oreilly.com site. Meanwhile, read on to find out why Alex says his recent In a Nutshell book is eminently relevant as you upgrade to Python 2.3.

Introduction

Python in a Nutshell comes with a banner on the cover that says it "Covers Python 2.2." With Python 2.3 coming soon (version 2.3 is currently in the "alpha" phase; "beta" will soon follow, then release candidates, and, in due course, a final release), you might justifiably worry that the forthcoming Python 2.3 is going to invalidate what you learned from Python in a Nutshell. Is it worth upgrading, or should you stick to 2.2 as long as possible? This article answers those questions with a look at the changes and improvements in the new version, including reviews of the new modules 2.3 has to offer.

Upgrading to 2.3

Good news: Python is a stable language. New releases are always designed to avoid breaking good Python code that worked with previous releases. So you can keep programming to Python 2.2, upgrade your installed Python to Python 2.3, and count on your code still working correctly. Python in a Nutshell will be eminently applicable regardless of which version you use.

Is it worth upgrading? You bet. With Python 2.3, you can expect typical Python code to run about 15 percent to 20 percent faster than it did with 2.2, since a lot of care has been devoted to optimization and fine-tuning.

Performance Improvements and New Modules

Even if you don't use the language and library improvements in Python 2.3, the speed gains alone make it worthwhile to upgrade. In some cases, the gains are even more impressive, and the new timeit.py module makes them easy to measure. For example, multiplication of long integers uses a new, much faster algorithm ("Karatsuba multiplication"):

[alex@lancelot src]$ python2.2 -O /usr/local/lib/python2.3/timeit.py \
> '112233445566778899 * 112233445566778899'
1000000 loops, time: 1.151 usec
[alex@lancelot src]$ python2.3 -O /usr/local/lib/python2.3/timeit.py \
> '112233445566778899 * 112233445566778899'
1000000 loops, time: 0.665 usec

In this case, the speedup is over 70 percent. Note, in passing, that multiplying a long integer by itself is the fastest way of squaring it: 112233445566778899 ** 2, on the same machine, takes 2.727 microseconds in Python 2.2 and 1.349 in Python 2.3. So, when you need to square a long integer, remember that multiplying it by itself is over twice as fast as raising it to the power of two.
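The same kind of measurement can also be made from inside a program, since timeit exposes a Timer class. The following is only a rough sketch: the helper name best_time and the loop counts are my own choices for illustration, and the numbers you get will depend on your machine and Python version.

```python
import timeit

# The same operand used in the command-line timings above.
N = 112233445566778899

def best_time(statement, number=100000, repeat=3):
    """Return the best per-loop time, in microseconds, for `statement`.

    Hypothetical helper, not part of the timeit module itself.
    """
    timer = timeit.Timer(statement, setup="n = %d" % N)
    # repeat() returns one total time (in seconds) per run; the minimum is
    # the run least disturbed by other activity on the machine.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6

multiply_usec = best_time("n * n")
power_usec = best_time("n ** 2")
print("n * n : %.3f usec per loop" % multiply_usec)
print("n ** 2: %.3f usec per loop" % power_usec)
```

Taking the minimum of several runs, rather than the average, is a common way to reduce the influence of whatever else the machine happens to be doing.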

Enhancements to the Python language itself, from 2.2 to 2.3, are minor but helpful. As mentioned in the Nutshell, slicing of built-in sequences now supports an optional third parameter, the stride of the slice. For example, to get alternate characters from a string, in either normal or reverse order, you can now just slice the string:

>>> print 'arrivederci'[::2]
arvdri
>>> print 'arrivederci'[::-2]
irdvra
>>>
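The stride works the same way on other built-in sequences, and it can be combined with start and stop indices. Here is a small sketch of my own (the variable names are just for illustration) using a list of characters:

```python
letters = list("arrivederci")

every_other = letters[::2]          # start at index 0, step forward by 2
backward = letters[::-1]            # a stride of -1 walks the list in reverse
middle_by_three = letters[1:10:3]   # indices 1, 4, and 7

# A slice object holds the same three values and can be reused.
odd_positions = slice(1, None, 2)

print("".join(every_other))             # arvdri
print("".join(backward))                # icredevirra
print("".join(middle_by_three))         # rve
print("".join(letters[odd_positions]))  # rieec
```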

Booleans' string representations are now the strings 'True' and 'False', and the in operator on strings has been made more powerful:

[alex@lancelot src]$ python2.2 -c 'print "r" in "arrivederci"'
1
[alex@lancelot src]$ python2.3 -c 'print "r" in "arrivederci"'
True

[alex@lancelot src]$ python2.2 -c 'print "riv" in "arrivederci"'
Traceback (most recent call last):
  File "<string>", line 1, in ?
TypeError: 'in <string>' requires character as left operand
[alex@lancelot src]$ python2.3 -c 'print "riv" in "arrivederci"'
True

As you see, we can now check if any substring is "in" a given string: the check is not limited any more to being done on a single character, as it was up to Python 2.2.
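If your code must also run where only the single-character check is available, one possible workaround is a small helper built on the string method find. This is a sketch; the function name contains is my own invention, not anything in the standard library.

```python
def contains(haystack, needle):
    """Report whether `needle` occurs anywhere in `haystack`.

    Hypothetical helper for illustration only.
    """
    # str.find returns the index of the first match, or -1 if there is none.
    return haystack.find(needle) != -1

print(contains("arrivederci", "riv"))   # a substring that is present
print(contains("arrivederci", "xyz"))   # a substring that is absent
```

Like the enhanced in operator, this helper treats the empty string as a substring of every string.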

Built-in types have gained a few more small extras. You can now open a text file with mode U, for "universal newlines", to read it with transparent support for all common kinds of line terminators: '\r', '\n', and '\r\n' all translate into '\n' in this mode. Dictionaries are a bit richer, with two more ways to build them:

>>> x=dict(a=23,b=45,c=67,d=89)
>>> x
{'a': 23, 'c': 67, 'b': 45, 'd': 89}
>>> y=dict.fromkeys(range(4), 'ho')
>>> y
{0: 'ho', 1: 'ho', 2: 'ho', 3: 'ho'}

and a new method to fetch-and-remove an item, by key:

>>> x.pop('c')
67
>>> x
{'a': 23, 'b': 45, 'd': 89}

The new pop method of dicts takes the key as its argument, returns the corresponding value, and removes the item from the dict, quite similarly to the pop method that lists have long had. A dict's pop method lets you treat missing keys in either of two ways:

>>> x.pop('z')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
KeyError: 'z'
>>> x.pop('z', 55)
55

When you call pop with a single argument, and that key is not in the dict, KeyError gets raised; alternatively, you can call pop with two arguments, and so provide a default value for the method to return if the key isn't present.
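One place where the two-argument form comes in handy is in consuming optional settings from a dict of keyword arguments. The sketch below is my own illustration: the function connect and its option names are hypothetical, not part of any library.

```python
def connect(host, **options):
    """Hypothetical function that consumes the options it understands."""
    # Two-argument pop: remove the key if present, else use the default.
    port = options.pop("port", 80)
    timeout = options.pop("timeout", None)
    if options:
        # Anything still left in the dict is an option we do not recognize.
        names = ", ".join(sorted(options))
        raise TypeError("unexpected options: %s" % names)
    return host, port, timeout

print(connect("example.com", port=8080))
```

Because each recognized option is removed as it is read, whatever remains afterwards must be a misspelled or unsupported option, which makes such mistakes easy to report.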

A useful new built-in function is enumerate, which lets you loop in parallel over a sequence and its indices:

>>> for x in enumerate('ciao'): print x
...
(0, 'c')
(1, 'i')
(2, 'a')
(3, 'o')
>>>

You can easily emulate this in Python 2.2, if you want to make your programs more easily portable between 2.2 and 2.3:

from __future__ import generators
def enumerate(sequence):
    index = 0
    for item in sequence:
        yield index, item
        index += 1

In Python 2.3 you no longer need the from __future__ import for this purpose (generators are always enabled, and yield is always a keyword). However, since this is Python 2.2 code intended to emulate the new 2.3 built-in, we do need to "import generators from the future".
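A typical use of enumerate is any loop that needs both the item and its position. As a small sketch (the function name number_lines is my own, chosen for illustration), here is one way to prefix lines of text with line numbers:

```python
def number_lines(lines):
    """Return the lines prefixed with 1-based line numbers.

    Hypothetical helper for illustration only.
    """
    numbered = []
    # enumerate yields (index, item) pairs, with the index starting at 0.
    for index, line in enumerate(lines):
        numbered.append("%3d: %s" % (index + 1, line))
    return numbered

for text in number_lines(["import sys", "print(sys.version)"]):
    print(text)
```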

As is typical of all Python upgrades, most enhancements in Python 2.3 do not come as changes to the Python language itself, but rather can be found in Python's vast standard library. In many cases, this means you can take the Python source of a new 2.3 library module and sneak it into a 2.2 installation that you cannot entirely upgrade for whatever reason. This will not always work, as the new module may take advantage of language innovations; but often it will, and if you find yourself in such a situation it may be worth a try.

Library enhancements can be generally divided into improvements to existing modules, and entirely new modules. However, in the specific case of Python 2.3, one important enhancement is the removal of two modules: rexec and Bastion, which are discussed in the "Restricted Execution" section of the Nutshell's Chapter 13 ("Controlling Execution"). It has been discovered that these modules present unfixable and exploitable security flaws, and therefore they have been officially declared "dead," with immediate effect and without the usual backwards-compatibility precautions. Security weaknesses do require such immediate and drastic action.

Research is ongoing on alternative ways to let your Python applications execute "untrusted" Python code in safe ways; for example, I recommend taking a look at the experimental Sandbox.py module that you can find at www.procoders.net/download.php?fname=SandBox.py. However, until such alternatives have been thoroughly examined by security experts, and released as approved and secure, I recommend you do not yet rely on them for production work that does require high security.

Some of the enhancements to existing modules were known early enough that I was able to mention them in Python in a Nutshell; for example, all sockets from standard module socket can now optionally display timeout behavior. Other enhancements are nearly "transparent" to your application code. For example, the module random uses a new random number generator (the "Mersenne Twister") with a longer period; the pickle module can use a new and more efficient pickling protocol; and bsddb supports newer versions of the underlying Sleepycat Berkeley DB library.
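To make these enhancements a little more concrete, here is a rough sketch that touches three of them: the newer pickle protocol (protocol number 2), reseeding the random generator, and setting a timeout on a socket. The data and variable names are made up for illustration, and the socket is never actually connected.

```python
import pickle
import random
import socket

record = {"name": "spam", "sizes": [1, 2, 3]}   # made-up sample data

# Ask for pickle protocol 2 explicitly, then round-trip the object.
data = pickle.dumps(record, 2)
restored = pickle.loads(data)

# Seeding the generator with the same value reproduces the same sequence.
random.seed(42)
first = [random.random() for _ in range(3)]
random.seed(42)
second = [random.random() for _ in range(3)]

# A socket with a timeout raises an error rather than blocking forever
# when an operation takes longer than the given number of seconds.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2.5)
timeout = sock.gettimeout()
sock.close()
```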

Python 2.3's standard library also comes with many new modules. Some are analogous to existing ones, but are better: for example, bz2 lets your application use the bzip2 compression library, which can compress data better than gzip; optparse lets you parse command-line options, like getopt but with more power; textwrap reformats text into paragraphs, as you could previously do with some of the classes supplied by module formatter, but in simpler and more flexible ways.
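As a quick taste of these three modules, consider the following sketch. The sample text, the option names, and the argument list are all invented for illustration; the example is written to run on a current Python interpreter, where bz2 works on byte strings.

```python
import bz2
import optparse
import textwrap

text = "Python 2.3's standard library also comes with many new modules. " * 3

# textwrap.fill returns one string, with newlines inserted so that no
# line is longer than `width` characters.
wrapped = textwrap.fill(text, width=40)

# bz2.compress and bz2.decompress round-trip a byte string.
raw = text.encode("ascii") * 50
packed = bz2.compress(raw)
unpacked = bz2.decompress(packed)

# optparse: declare the options, then parse an argument list.
parser = optparse.OptionParser()
parser.add_option("-v", "--verbose", action="store_true",
                  dest="verbose", default=False)
parser.add_option("-o", "--output", dest="output", default="out.txt")
# Passing an explicit list here; by default the parser reads sys.argv[1:].
options, args = parser.parse_args(["-v", "--output", "result.txt", "input.txt"])

print(wrapped)
print("%d bytes compressed to %d" % (len(raw), len(packed)))
print("%s %s %s" % (options.verbose, options.output, args))
```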

Other new modules offer completely new functionality. The datetime module offers quick date and time calculations; to compute, for example, the number of days between two dates, you can now use very simple code:

>>> from datetime import date
>>> print date(2003,3,23)-date(2002,10,1)
173 days, 0:00:00
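The difference between two dates is a timedelta object, which you can inspect or use in further arithmetic. A minimal sketch, with variable names of my own choosing:

```python
from datetime import date, timedelta

start = date(2002, 10, 1)
end = date(2003, 3, 23)

elapsed = end - start            # a timedelta object
days_between = elapsed.days      # 173, as a plain integer

# Adding a timedelta to a date gives another date.
thirty_days_later = end + timedelta(days=30)   # date(2003, 4, 22)

# weekday() numbers the days from Monday (0) through Sunday (6).
day_of_week = end.weekday()      # 6, a Sunday

print(days_between)
print(thirty_days_later.isoformat())
```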

The heapq module implements functions that let you use a list as a heap-queue (also known as a priority queue). This new module doesn't directly implement a priority queue class, but it does make it trivial to build one yourself, depending on the exact details of your application's needs. For example, you could code:

from heapq import heappush, heappop

class PriorityQueue(object):

    def __init__(self):
        self.q = []

    def __len__(self):
        return len(self.q)

    def arrival(self, cost, item):
        heappush(self.q, (cost, item))

    def departure(self):
        cost, item = heappop(self.q)
        return item

Here, each "arriving" item with a given cost is added to a PriorityQueue instance pq by calling pq.arrival(cost, item); at any time, provided pq is non-empty, the "best" (cheapest) item that is still present in the queue can be obtained (and removed) by calling pq.departure().
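To see the queue in action, here is a short usage sketch. The class is repeated so that the example runs on its own; the task names and costs are made up for illustration.

```python
from heapq import heappush, heappop

class PriorityQueue(object):
    """Same class as above, repeated so this example is self-contained."""

    def __init__(self):
        self.q = []

    def __len__(self):
        return len(self.q)

    def arrival(self, cost, item):
        # The heap keeps the smallest (cost, item) tuple at index 0.
        heappush(self.q, (cost, item))

    def departure(self):
        cost, item = heappop(self.q)
        return item

pq = PriorityQueue()
pq.arrival(5, "write report")
pq.arrival(1, "fix outage")
pq.arrival(3, "review patch")

served = []
while len(pq):
    served.append(pq.departure())

print(served)   # ['fix outage', 'review patch', 'write report']
```

Note that when two entries have equal costs, the tuples are compared by their second element, so the items themselves must then be comparable. If that is a problem for your application, one option is to store an arrival counter between the cost and the item.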

The itertools module implements simple and fast "building blocks" to build, modify, and combine iterators, letting you construct flexible and memory-efficient loops in very simple ways. For example, yet another way to simulate the new enumerate built-in function would be:

import itertools
def enumerate(sequence):
    return itertools.izip(itertools.count(), sequence)
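A few more of the building blocks can be sketched as follows. The generator squares is my own example; count, islice, chain, and takewhile are functions of the itertools module. The point to notice is that none of these loops ever builds the endless sequence in memory.

```python
import itertools

def squares():
    """Yield 0, 1, 4, 9, ... without end (illustrative generator)."""
    for n in itertools.count():
        yield n * n

# islice takes a finite window from an endless iterator.
first_five = list(itertools.islice(squares(), 5))

# chain walks several iterables one after the other.
combined = list(itertools.chain("ab", [1, 2]))

# takewhile stops at the first item that fails the test.
below_fifty = list(itertools.takewhile(lambda sq: sq < 50, squares()))

print(first_five)    # [0, 1, 4, 9, 16]
print(combined)      # ['a', 'b', 1, 2]
print(below_fifty)   # [0, 1, 4, 9, 16, 25, 36, 49]
```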

The logging module implements a complete, powerful, and flexible system for logging error and warning messages. Despite the logging system's richness, you can use it quite simply, as in the following snippet:

import logging
   ...
if username not in known_users:
    logging.warning("User %s not known", username)

This code can just ask the system to "log a warning-level message", and leave it up to the system's runtime configuration to determine where (and if) a message of such a level will be stored and/or displayed.
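As a rough sketch of what such configuration might look like when done in code, the following attaches a handler and a threshold level to a named logger. The logger name, the user names, and the message format are invented for illustration, and the example assumes a current Python interpreter (it uses io.StringIO to capture the output in memory).

```python
import io
import logging

# Send log records to an in-memory stream so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("nutshell.example")   # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.WARNING)   # drop anything less severe than WARNING
logger.propagate = False           # keep these records away from the root logger

known_users = ["alex", "anna"]     # made-up data
for username in ["alex", "guido"]:
    if username not in known_users:
        logger.warning("User %s not known", username)
    logger.info("Checked %s", username)   # below the threshold, so dropped

captured = stream.getvalue()
print(captured)
```

The code that emits the messages stays the same whether the handler writes to memory, to a file, or to the console; only the configuration changes.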

The sets module offers a new datatype corresponding to the mathematical concept of "set". For example, given two strings, a simple and straightforward way to get a string that is made up of all characters present in both (in other words, a "set intersection" of the strings) is now:

>>> import sets
>>> ''.join( sets.Set('ciao there!') & sets.Set('hello!') )
'!heo'
>>>

The resulting order of the characters is arbitrary: sets, like dictionaries, do not even have a concept of "order" in their items.
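If you need a predictable result, you can sort the items before joining them. The sketch below makes one loud assumption: it uses the built-in set type that later Python releases provide, rather than sets.Set from the 2.3 module, so that it runs on a current interpreter. The function name common_characters is my own.

```python
def common_characters(first, second):
    """Return the characters present in both strings, in sorted order.

    Hypothetical helper; uses the built-in set type of later Python
    versions instead of sets.Set.
    """
    shared = set(first) & set(second)
    # sorted() imposes an order that the set itself does not have.
    return "".join(sorted(shared))

result = common_characters("ciao there!", "hello!")
only_in_second = set("hello!") - set("ciao there!")

print(result)           # !eho
print(only_in_second)   # {'l'}
```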

The zipimport module lets you import modules from .zip files (without having to unzip such files first); zipimport is now automatically used by the import statement if you just place a .zip file into your modules-import path.
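The following sketch shows the whole round trip: it builds a small .zip file, puts that file on the import path, and imports a module from it. The archive name, the module name nutshell_zip_demo, and its greet function are all invented for illustration.

```python
import os
import sys
import tempfile
import zipfile

# Build a small archive holding one module (all names are made up).
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")

source = "def greet(name):\n    return 'ciao, ' + name\n"
zf = zipfile.ZipFile(archive, "w")
zf.writestr("nutshell_zip_demo.py", source)
zf.close()

# Put the .zip file itself, not its directory, on the import path.
sys.path.insert(0, archive)

# The ordinary import statement now finds the module inside the archive.
import nutshell_zip_demo

message = nutshell_zip_demo.greet("Alex")
print(message)   # ciao, Alex
```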

Altogether, the rich crop of new modules lets you build Python programs with more productivity and ease than before--and your Python programs can be faster to run, and simpler and faster for you to write. Thus, I recommend the upgrade, without reservations.

Alex Martelli currently works for AB Strakt, a Python-centered software house in Göteborg, Sweden, mostly by telecommuting from his home in Bologna, Italy.


O'Reilly & Associates recently released (March 2003) Python in a Nutshell.

