Understanding Network I/O, Part 2
by George Belotsky
With today's technology, creating your own Internet services can be a relatively easy, one-person project. You may not produce the next Google, but helping your business, not-for-profit organization, school, or friends with a useful Internet application is thoroughly feasible — even on a part-time basis.
In fact, simple Internet clients should take less than a day to write, as described in the previous article on Python network programming. In this second article of a two-part series, we discuss more advanced networking topics, including a set of guidelines for choosing the most suitable approach for your situation.
As in the first article, examples are provided in the Python programming language. Python's clean, elegant syntax is highly suitable for creating compact, easy-to-understand programs. If you require more information about Python, the previous article includes many Python-related links and an installation notes section. Reading the first article will also help you understand the material presented here, but it is not strictly necessary.
Doing Several Things at Once
As discussed in the previous article, network I/O is unpredictable. Sometimes requests will fail outright for some (possibly transient) reason, but often the fault is much more subtle. For example, data might start flowing only after a lengthy delay (high latency), or flow very slowly (low bandwidth). On the Internet, such faults may be caused by a malfunctioning system on another continent — far out of the reach of your application.
This unpredictability presents a challenge if your program must perform multiple network operations. For example, servers typically process requests from many clients. It is rarely acceptable to make everyone wait just because one client has trouble. Fortunately, there are powerful, well-tested techniques to deal with such situations.
The key to all of these techniques is that several network I/O operations can be performed concurrently. Thus, we continue to process most requests quickly, even if some requests are delayed due to network-related problems.
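As a rough illustration of that payoff, here is a minimal sketch that is not part of the original series. It is written for Python 3 (the article's own listings use Python 2) and uses threads from the standard library, with `time.sleep()` standing in for unpredictable network delays. The request names, the delays, and the helper functions `fake_request` and `run_concurrently` are all hypothetical.

```python
# Python 3 sketch; time.sleep() stands in for an unpredictable network delay.
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical requests: name -> simulated delay in seconds.
SIMULATED_DELAYS = {'slow': 0.5, 'fast-1': 0.05, 'fast-2': 0.3}

def fake_request(name, delay):
    """Pretend to perform a network operation that takes `delay` seconds."""
    time.sleep(delay)
    return name

def run_concurrently(delays):
    """Start every request at once; return the names in completion order."""
    finished = []
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(fake_request, name, delay)
                   for name, delay in delays.items()]
        # as_completed() yields each future as soon as its work is done,
        # so one slow request does not hold up reporting of the others.
        for future in as_completed(futures):
            finished.append(future.result())
    return finished

if __name__ == '__main__':
    start = time.monotonic()
    order = run_concurrently(SIMULATED_DELAYS)
    print('Completion order:', order)
    print('Elapsed: %.2f seconds' % (time.monotonic() - start))
```

In this sketch the slow request finishes last, and the total elapsed time is close to the longest single delay rather than the sum of all three.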
Today, there are two basic strategies for concurrency: multitasking and asynchronous I/O. Both techniques are widely applicable to servers, clients, and peer-to-peer systems. To help get you started quickly, the examples presented here extend the ones given in the first article. First, however, we will briefly cover the most basic approach: synchronous I/O.
Synchronous I/O is the simplest method for your networked application. Basic synchronous I/O provides no concurrency at all; the program stops at each operation, waiting for it to complete. This technique is sufficient for simple clients. All of the examples in the previous article, except for the one based on Twisted, used synchronous I/O.
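To make the blocking behavior concrete, here is a minimal sketch, written for Python 3 rather than the Python 2 of the article's listings. No real network traffic is involved: `time.sleep()` stands in for a blocking network call, and the operation names, delays, and helper functions are hypothetical.

```python
# Python 3 sketch; time.sleep() stands in for a blocking network call.
import time

# Hypothetical operations: (name, simulated delay in seconds).
OPERATIONS = (('first', 0.2), ('second', 0.1), ('third', 0.1))

def blocking_fetch(name, delay):
    """Block the whole program for `delay` seconds, then return a result."""
    time.sleep(delay)
    return 'data from ' + name

def fetch_all_synchronously(operations):
    """Run each operation in turn; the next starts only when one finishes."""
    results = []
    for name, delay in operations:
        results.append(blocking_fetch(name, delay))
    return results

if __name__ == '__main__':
    start = time.monotonic()
    for line in fetch_all_synchronously(OPERATIONS):
        print(line)
    print('Elapsed: %.2f seconds' % (time.monotonic() - start))
```

Because each call must finish before the next one begins, the elapsed time is roughly the sum of the individual delays, and the results always arrive in the order the operations were listed.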
Synchronous I/O is easy to test. Other methods introduce many complex subtleties, so initial verification of an application's logic benefits from the use of synchronous I/O. The following program implements a web client that fetches the current outdoor temperature in New York, London, and Tokyo. It is a straightforward modification of example 8 in the previous article.
Example 1. A synchronous I/O client
    import urllib  # Library for retrieving files using a URL.
    import re      # Library for finding patterns in text.
    import sys     # Library for system-specific functionality.

    # Three NOAA web pages, showing current conditions in New York,
    # London and Tokyo, respectively.
    citydata = (('New York','http://weather.noaa.gov/weather/current/KNYC.html'),
                ('London', 'http://weather.noaa.gov/weather/current/EGLC.html'),
                ('Tokyo', 'http://weather.noaa.gov/weather/current/RJTT.html'))

    # The maximum amount of data we are prepared to read, from any single page.
    MAX_PAGE_LEN = 20000

    for name,url in citydata:
        # Open and read each web page; catch any I/O errors.
        try:
            webpage = urllib.urlopen(url).read(MAX_PAGE_LEN)
        except IOError, e:
            # An I/O error occurred; print the error message and exit.
            print 'I/O Error when reading URL',url,':\n',e.strerror
            sys.exit()

        # Pattern which matches text like '66.9 F'. The last
        # argument ('re.S') is a flag, which effectively causes
        # newlines to be treated as ordinary characters.
        match = re.search(r'(-?\d+(?:\.\d+)?) F',webpage,re.S)

        # Print out the matched text and a descriptive message;
        # if there is no match, print an error message.
        if match == None:
            print 'No temperature reading at URL:',url
        else:
            print 'In '+name+', it is now',match.group(1),'degrees.'
Here is the output produced by the client.
Example 2. Synchronous I/O client output
    In New York, it is now 37.9 degrees.
    In London, it is now 46 degrees.
    In Tokyo, it is now 48 degrees.
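Because synchronous code is easy to test, the pattern-matching step of Example 1 can be checked on its own, without any network access. The following is a minimal sketch for Python 3 (Example 1 itself is Python 2). It reuses the regular expression and flag from Example 1; the function name `extract_temperature` and the sample strings are made up for illustration and are not real NOAA data.

```python
# Python 3 sketch of the pattern-matching step from Example 1.
import re

# Same pattern and flag as Example 1: matches text like '66.9 F'.
TEMPERATURE_PATTERN = re.compile(r'(-?\d+(?:\.\d+)?) F', re.S)

def extract_temperature(page_text):
    """Return the first Fahrenheit reading as a string, or None if absent."""
    match = TEMPERATURE_PATTERN.search(page_text)
    if match is None:
        return None
    return match.group(1)

if __name__ == '__main__':
    # Made-up sample text, shaped like a line from a weather page.
    sample = 'Temperature 37.9 F (3.3 C)'
    reading = extract_temperature(sample)
    if reading is None:
        print('No temperature reading in sample text')
    else:
        print('It is now', reading, 'degrees.')
```

Separating the parsing from the network call in this way means that a change to the pattern can be verified against saved page text before the client is run against the live site.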