Program Your Computer to See - Computer Vision is the science of reading images to extract meaningful data. Current CV uses include OCR, handwriting interpretation, gesture recognition, face tracking, and security. On Sept. 13th, Intel made its Open Source Computer Vision Library available for the Linux platform -- a move that should accelerate CV development. Chris Halsall introduces you to CV and its applications.
To quote Arthur C. Clarke, "Any sufficiently advanced technology will be indistinguishable from magic." A common geek twist on the quote is, "Any technology distinguishable from magic is insufficiently advanced."
To anyone who's not spent the last several years in graduate school, watching the Open Source Computer Vision Library (OSCVL) calibrate a camera just by waving a chessboard pattern in front of it may seem somewhat magical. But, as magic has no place in software development, the process must be logical. Since we have the code, of course, we can actually walk through the process step by step, see exactly what it is doing, and learn along the way.
If you haven't downloaded and installed the OSCVL packages yet, go to the OSCVL site and do so. Then you'll need to grab and build the Linux version of the camera calibration tool. Now you're ready to calibrate any camera connected to a Video4Linux device.
The camera calibration process consists of a data acquisition phase, which collects point samples of the target over several frames, followed by a calculation phase, in which the collected data is analyzed to compute the camera's intrinsics (its internal parameters, such as focal length, principal point, and lens distortion). Once the intrinsics have been found (or loaded from disk), images can be undistorted in real time.
Calibration software running under Linux; Tux is helping to hold the target.
All of these operations are supported directly by the OSCVL; the application only needs to pass the expected data to four library functions. These functions, of course, draw on many of the other function groups within the library to do their work, but this is hidden from the casual application developer.
For the data collection phase, the application program simply requests a frame of video from the video device and then passes it to the OSCVL function cvFindChessBoardCornerGuesses(), which looks for the target pattern. The function is also passed the number of points to look for, and an array into which the coordinates of any points found are to be stored.
Dilation: for the first pass, squares are split before being searched for.
The way the function finds the intersections is a bit indirect: it actually finds the squares, and calculates the intersections from these. To make this easier for the contour-finding functions provided by the library, one of the first things cvFindChessBoardCornerGuesses() does is split the squares apart using a "dilation" function, one of a common set of binary morphology tools used in Computer Vision (CV). Dilation expands the light areas of the image, so the dark squares shrink slightly and squares that touched only at their corners come apart.
The image on the right shows an extreme example of the results of this function. The actual library function requests only a single iteration, so the squares are only slightly broken apart; the image shown here is after five iterations. Notice that the squares keep their shape after this operation; they're smaller, but they're still squares.
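Binary dilation itself takes only a few lines. The C sketch below is an illustration, not the library's code: it assumes a binary image stored row-major as `unsigned char`, with 1 for white and 0 for black, and a 3x3 square neighborhood; the function names `dilate3x3` and `dilate` are hypothetical. Each pass makes a pixel white if any pixel in its 3x3 neighborhood is white, so a black square loses one pixel on every side and two squares that touch only at a corner no longer touch.

```c
#include <assert.h>
#include <string.h>

/* One pass of binary dilation with a 3x3 square neighbourhood.
 * Pixels are 1 (white) or 0 (black), stored row-major. Neighbours that
 * fall outside the image are ignored. A pixel becomes white if any pixel
 * in its 3x3 neighbourhood is white, so white areas grow by one pixel
 * and black squares shrink by one pixel on every side. */
static void dilate3x3(const unsigned char *src, unsigned char *dst,
                      int width, int height)
{
    assert(src != dst);  /* reading and writing one buffer would smear results */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            unsigned char white = 0;
            for (int dy = -1; dy <= 1 && !white; dy++) {
                for (int dx = -1; dx <= 1 && !white; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < width && ny >= 0 && ny < height)
                        white = src[ny * width + nx];
                }
            }
            dst[y * width + x] = white;
        }
    }
}

/* Repeat the pass `iterations` times, using `scratch` (same size as img)
 * as the destination buffer and copying the result back each time. */
static void dilate(unsigned char *img, unsigned char *scratch,
                   int width, int height, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        dilate3x3(img, scratch, width, height);
        memcpy(img, scratch, (size_t)width * (size_t)height);
    }
}
```

Calling `dilate()` with `iterations` set to 1 corresponds to the single pass the library requests; larger values exaggerate the effect in the same way as the five-pass illustration.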
The image is then passed through a "threshold" function, which converts it into a simple black-and-white, single-bit-plane image. The data is then ready for the library's "contour processing" functions, which find the outlines around connected blocks of pixels. Two common applications are finding areas of interest in an image, such as the quadrangles (projected squares) we're looking for here, and optical character recognition (OCR).
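To make the threshold step concrete, here is a hedged C sketch using a fixed cut-off level (an assumption; real code may choose the level adaptively). The library's contour functions trace the outline of each connected block of pixels; as a simpler stand-in, this sketch only counts the 4-connected blocks with a flood fill, which is enough to show what "connected" means here. The names `threshold` and `count_blobs` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Fixed threshold: gray values at or above `level` become 1 (white),
 * everything else becomes 0 (black). */
static void threshold(const unsigned char *gray, unsigned char *bin,
                      int n, unsigned char level)
{
    for (int i = 0; i < n; i++)
        bin[i] = gray[i] >= level ? 1 : 0;
}

/* Count 4-connected blocks of pixels equal to `value`, labelling each
 * block 1, 2, 3, ... in `label` (width*height ints). Uses an explicit
 * stack; every pixel is pushed at most once, so n entries suffice. */
static int count_blobs(const unsigned char *bin, int width, int height,
                       unsigned char value, int *label)
{
    int n = width * height, blobs = 0;
    int *stack = malloc(sizeof *stack * (size_t)n);
    assert(stack != NULL);
    for (int i = 0; i < n; i++) label[i] = 0;
    for (int start = 0; start < n; start++) {
        if (bin[start] != value || label[start]) continue;
        int top = 0;
        label[start] = ++blobs;          /* a new, unvisited block */
        stack[top++] = start;
        while (top > 0) {
            int p = stack[--top];
            int x = p % width, y = p / width;
            int nb[4] = { x > 0 ? p - 1 : -1, x < width - 1 ? p + 1 : -1,
                          y > 0 ? p - width : -1,
                          y < height - 1 ? p + width : -1 };
            for (int k = 0; k < 4; k++) {
                int q = nb[k];
                if (q >= 0 && bin[q] == value && !label[q]) {
                    label[q] = blobs;
                    stack[top++] = q;
                }
            }
        }
    }
    free(stack);
    return blobs;
}
```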
The cvFindChessBoardCornerGuesses() function simply asks for a list of all the contours, each approximated as a polygon (using the Douglas-Peucker method), and throws away anything that doesn't have four vertices, is too small, or just doesn't look like a projected square. Anything that does look square enough is added to a list.
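The Douglas-Peucker method keeps a contour point only if it lies farther than a tolerance from the chord joining the endpoints of its segment, then recurses on the two halves. The C sketch below is hypothetical: the names, the `eps` tolerance, and the simple convexity and area tests are assumptions, and the library's own "looks like a projected square" checks may well differ. Because a contour is closed, the sketch first splits it at the point farthest from the start and finally drops the start point if it turns out to lie on an edge. Squared distances are used throughout, so no square roots are needed.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { double x, y; } Pt;

/* Squared perpendicular distance from p to the line through a and b
 * (squared distance to a if a and b coincide). */
static double line_dist2(Pt p, Pt a, Pt b)
{
    double dx = b.x - a.x, dy = b.y - a.y, len2 = dx * dx + dy * dy;
    double px = p.x - a.x, py = p.y - a.y;
    if (len2 == 0.0) return px * px + py * py;
    double cross = dy * px - dx * py;
    return cross * cross / len2;
}

/* Douglas-Peucker between indices first and last of a closed contour
 * (last may equal n, meaning pts[0] again): keep the point farthest
 * from the chord if it is more than eps away, then recurse. */
static void dp(const Pt *pts, int n, int first, int last, double eps2, bool *keep)
{
    double dmax = 0.0;
    int idx = -1;
    for (int i = first + 1; i < last; i++) {
        double d = line_dist2(pts[i], pts[first], pts[last % n]);
        if (d > dmax) { dmax = d; idx = i; }
    }
    if (idx >= 0 && dmax > eps2) {
        keep[idx] = true;
        dp(pts, n, first, idx, eps2, keep);
        dp(pts, n, idx, last, eps2, keep);
    }
}

/* Approximate a closed contour by a polygon; writes the kept vertices to
 * out in contour order and returns their count. keep needs n entries. */
static int approx_closed(const Pt *pts, int n, double eps, bool *keep, Pt *out)
{
    assert(n >= 3);
    double eps2 = eps * eps, best = -1.0;
    int far = 1;
    for (int i = 1; i < n; i++) {            /* split at the farthest point */
        double dx = pts[i].x - pts[0].x, dy = pts[i].y - pts[0].y;
        if (dx * dx + dy * dy > best) { best = dx * dx + dy * dy; far = i; }
    }
    for (int i = 0; i < n; i++) keep[i] = false;
    keep[0] = keep[far] = true;
    dp(pts, n, 0, far, eps2, keep);
    dp(pts, n, far, n, eps2, keep);

    /* pts[0] was kept unconditionally; drop it if it lies on an edge. */
    int prev = n - 1, next = 1;
    while (!keep[prev]) prev--;
    while (!keep[next]) next++;
    if (prev != next && line_dist2(pts[0], pts[prev], pts[next]) <= eps2)
        keep[0] = false;

    int m = 0;
    for (int i = 0; i < n; i++)
        if (keep[i]) out[m++] = pts[i];
    return m;
}

/* Accept only convex quadrilaterals enclosing at least min_area pixels. */
static bool looks_like_quad(const Pt *v, int m, double min_area)
{
    if (m != 4) return false;
    double area2 = 0.0;                      /* twice the signed area */
    int sign = 0;
    for (int i = 0; i < 4; i++) {
        Pt a = v[i], b = v[(i + 1) % 4], c = v[(i + 2) % 4];
        area2 += a.x * b.y - b.x * a.y;      /* shoelace formula */
        double turn = (b.x - a.x) * (c.y - b.y) - (b.y - a.y) * (c.x - b.x);
        int s = (turn > 0) - (turn < 0);
        if (s == 0 || (sign != 0 && s != sign)) return false;  /* not convex */
        sign = s;
    }
    if (area2 < 0) area2 = -area2;
    return area2 / 2.0 >= min_area;
}
```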
From here on, it's just work; the "art" is complete. The function loops through all the found squares and matches up their intersecting corners. The list of intersecting corners is sorted and placed into the passed storage area. If it looks like a good set of intersections was found, the number is returned as a positive value; otherwise, the number of points found is returned as a negative value.
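The corner-matching step can be sketched as follows. This is a hypothetical illustration, not the library's algorithm: it assumes each square arrives as four corner points, treats any two corners from different squares that lie within `max_gap` pixels of each other as one intersection located at their midpoint, sorts by y and then x (adequate only for a roughly fronto-parallel board), and mimics the sign convention described above by returning a negative count when the number found differs from the number expected.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double x, y; } Pt;
typedef struct { Pt v[4]; } Quad;   /* one square's four corners */

/* qsort comparator: order points by y, then by x. */
static int by_y_then_x(const void *pa, const void *pb)
{
    const Pt *a = pa, *b = pb;
    if (a->y != b->y) return a->y < b->y ? -1 : 1;
    return (a->x > b->x) - (a->x < b->x);
}

/* Pair up nearby corners of different squares; each pair's midpoint is
 * taken as an intersection. At most `cap` results are written to out.
 * Returns the count, negated if it differs from `expected`. */
static int match_corners(const Quad *q, int nq, double max_gap,
                         Pt *out, int cap, int expected)
{
    double gap2 = max_gap * max_gap;
    int found = 0;
    for (int i = 0; i < nq; i++)
        for (int j = i + 1; j < nq; j++)
            for (int a = 0; a < 4; a++)
                for (int b = 0; b < 4; b++) {
                    Pt p = q[i].v[a], r = q[j].v[b];
                    double dx = p.x - r.x, dy = p.y - r.y;
                    if (dx * dx + dy * dy < gap2 && found < cap)
                        out[found++] = (Pt){ (p.x + r.x) / 2, (p.y + r.y) / 2 };
                }
    qsort(out, (size_t)found, sizeof *out, by_y_then_x);
    return found == expected ? found : -found;
}
```

For a 3x3 board the five dark squares, each shrunk by a pixel of dilation, yield the four inner intersections.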
The cvFindChessBoardCornerGuesses() function is able to find the corner intersection points to an accuracy of half a pixel. If the function returns a positive value, the intersection corner locations are refined further within the image using a different technique, encoded in the cvFindCornerSubPix() call. This call locates the intersections to sub-pixel accuracy, within 0.1 pixel.
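The cvFindCornerSubPix() function works on the original gray-scale image and the initial guesses of the intersection points. It relies on the fact that, near a true corner, the image gradient at each pixel is perpendicular to the line from the corner to that pixel. For each point, the function repeatedly solves for the location that best satisfies this condition over a small window, stopping when the estimate moves by less than a set threshold (or after a maximum number of iterations). This can (very roughly) be thought of as the inverse of interpolation filtering of OpenGL textures.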
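The idea behind the sub-pixel step can be written as a small least-squares problem. If q is the true corner, then for every pixel p_i near it the gradient g_i is either zero (in flat areas) or perpendicular to p_i - q (along an edge through the corner), so each pixel contributes the equation g_i . (p_i - q) = 0. Minimizing the sum of squared residuals gives the 2x2 linear system (sum of g_i g_i^T) q = sum of g_i g_i^T p_i. The C sketch below solves that system over a square window, re-centers the window on the new estimate, and repeats until the estimate moves less than `eps`. It is a simplified illustration under these assumptions; the library's version is likely to differ in details such as weighting the window and resampling it at sub-pixel positions, and the function name and parameters here are hypothetical.

```c
#include <assert.h>

/* Round to nearest integer without needing libm. */
static int nearest_int(double v) { return (int)(v < 0 ? v - 0.5 : v + 0.5); }

/* Refine the corner estimate (*qx, *qy) in a gray image (row-major floats).
 * Each iteration gathers central-difference gradients g_i at the pixels
 * p_i of a (2*half+1)^2 window around the current estimate and solves
 * (sum g_i g_i^T) q = sum g_i g_i^T p_i for q. Stops when q moves by less
 * than eps. Returns 0 on success, -1 if the system is singular (no edges). */
static int refine_corner(const float *img, int width, int height,
                         double *qx, double *qy,
                         int half, int max_iter, double eps)
{
    for (int it = 0; it < max_iter; it++) {
        int cx = nearest_int(*qx), cy = nearest_int(*qy);
        double a = 0, b = 0, c = 0;   /* G = [a b; b c] */
        double rx = 0, ry = 0;        /* right-hand side */
        for (int y = cy - half; y <= cy + half; y++) {
            for (int x = cx - half; x <= cx + half; x++) {
                if (x < 1 || y < 1 || x > width - 2 || y > height - 2)
                    continue;  /* central differences need both neighbours */
                double gx = (img[y * width + x + 1] - img[y * width + x - 1]) / 2.0;
                double gy = (img[(y + 1) * width + x] - img[(y - 1) * width + x]) / 2.0;
                a += gx * gx;  b += gx * gy;  c += gy * gy;
                rx += gx * gx * x + gx * gy * y;
                ry += gx * gy * x + gy * gy * y;
            }
        }
        double det = a * c - b * b;
        if (det < 1e-12 && det > -1e-12) return -1;
        double nx = (c * rx - b * ry) / det;   /* q = G^-1 * rhs */
        double ny = (a * ry - b * rx) / det;
        double mx = nx - *qx, my = ny - *qy;
        *qx = nx;
        *qy = ny;
        if (mx * mx + my * my < eps * eps) break;
    }
    return 0;
}
```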
For the application program, calculating the camera intrinsics is as simple as calling the library's calibration function with the collected data as parameters. It then goes away and thinks for a while, returning the intrinsic data if it was passed enough good-quality data. Otherwise, the returned values are set to NAN, and another set of data must be acquired.
The exact methods used to do these calculations are beyond the scope of this article, and involve some rather high-level math that most people aren't trained in. A graduate-level paper on the OSCVL site details what is being done, and, of course, the source is always there.
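Without going into how the intrinsics are estimated, it helps to see what they describe. A common camera model, assumed here (radial distortion only, with two coefficients; the library's model may include additional, tangential terms), maps a point (X, Y, Z) in the camera's frame to normalized coordinates x = X/Z and y = Y/Z, scales them by the distortion factor 1 + k1 r^2 + k2 r^4 with r^2 = x^2 + y^2, and converts to pixels using the focal lengths (fx, fy) and the principal point (cx, cy). The struct and function names below are hypothetical.

```c
#include <assert.h>

typedef struct {
    double fx, fy;   /* focal lengths, in pixels */
    double cx, cy;   /* principal point, in pixels */
    double k1, k2;   /* radial distortion coefficients (assumed model) */
} Intrinsics;

/* Project a point (X, Y, Z) given in the camera's coordinate frame to
 * distorted pixel coordinates (u, v). Requires Z > 0 (in front of lens). */
static void project_point(const Intrinsics *K, double X, double Y, double Z,
                          double *u, double *v)
{
    assert(Z > 0.0);
    double x = X / Z, y = Y / Z;                    /* ideal pinhole image */
    double r2 = x * x + y * y;
    double d = 1.0 + K->k1 * r2 + K->k2 * r2 * r2;  /* radial distortion */
    *u = K->fx * x * d + K->cx;
    *v = K->fy * y * d + K->cy;
}
```

With k1 = k2 = 0 this reduces to the plain pinhole model; a negative k1 pulls points toward the principal point, which is how barrel distortion is usually represented.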
Once camera intrinsics are known, distortion can be removed.
Once the camera intrinsics are known, it's a fairly simple matter to iterate across the output image array, calculating for each pixel the corresponding location in the distorted image, and copying that pixel over. To avoid the "jaggies," interpolation filtering can be used, although it's slower to process.
This is exactly what cvUnDistortOnce() does. The function simply requires the image to be undistorted and the intrinsics. This can be done in real time on a video feed, or as a post-processing step.
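The undistortion loop described above can be sketched in C under an assumed radial-only distortion model. For each pixel of the output image, the sketch converts to normalized coordinates, applies the distortion to find where that pixel came from in the source image, and samples the source there with bilinear interpolation (the slower, smoother option). This illustrates the approach rather than reproducing the code of cvUnDistortOnce(); the names are hypothetical, and output pixels that map outside the source are set to 0.

```c
#include <assert.h>

typedef struct {
    double fx, fy, cx, cy;   /* focal lengths and principal point, pixels */
    double k1, k2;           /* radial distortion (assumed model) */
} Intrinsics;

/* Bilinear sample of a row-major float image at (x, y); 0 if outside. */
static float sample_bilinear(const float *img, int w, int h, double x, double y)
{
    if (x < 0.0 || y < 0.0 || x > w - 1 || y > h - 1) return 0.0f;
    int x0 = (int)x, y0 = (int)y;
    int x1 = x0 + 1 < w ? x0 + 1 : x0;
    int y1 = y0 + 1 < h ? y0 + 1 : y0;
    double ax = x - x0, ay = y - y0;
    double top = img[y0 * w + x0] * (1.0 - ax) + img[y0 * w + x1] * ax;
    double bot = img[y1 * w + x0] * (1.0 - ax) + img[y1 * w + x1] * ax;
    return (float)(top * (1.0 - ay) + bot * ay);
}

/* For every output pixel, find where the distortion model says it came
 * from in the distorted source image, and sample the source there. */
static void undistort_image(const float *src, float *dst, int w, int h,
                            const Intrinsics *K)
{
    assert(src != dst);
    for (int v = 0; v < h; v++) {
        for (int u = 0; u < w; u++) {
            double x = (u - K->cx) / K->fx, y = (v - K->cy) / K->fy;
            double r2 = x * x + y * y;
            double d = 1.0 + K->k1 * r2 + K->k2 * r2 * r2;
            double su = K->fx * x * d + K->cx;
            double sv = K->fy * y * d + K->cy;
            dst[v * w + u] = sample_bilinear(src, w, h, su, sv);
        }
    }
}
```

Because the source coordinates depend only on the intrinsics and the image size, a real-time implementation would normally compute them once and reuse that lookup table for every frame.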
Note also that the opposite operation is possible as well: given the distorted position, calculate the undistorted, or "registered," coordinates. With one camera, a point in an image lies along a known 3D ray passing through the center of the lens. With two or more cameras, the intersection of these rays can be used to determine a point of interest's (X,Y,Z) location.
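Both halves of that reasoning can be sketched in C under stated assumptions: radial-only distortion, inverted by simple fixed-point iteration (which converges quickly when the distortion is modest); two cameras with the same orientation, the second offset by a known translation expressed in the first camera's frame; and, because two measured rays rarely intersect exactly, the midpoint of their closest approach taken as the 3D point. All names are hypothetical, and the midpoint method is only one common choice of triangulation.

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

static double dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Undo radial distortion on normalized image coordinates by fixed-point
 * iteration of x = x_d / (1 + k1 r^2 + k2 r^4). Adequate for modest
 * distortion; strongly distorted lenses may need a better solver. */
static void undistort_point(double xd, double yd, double k1, double k2,
                            int iters, double *x, double *y)
{
    *x = xd;
    *y = yd;
    for (int i = 0; i < iters; i++) {
        double r2 = *x * *x + *y * *y;
        double d = 1.0 + k1 * r2 + k2 * r2 * r2;
        *x = xd / d;
        *y = yd / d;
    }
}

/* Midpoint of the shortest segment between the rays o1 + s*d1 and
 * o2 + t*d2. Returns -1 if the rays are (nearly) parallel, else 0. */
static int triangulate_midpoint(Vec3 o1, Vec3 d1, Vec3 o2, Vec3 d2, Vec3 *out)
{
    Vec3 w = { o1.x - o2.x, o1.y - o2.y, o1.z - o2.z };
    double a = dot3(d1, d1), b = dot3(d1, d2), c = dot3(d2, d2);
    double d = dot3(d1, w), e = dot3(d2, w);
    double den = a * c - b * b;            /* >= 0 by Cauchy-Schwarz */
    if (den <= 1e-12 * a * c) return -1;
    double s = (b * e - c * d) / den;      /* closest-approach parameters */
    double t = (a * e - b * d) / den;
    out->x = 0.5 * (o1.x + s * d1.x + o2.x + t * d2.x);
    out->y = 0.5 * (o1.y + s * d1.y + o2.y + t * d2.y);
    out->z = 0.5 * (o1.z + s * d1.z + o2.z + t * d2.z);
    return 0;
}
```

Each camera's ray direction is (x, y, 1) built from its undistorted normalized coordinates; the ray starts at that camera's lens center.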
We have only scratched the surface of what is possible with the OSCVL, but since calibrated cameras are critical for a great many CV applications, calibration seemed like a good place to start. By building on the library, developers can quickly put together working applications using the functions and algorithms most commonly used by researchers in the field.
Because it is open source, anyone who doesn't understand how something works can walk through the library using a debugger and/or by exposing image data as it passes through various operations within library calls.
None of it is magic; it just seems that way at first.
Chris Halsall is the Managing Director of Ideas 4 Lease (Barbados). Chris is a specialist... at automating information gathering and presentation systems.
Copyright © 2009 O'Reilly Media, Inc.