It’s time for my favorite conference of the year! ETech kicked off this morning with some awesome keynotes and moved straight into some killer sessions. The first session I attended was Peter Norvig’s “How billions of examples lead to better models of images and text” presentation.
Before I tell you about what Peter shared with us, it’s important to mention that Peter works for Google and thus has access to massive amounts of data and images that most of us can’t even conceive of. Having access to these vast data stores is the premise of his presentation.
Peter started off with a review of the classic scientific method used to understand something that isn’t yet understood. First you observe the system in order to build a model, and then you use the model to attempt to describe what the system will do outside of your observation window. This process is tedious, and most of the time the generated models are wrong. By having access to millions of examples, you can learn from the data directly and avoid creating faulty models.
His first set of examples focused on images, starting with a quick overview of the history of images. Peter pointed out that we started with paintings in caves as the first images; from there we moved on to photography and then finally on to making moving images. Each of these advances over cave paintings came about through an advance in technology and improvements in the data rate that our tools can handle.
Peter gave Scene Completion as an example of an image algorithm that only starts to function once it has access to enough images. Scene completion allows an artist to take a picture, remove an undesired element (anything that distracts from illustrating the point of the image), and have an algorithm replace the removed portions with appropriate portions from a database of images. In his example he showed the removal of a rooftop in the foreground that was obstructing the scenic view of a bay, and showed how the algorithm replaced the roof with boats on the water in the bay. His point here was that the algorithm didn’t work when the image database consisted of only 10,000 images. But as soon as a database of 1M+ images was used, the algorithm improved dramatically. Peter mentioned that this example clearly shows how the focus in computer science is shifting from computing power to data, and to extracting as much utility from a vast body of data as possible.
Finding the canonical image for a concept was probably the most interesting idea that Peter showed in his presentation. Given an image search for “Mona Lisa” you’ll get all sorts of image results that relate to the Mona Lisa. You’ll also get a number of spoof images or other images that incorporate the Mona Lisa. But which image is the canonical image of the Mona Lisa? Google’s approach is to analyze all the images thought to relate to a topic (based on web crawling results and context from those pages) and then to extract the basic image features of these images. Then, using a “PageRank-like” algorithm, these image features can be compared to features from other images. From this process the “canonical image” arises from vast quantities of images. It won’t work if you have only a few images to throw at the algorithm; you’ll need vast quantities of images for the algorithm to work.
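To make the “PageRank-like” idea a little more concrete, here is a minimal sketch in Python. It is not Google’s implementation: the similarity scores are numbers I made up, the function name `rank_images` is hypothetical, and I am assuming that feature comparison has already been boiled down to one similarity score per pair of images. The sketch just runs a damped random walk over that similarity graph, so images that resemble many other images end up with the highest rank.

```python
# Made-up pairwise similarity scores for four images of one topic:
# 0 = the painting itself, 1 = a cropped copy, 2 = a scan, 3 = a spoof.
SIMILARITY = [
    [0.0, 0.9, 0.8, 0.3],
    [0.9, 0.0, 0.7, 0.2],
    [0.8, 0.7, 0.0, 0.1],
    [0.3, 0.2, 0.1, 0.0],
]


def rank_images(similarity, damping=0.85, iterations=50):
    """Score images with a damped random walk over the similarity graph."""
    n = len(similarity)
    # Total similarity each image "gives away" to the others.
    out_total = [
        sum(similarity[j][k] for k in range(n) if k != j) for j in range(n)
    ]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                if j != i and out_total[j] > 0:
                    # Image j passes rank to i in proportion to similarity.
                    incoming += rank[j] * similarity[j][i] / out_total[j]
            new_rank.append((1.0 - damping) / n + damping * incoming)
        rank = new_rank
    return rank


ranks = rank_images(SIMILARITY)
canonical = ranks.index(max(ranks))  # index of the highest-ranked image
```

With these toy numbers the painting itself (image 0) comes out on top and the spoof (image 3) at the bottom, because the spoof resembles the other images least. With only four images the result is trivial, which is rather the point of the talk: the signal only becomes trustworthy with a very large pool of images.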
Peter went on to talk about text models and how large repositories of text can help us build more sophisticated tools that would not be possible otherwise. One example he outlined is text segmentation: imagine a sentence where all the spaces between words have been removed (e.g. nowisthetimeforallgoodmen…). How can you programmatically figure out where the spaces should go? Google built a probabilistic model that determines the likelihood of a space being placed after each character. This model was trained on 1.7 billion words of English text and reached an accuracy of 98%!
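Here is a rough sketch of how such a segmenter could look in Python. To be clear about my assumptions: the word counts are invented stand-ins for a real corpus, the names `COUNTS`, `log_prob` and `segment` are mine, and I am using a simple unigram word model (score every possible split by the probability of the resulting words) rather than whatever model Google actually built.

```python
import math
from functools import lru_cache

# Invented word counts standing in for a corpus of billions of words.
COUNTS = {
    "now": 50, "is": 200, "the": 500, "time": 40, "for": 150,
    "all": 80, "good": 30, "men": 20, "no": 60, "a": 300,
}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = 20  # longest candidate word we are willing to consider


def log_prob(word):
    """Log probability of a single word under the toy unigram model."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    # Unknown words get a penalty that grows with their length.
    return math.log(1.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text):
    """Return (log probability, words) for the most probable split of text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, min(len(text), MAX_WORD_LEN) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = segment(rest)
        candidates.append((log_prob(first) + rest_score, (first,) + rest_words))
    return max(candidates)


score, words = segment("nowisthetimeforallgoodmen")
```

With these toy counts, `words` comes back as `("now", "is", "the", "time", "for", "all", "good", "men")`. The memoization (`lru_cache`) matters because the same suffix gets scored many times; without it the number of splits to examine grows exponentially with the length of the text.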
Another text example that Peter gave focused on fixing spelling mistakes. Most dictionary-based spell checkers share a flaw: they cannot recognize any words that are not in the dictionary, like most non-English names. However, using a large corpus of data, Google can build a much better spell checker, since it has seen so many examples and can tell from them which words to highlight for the user to review.
Peter gave a number of other examples (e.g. Google Sets) and outlined some of the techniques used for instrumenting machine learning. His presentation made it clear to me that a whole new class of algorithms is emerging that focuses on processing vast quantities of data. And we’re not talking about small data sets either: the scene completion example shows that thousands of images are not enough and that millions of images are needed to make the algorithm work.
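A tiny sketch of the corpus-driven idea, again in Python and again with my own assumptions: the “corpus” is three sentences I wrote for illustration, the names `WORDS`, `edits1` and `correct` are mine, and the checker only considers candidates one edit away from the typed word. This is not Google’s spell checker, just an illustration of letting word counts replace a fixed dictionary.

```python
import re
from collections import Counter

# A stand-in corpus; the real thing would be billions of words of web text.
CORPUS = """
peter norvig gave a talk about spelling . norvig works on spelling and
search . the talk was about data and the spelling examples were fun
"""
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Return the most frequent corpus word within one edit of word."""
    word = word.lower()
    if word in WORDS:
        return word  # seen in the corpus, so nothing to flag
    candidates = [w for w in edits1(word) if w in WORDS]
    if not candidates:
        return word  # nothing similar seen; leave the word alone
    return max(candidates, key=WORDS.get)
```

Because “norvig” appears in the corpus, `correct("norvig")` leaves the name alone, while `correct("speling")` returns `"spelling"`. A fixed dictionary would have flagged the name; the corpus has simply seen it before.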
Computer science has focused on processor speed for so many years that once we got close to repealing Moore’s Law, I started wondering what would be in store for computer science. Multiple cores in processors show that we can continue to expand our computing power, but the real future advances in computer science lie elsewhere. Advances will come from using vast quantities of data to create more sophisticated programs that give us abilities that we didn’t have yesterday. This of course makes me happy since I’ve believed in the power of large data sets (especially open datasets) for quite some time.
It’s also been clear to me that more data can be derived from data (metadata, of course). It’s also good to see that concrete technology, like new algorithms, can arise from data as well. Rather than seeing areas of computer science close off, I am glad to see that new ones open up all the time.
And finding those new areas is what excites me about the Emerging Technology Conference!






