Pete Warden

Areas of Expertise:

  • Visualization
  • Hadoop
  • Big Data

Pete Warden is the founder of the OpenHeatMap project, writer of the Data Source Handbook for O'Reilly, a regular contributor to ReadWriteWeb, and a consultant to the New York Times. With 14 years of experience building large-scale data processing solutions, including five as a senior engineer at Apple, Pete has been on the front lines of Big Data, using, writing about, and contributing code to tools like Redis, MongoDB and Hadoop. He believes these services radically change what's possible, and speaks to audiences around the country about how they can do amazing things with their own data.

Big Data Glossary
by Pete Warden
September 2011
Print: $19.99
Ebook: $14.99

Data Source Handbook
by Pete Warden
February 2011
Print: $29.99
Ebook: $14.99

An Introduction to MapReduce with Pete Warden
by Pete Warden
June 2011
Video: $19.99

Visualizing Shared, Distributed Data
by Roman Stanek, Pete Warden
March 2011
OUT OF PRINT

Recent Posts | All O'Reilly Posts

Five short links

May 25 2013

Photo by Tony Preece CLAVIN - A very promising open source geotagging project that analyzes unstructured text and identifies geographic entities. It has some very neat tricks up its sleeve to disambiguate common names like 'Springfield' based on the context.... read more

Five short links

May 21 2013

Photo by Eldeeem The Cartography of Bullshit - A righteous rant against a piece of pop-sociology digging into just how flimsy the underlying statistics are. It hits home because numbers I've mined have ended up in similar columns - a... read more

No more heatmaps that are just population maps!

May 20 2013

I'm pleased to announce that there's a brand new 0.50 version of the DSTK out! It has a lot of bug fixes, and a couple of major new features, and you can get it on Amazon's EC2 as ami-7b9df412, download... read more

Five short links

May 12 2013

Photo by Curtis Perry The Declassification Engine - "Saving history from official secrecy". A fascinating concept that shows how the firehose of cheap distributed computing power fundamentally changes what privacy and secrecy mean. We can probably reconstruct a lot of... read more

Five short links

May 08 2013

Photo by Grant Hutchinson Assuming everybody else sucked - If an industry is behaving in an apparently irrational way, try to figure out the internal logic that's driving that behavior. You'll be much more effective at breaking the rules if... read more

We're all starting to track ourselves

May 07 2013

We're releasing a massive and growing amount of information about who we are, where we go, and when. There are hundreds of millions of public checkins already out there, and millions more are being created every day. People think of... read more

Five short links

May 03 2013

Photo by Neil Platform1 GeoURI - I have no earthly use for these, but I love that they exist, and are even an IETF standard! Nathaniel Bowditch - He created the American Practical Navigator over two hundred years ago. He... read more

Open Sentiment Analysis

May 01 2013

Photo by Courtney Carmody Sentiment analysis is fiendishly hard to solve well, but easy to solve to a first approximation. I've been frustrated that there have been no easy free libraries that make the technology available to non-specialists like me.... read more

Five short links

April 26 2013

A Global Poverty Map Derived from Satellite Data - This is an old paper from 2006, but I love the idea of using how much light a neighborhood sends into the night sky to measure how wealthy it... read more

Five short links

April 23 2013

Photo by Tasty Goodness Yoyodyne - How a fictional company was born in the novels of Thomas Pynchon, was adopted by Buckaroo Banzai and Star Trek, and ended up in the GPL. What will be left of our cities? -... read more

Do we need a slow software movement?

April 19 2013

Photo by Tim Regan When I was an isolated kid in the English countryside my only connections to the computing world were "Public Domain" floppy disks. Mail-order libraries would send me one of the disks in their catalog if I... read more

Five short links

April 18 2013

Photo by Kurtis Garbutt Geo-location estimation of Flickr images - The caption, title, and description of a photo are incredibly useful when it comes to guessing where a photo was taken, even using fairly crude language analysis algorithms. This is... read more

Converting to and from Google map tile coordinates in PostGIS

April 10 2013

Google Maps' system of power-of-two tiles has become a de facto standard, widely used by all sorts of web mapping software. I've found it handy to use as a caching scheme for our data, but the PostGIS calls to use it... read more

Five short links

April 09 2013

Photo by Alan Levine Elephant - A beautiful open source project to store data in a way that's "as durable as S3, as portable as JSON, and as queryable as HTTP". Tim O'Reilly has talked about the web operating system,... read more

The Chairs and The Shrew

March 20 2013

Photo by Jesse Bell I have middlebrow tendencies, but over the years I've learned that the struggle with difficult work can pay off. I grew to love Infinite Jest, once I figured out Wallace was boring me deliberately, that he... read more

Five short links

March 19 2013

Photo by Earl Want to live somewhere nice? Be prepared to work longer - How an area's living costs affect poor and rich workers differently. Moving towards an identity and patient records locator - As Ben Adida points out, a... read more

Quantity has a quality all its own

March 18 2013

Photo by Kevin Collins I used to be an image processing engineer. I'd be handed a picture, and I'd have to do something useful with it. To do that I had to take a big mental leap. Instead of seeing... read more

Five short links

March 13 2013

Photo by Yersinia The Deleted City - A spatial reinterpretation of the old Geocities sites. Having data in a single large dump instead of behind an API makes it possible to do things like this with it, things that the... read more

Five short links

March 12 2013

Photo by Flood G BetaShapes - Using geotagged Flickr photos to define San Francisco's neighborhoods as a crowd-sourced 'folksonomy'. I'm entranced by how many useful things emerge from the clouds of data exhaust we're all generating. Bacteria farming and software... read more

Why I'm a terrible privacy advocate

March 12 2013

Photo by Michael Scott People often think I'm a privacy researcher, thanks to the Facebook and iPhone stories. The truth is I'm just curious about undiscovered data. Because a lot of it is about people's behavior, and that's an inherently... read more

Why should you care that artists are underpaid?

February 26 2013

Picture by Jamie I've spent most of my career working closely with artists, and they were usually paid less than me. At first this was just awkward, but I began to realize it was part of a deeper problem. Most... read more

Which iOS versions are Jetpac users running?

February 21 2013

Photo by Visual Media I just hit a nasty bug in the Jetpac iPad app that only seems to affect users on iOS 6.0.1. Unfortunately it seems to be deep in the OS's Facebook integration code, so I wasn't able... read more

Five short links

February 20 2013

Photo by Martin Fisch Facial profiling for the detection of mal-intent using thermal imaging - I've been out of the loop on how far image processing has come in detecting emotions. If you think a computer that recognises your face... read more

A pub that's also a theater?!?

February 19 2013

I love drinking beer, and I love watching plays. I usually have to elbow my way into a crowded bar in intermission to combine the two, which is far from ideal. Imagine my delight when I ran across the concept... read more

Things that happen to startup founders

February 16 2013

Photo by USACE Europe District You get into a lot of debt living off an anaemic salary and go bankrupt. Your spouse breaks up with you. You are fired from your own company by an outside CEO. Your company is... read more

Five short links

February 14 2013

Photo by Spodzone Why the Open Data movement is a joke - An impassioned rant, but it misses the point when it accuses the movement of cloaking itself in the mantle of progressive politics. Tom seems to expect it to... read more

What's the SF apartment market really like?

February 13 2013

Photo by Post Bear For the last six weeks I've been looking for an apartment in San Francisco. I thought I'd have to elbow through screaming mobs at every showing, but it really hasn't been too bad. I have to... read more

How to track iOS memory crashes

February 12 2013

Photo by Fingle I love being able to use HTML5 content within Jetpac, but hosting it in Apple's UIWebView component can use a lot of memory. That matters because iOS apps crash when they run out of memory, and to... read more

Five short links

February 08 2013

Mural by Monte Thrasher Heads by Monte Thrasher - Normally my short link images are side-notes, but the pentagonal helmet image led me to discover what I think is my favorite mural ever. Check out Twiggy, the world's ugliest dog,... read more

How to create a visualization

February 13 2012

Creating a visualization requires more than just data and imagery. Pete Warden outlines the process and actions that drove his new Facebook visualization project. read more

3 ideas you should steal from HubSpot

June 14 2011

HubSpot's location (near Boston) and its target market (small businesses) may keep it under the radar of Silicon Valley, but the company's approach to data products and customer empowerment is worthy of attention. read more

Lessons of the Victorian data revolution

May 23 2011

Examples from the Victorian era show that if we're going to improve the world with data, it's absolutely essential we stay grounded in reality. read more

Why you can't really anonymize your data

May 17 2011

Because we now have so much data at our disposal, any dataset with a decent amount of information can be matched against identifiable public records. To keep datasets available, we must acknowledge that foolproof anonymization is an illusion. read more

Why the term "data science" is flawed but useful

May 09 2011

While formal boundaries and professional criteria for "data science" remain undefined, here's why we should keep using the term. read more

The iPhone tracking story, one week later

April 27 2011

Apple announces fixes and sheds more light on location data. Plus, a look at some of the reporting and potential applications that have popped up. read more

Additional iPhone tracking research

April 24 2011

The iPhone tracking story led to a host of related investigations. Here's a look at some of the latest developments. read more

iPhone tracking: The day after

April 22 2011

The iPhone tracking story published here a few days ago struck an unexpected nerve. Here's a selection of the most interesting immediate reactions. read more

Will data be too cheap to meter?

February 08 2011

The data acquisition process should be increasingly automatic, and so increasingly cheap. I'm hoping for a world where information producers are paid for extracting value from that data. read more

4 free data tools for journalists (and snoops)

January 06 2011

You no longer have to be a technical specialist to find exciting and surprising data. In this excerpt from Pete Warden's ebook, "Where are the bodies buried on the web? Big data for journalists," Pete looks at four services that reveal underlying information about web pages and domains. read more
