Tom White

http://twitter.com/tom_e_white

Hadoop expert, author.

Areas of Expertise:

  • Apache Hadoop
  • distributed computing
  • big data
  • programming
  • writing

Biography

Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including ApacheCon 2008, where he spoke on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.

Books

Hadoop: The Definitive Guide
by Tom White
June 2009
Print: $44.99
Ebook: $35.99

Hadoop: The Definitive Guide: Rough Cuts Version
by Tom White
September 2008
OUT OF PRINT

Articles

Blog

Tom's blog posts are hosted at:
http://www.lexemetech.com/

"Hadoop: The Definitive Guide" Coming Soon

May 06 2009

After a busy couple of months I've finished the writing for "Hadoop: The Definitive Guide". It's now going through the production process at O'Reilly. You can pre-order it on Amazon and O'Reilly. You can also get the Rough Cuts version from O'Reilly to read today, although it hasn't yet been refreshed…

Draft Pig Chapter

January 27 2009

A couple of quick updates on the Hadoop book I'm writing. The Pig chapter is now available on Safari. It still has a few holes, but I'd love to hear feedback on it. Also included is a Hadoop case study from Last.fm. Thanks to Adrian Woodhead and Marc de Palol for…

Hadoop Developer Zeitgeist

November 20 2008

The Cloudera team have just released a website which has a few reports on various Hadoop development metrics. I like the Most Watched Open Jira Issues, as it gives a good summary of what Hadoop Core developers are thinking about. Personally, I can't wait for the new MapReduce API (HADOOP-1230), which…

Cloudera

October 16 2008

I'm pleased to announce that I've joined Cloudera, a new startup providing support for Hadoop. Amr Awadallah (who's one of the founders) has got more details in his blog post.

Hadoop: The Definitive Guide

September 16 2008

The Rough Cut of Hadoop: The Definitive Guide is now up on O'Reilly's site. There are a few chapters available already, at various stages of completion. Remember, it's still pretty rough. I'd love to hear any suggestions for improvements that you may have though. You can give feedback on the…

Hosting Large Public Datasets on Amazon S3

September 04 2008

There's a great deal of interest in large, publicly available datasets (see, for example, this thread from theinfo.org), but for very large datasets it is still expensive to provide the bandwidth to distribute them. Imagine if you could get your hands on the data from a large web crawl, the…

Elastic Hadoop Clusters with Amazon's Elastic Block Store

August 23 2008

I gave a talk on Tuesday at the first Hadoop User Group UK about Hadoop and Amazon Web Services: how and why you can run Hadoop with AWS. I mentioned how integrating Hadoop with Amazon's "Persistent local storage", which Werner Vogels had pre-announced in April, would be a great…

Pluggable Hadoop

July 23 2008

I'm noticing an increased desire to make Hadoop more modular. I'm not sure why this is happening now, but it's probably because as more people start using Hadoop it needs to be more malleable (people want to plug in their own implementations of things), and the way to do that…

RPC and Serialization with Hadoop, Thrift, and Protocol Buffers

July 08 2008

Hadoop and related projects like Thrift provide a choice of protocols and formats for doing RPC and serialization. In this post I'll briefly run through them and explain where they came from, how they relate to each other and how Google's newly released Protocol Buffers might fit in. RPC and Writables: Hadoop…
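As a rough illustration of the Writable style of serialization this post refers to, the sketch below mirrors the two-method shape of Hadoop's org.apache.hadoop.io.Writable interface: write(DataOutput) and readFields(DataInput). To stay self-contained it declares a local stand-in interface instead of depending on Hadoop; the names WritableLike, PageCount and WritableDemo are hypothetical, chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical local stand-in with the same method shape as Hadoop's Writable,
// so this compiles without Hadoop on the classpath.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type: a page URL and its hit count.
class PageCount implements WritableLike {
    String url;
    long count;

    PageCount() {}  // no-arg constructor: create an empty instance, then readFields()
    PageCount(String url, long count) { this.url = url; this.count = count; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);     // 2-byte length prefix + modified UTF-8 bytes
        out.writeLong(count);  // 8 bytes, big-endian
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();    // fields must be read in the order they were written
        count = in.readLong();
    }
}

class WritableDemo {
    static byte[] serialize(WritableLike w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    static PageCount deserialize(byte[] data) throws IOException {
        PageCount p = new PageCount();
        p.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return p;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new PageCount("/index.html", 42L));
        PageCount copy = deserialize(data);
        System.out.println(copy.url + " " + copy.count + " (" + data.length + " bytes)");
        // prints "/index.html 42 (21 bytes)"
    }
}
```

Unlike Thrift or Protocol Buffers, there is no separate interface definition file here: the field order in write and readFields is itself the wire format, so the two methods have to be kept in step by hand.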

Hadoop beats terabyte sort record

July 03 2008

Hadoop has beaten the record for the terabyte sort benchmark, bringing it from 297 seconds to 209. Owen O'Malley wrote the MapReduce program (which by the way has a clever partitioner to ensure the reducer outputs are globally sorted and not just sorted per output partition, which is what the…
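The idea behind a partitioner that yields globally sorted output can be shown with a small, plain-Java sketch. It is not Owen O'Malley's program and not Hadoop's partitioner API; the class and method names (RangePartitionSketch, chooseSplitPoints, getPartition, sortWithRanges) are hypothetical. The assumption is that split points are taken from a sorted sample of keys, so each reducer receives a contiguous key range. Sorting within each partition and concatenating the partitions in order then gives a globally sorted result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of range ("total order") partitioning in plain Java.
// Not Hadoop's API and not the code used for the terabyte sort run.
class RangePartitionSketch {

    // Choose numPartitions - 1 split points from a sorted sample of keys.
    // Assumes the sample is non-empty and numPartitions >= 1.
    static String[] chooseSplitPoints(List<String> sample, int numPartitions) {
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return splits;
    }

    // Partition i receives keys k with splits[i-1] <= k < splits[i].
    static int getPartition(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Simulate the shuffle: route keys to partitions, sort each partition
    // (as each reducer would), then concatenate outputs in partition order.
    static List<String> sortWithRanges(List<String> keys, String[] splits) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i <= splits.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(getPartition(k, splits)).add(k);
        }
        List<String> out = new ArrayList<>();
        for (List<String> p : parts) {
            Collections.sort(p);
            out.addAll(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "pear", "apple", "fig", "kiwi", "date", "plum", "banana", "cherry");
        String[] splits = chooseSplitPoints(keys, 3);
        System.out.println(Arrays.toString(splits));
        // prints [cherry, kiwi]
        System.out.println(sortWithRanges(keys, splits));
        // prints [apple, banana, cherry, date, fig, kiwi, pear, plum]
    }
}
```

Sampling keeps the split points cheap to compute; the trade-off is that a skewed or unrepresentative sample produces unevenly sized partitions, so some reducers do more work than others.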

Hadoop Query Languages

June 20 2008

If you want a high-level query language for drilling into your huge Hadoop dataset, then you've got some choice: Pig, from Yahoo! and now incubating at Apache, has an imperative language called Pig Latin for performing operations on large data files. Jaql, from IBM and soon to be open sourced, is a…

"The Next Big Thing"

June 13 2008

James Hamilton on The Next Big Thing: "Storing blobs in the sky is fine but pretty reproducible by any competitor. Storing structured data as well as blobs is considerably more interesting but what has even more lasting business value is the storing data in the cloud AND providing a programming platform…

Bluetooth Castle

May 30 2008

Today I visited Raglan Castle in Monmouthshire with my family. Cadw, the government body that manages the castle, were running a trial to deliver audio files to visitors' mobile phones using Bluetooth. As I walked through the entrance I simply made my phone discoverable, waited a few seconds for the…

Portable Cloud Computing

April 30 2008

Last July I asked "Why are there no Amazon S3/EC2 competitors?", lamenting the lack of competition in the utility or cloud computing market and the implications for disaster recovery. Closely tied to disaster recovery is portability: the ability to switch between different utility computing providers as easily as I…

Hadoop at ApacheCon Europe

April 14 2008

On Friday in Amsterdam there was a lot of Hadoop on the menu at ApacheCon. I kicked it off at 9am with A Tour of Apache Hadoop, Owen O'Malley followed with Programming with Hadoop’s Map/Reduce, and Allen Wittenauer finished off after lunch with Deploying Grid Services using Apache Hadoop. Find…

Multimedia

Webcast: An Introduction to Hadoop
July 16, 2009
Duration: Approximately 60 minutes. Cost: Free.
In this webcast, Cloudera founder Christophe Bisciglia and O'Reilly author Tom White will provide an introduction to Hadoop/MapReduce, the open source project that allows organizations to process, store...
