Tim O’Reilly delivered the first keynote at the O’Reilly European Open Source Convention 2006 this morning by talking about the changing face of open source. What does it mean if our applications and APIs are free but the data is not?

This is an important question. We often think of open source in terms of source code only, but that view is too narrow. As Tim put it in the talk, “software is performed rather than distributed.” Increasingly, we use software platforms that are built from open source components (say, Google on Linux) but are never distributed themselves. What does this mean for open source, for how it is produced, and for the community around it?

On a more positive note, Tim reminded us about asymmetric competition. Compare the number of employees across the top ten sites on the web, and you’ll find that craigslist has under two dozen while all the rest have thousands. The key here is that craigslist and similar applications rely on users to build value, something that we can all do.

But this brings us back to the main point. It’s the data that’s starting to matter more and more, even if it is user-generated. Even if Google open sourced its code, I could not build my own search engine without two additional ingredients: infrastructure and data. The necessary infrastructure is enormous (power is important), and Tim urged the community to pay attention to projects that are concerned with scalability, mentioning Nagios, memcached, Hadoop, OpenID and server virtualization as worth a look.

In terms of data, what do we do? If all our software is open, and all our protocols and APIs are open, what does it matter if the companies that hold the data still own us? These are all questions we hope to pursue during EuroOSCON this year.
