In response to Paul Allen's question at Davos about data hoarding in science
Tim O'Reilly
Feb. 10, 2002 09:10 AM

In his retrospective on the World Economic Forum, Dan Gillmor wrote:
My personal Davos moment was in a session I moderated yesterday,
a discussion of bioinformatics -- the blending of computation and biology.
Several brilliant speakers, including William Haseltine, Mark Levin,
James Sabry and Nathan Myhrvold gave the audience a quick tour of the field and
tried to answer some key questions.
None of them was willing to predict an era when there will be drugs
that work on just one person -- a kind of personal pill. But they all agreed
that we're not far away from solving some extremely big problems in health
care.
Paul Allen, who came to the session, asked a key question but didn't get a
satisfactory response. He wanted to hear about intellectual property issues --
in particular, if I understood his question, whether the growing secrecy in
science is harming progress. Science is supposed to be about sharing data, not
hoarding it.
Not anymore, and that's an alarming trend.
I wrote an email response to Dan and Paul, with a cc to Timo Hannay of Nature, who had worked with us on program planning for our recent Bioinformatics Technology Conference. My message was about the momentum that open source has in the bioinformatics field. But before I get to that, I want to share Timo's response to Dan's first paragraph, about personalized medicine:
Just last week Nature included a paper which, I think, illustrates well both
the exciting things we can do now and the distance we still have to go to
realise 'personalized medicine'. It used DNA microarray technology to look
at gene expression profiles in breast cancer cells. It found that there
were two distinct -- and previously unrecognised -- forms of the disease.
What's more, one form turned out to be susceptible to current treatments
while the other wasn't. So they can now predict clinical outcomes from gene
expression data, which is tremendous. But they've only managed to split
breast cancer sufferers into two groups, so the road to true personalization
is still a long one. The Economist has a good write-up of this research.
But back to the question of the growing secrecy in science. Here's what I wrote to Dan and Paul, and later decided to share more publicly here (email modified slightly to make links inline rather than explicit):
I just got back from the O'Reilly Bioinformatics Technology Conference.
Open source and open data were major themes, especially in
keynotes from Ewan Birney of the European Bioinformatics Institute and Lincoln
Stein of Cold Spring Harbor Laboratory. (For a summary of Ewan's talk, see A Case for Open Source Bioinformatics, and for
Lincoln's, see Building a Bioinformatics Nation.) Both
summaries also have links off to interviews with the respective speakers,
and in Lincoln's case, to his slides.
Lincoln's talk was especially interesting because he focused not just on
open source but on open data and open web services. His point was that we
need to agree on common data formats and protocols so that independent
projects can interoperate.
And Timo Hannay, one of the Nature editors, is working on a short piece
explicitly comparing the scientific process to open source. Timo's insight,
which I think is a good one, is that scientific papers are not akin to open
source projects, but to patches on open source projects. The underlying
science is the project, and the peer-reviewed papers are analogous to
patches to the underlying software.
This isn't quite what you were asking, Paul, but it's an important part of
the picture.
The conference also included a number of sessions directly on the topic of
secrecy vs. openness. Nature, which was one of the co-sponsors of the
conference, hosted a panel on scientific publishing, in which this was the
focus. We agreed that data hoarding is a real danger, but in
bioinformatics, we do also see the countervailing force of open source.
As I pointed out in an article I wrote for Linux Magazine,
the most significant work of open source last year was James Kent's heroic
effort to make sure the human genome sequence was in the public domain
rather than the property of a private company.
Dan, I'm sure you're also aware of the stand that Steven Brenner has taken. [Steven made his right to publish his results a condition of his employment at UC Berkeley, and has developed a model open source/open data contract between academics and their institutions.] Steven was also at the conference, and talked about this on a panel there. We want to
help him get his open source contract for academics out and more widely
used. Any help publicizing it would be welcome.
It's really clear that there are some real issues here, but there are people
taking up the gauntlet on behalf of openness as well as those who are working
for secrecy and private advantage. So I'm hopeful that in the end, openness
will win.
Especially in a field like bioinformatics, the natural advantages of open
source really do outweigh the advantages of secrecy. No one controls all
the data. Talk after talk at the conference focused on the way that
matching up data from other researchers' databases is the key to making
sense out of your own data. This was a key focus of Terry Gaasterland's
keynote as well, and of course is at the very heart of Lincoln Stein's DAS
(Distributed Annotation System).
Timo Hannay replied to this message with some further comments (which
express his own views and not necessarily those of Nature):
I think it's debatable whether science is becoming more or less open.
Certainly, we've seen the rise of dubious (to say the least) patent claims
on things like genes. And we've seen a rising desire by academic
institutions to make the most of commercial opportunities that come out of
their research. But, on the other hand, I think there's a trend, at least
among biologists, for the scientists themselves to be more open with their
data. Traditionally, biologists have hoarded their data because it takes a
lot of effort to gather and they don't want rival groups to beat them to
important findings hidden in the numbers. Previously they got away with
this attitude partly because the logistical costs of sharing data were high.
In the age of the Internet this is no longer true. This is why Nature now
requires its authors to deposit all relevant gene and protein sequence data
in appropriate public repositories. We're starting to do the same with
microarray data too, but the repositories and data standards in this area
are less well developed, so we're not yet able to be as firm or prescriptive
as we can with gene and protein sequence data. Many researchers are still
somewhat resistant to all this and sometimes we have to compromise (e.g., by
allowing them to make their data public only after a delay of a few weeks or
months). If we didn't do this, some of these outstanding scientists would
simply publish elsewhere, so we're treading a fine line between promoting
openness and maintaining our editorial pre-eminence.
Perhaps the highest-profile case in recent years was the publication of the
Human Genome Project's paper in Nature last February.
At the same time, Science magazine published a rival paper from Celera's
privately funded sequencing project. Science agreed to allow Celera to
publish its paper but keep its sequences private. I believe that Science
seriously damaged its scientific reputation by doing this -- and quite right
too. Nature editorialised on this subject at the time.
So I guess that I'm agreeing with Tim that the open source mentality is a
growing force in science.
Tim O'Reilly is the founder and CEO of O'Reilly Media, Inc., thought by many to be the best computer book publisher in the world. In addition to Foo Camps ("Friends of O'Reilly" Camps, which gave rise to the "un-conference" movement), O'Reilly Media also hosts conferences on technology topics, including the Web 2.0 Summit, the Web 2.0 Expo, the O'Reilly Open Source Convention, the Gov 2.0 Summit, and the Gov 2.0 Expo. Tim's blog, the O'Reilly Radar, "watches the alpha geeks" to determine emerging technology trends, and serves as a platform for advocacy about issues of importance to the technical community. Tim's long-term vision for his company is to change the world by spreading the knowledge of innovators. In addition to O'Reilly Media, Tim is a founder of Safari Books Online, a pioneering subscription service for accessing books online, and O'Reilly AlphaTech Ventures, an early-stage venture firm.
Weblog authors are solely responsible for the content
and accuracy of their weblogs, including opinions they
express, and O'Reilly Media, Inc., disclaims any and
all liability for that content, its accuracy, and
opinions it may contain.
This work is licensed under a
Creative Commons License.