Ewan Birney's Keynote: A Case for Open Source Bioinformatics
by Bruce Stewart
Tim O'Reilly introduced Ewan Birney as the person who had sparked his own interest in bioinformatics. Ewan had invited Tim to an open source bioinformatics conference, and Ewan's keynote was, no surprise, titled "Open Source Bioinformatics." Besides leading the open source BioPerl project, Ewan is the author of GeneWise, an open source bioinformatics software package, and he is one of the founders of Ensembl, an open, joint project to develop a software system that produces and maintains automatic annotation on eukaryotic genomes.
Ewan started his keynote by noting that bioinformatics draws people from a broad set of backgrounds, and the combination of molecular biologists, computer scientists, physicists, mathematicians, and specialists from other disciplines made the field both exciting and challenging. He would try to address as much of this audience as possible, and answer two basic questions: What is bioinformatics, and why open source?
What Bioinformatics Is
In describing bioinformatics, Ewan pointed out that because the datasets available to molecular biologists are constantly growing, the need for handling this data computationally has naturally increased. The functions of processing and storing data, reconciling this data with other existing datasets, and building layers of datasets on top of datasets are clearly best done by computers, and these functions are the basis for current bioinformatics.
Bioinformaticists are responsible for delivering these datasets to wet-lab biologists in a way that removes some of the need for them to do their own experiments, and they also help generate testable hypotheses. Ewan asserted that many of the techniques traditionally performed by wet-lab biologists were becoming obsolete, as the same results could now be achieved by running bioinformatics programs on existing data.
Ewan went on to make the point that he thinks all the hype surrounding bioinformatics is justified--and there should probably be even more. The recent dramatic decrease in the cost of acquiring biological data will revolutionize many areas of science and medicine, he predicted. He used cancer research as an example, where until recently, scientists were looking at single genes and how they related to a specific type of cancer, the equivalent of looking for a needle in a haystack. The Cancer Genome Anatomy Project is now able to look at every gene and every type of cancer, which is completely changing cancer research, and the expectation is that within five years this research is going to reap great rewards. "It's all about human health," Ewan said.
Why Open Source?
In answering why open source should be used for bioinformatics, Ewan made a strong case using three arguments and pointing out that most of the important bioinformatics software is already open. His comment that Microsoft had no role in this world, and that there was no equivalent of MS Word in bioinformatics software, was roundly applauded.
First, Ewan argued that open source makes sense because it follows good and well-known scientific principles. Traditionally, scientific practice has involved openly sharing and discussing results, and providing enough information to allow third-party confirmation of results. Clearly open source software fits well into this model. Open source also helps contribute to the global scientific body of knowledge.
Second, Ewan emphasized that in biological research it's not the software that's important--it's the data. Since the data matters much more than the tools used to process it, there's a big benefit in sharing software. A researcher gains a great advantage from access to other datasets, and sharing software tools is one way to get that access. Bioinformatics companies are never going to make money by selling their software, Ewan said. They're going to make money by making drugs or other products that come out of their research.
The third reason Ewan gave for supporting open source was that molecular biology and medicine will be most advanced by the creation of a common infrastructure--and this is something best done with open source tools. He claimed that in order "to realize the promise of high-throughput genomics, we need a data infrastructure for molecular biology." Infrastructures have to be open at some level, and they function best when they are open at all levels. He used the Internet as an example of an infrastructure built on open source (TCP/IP, SMTP, Web, Apache, Linux/Unix/GNU).
Although Ewan claimed to have a love/hate relationship with Perl, he acknowledged that when the pressure is on and you need to get something done fast, Perl is a great choice. He then described the progress being made on BioPerl and introduced four key contributors in the audience: Chris Dagdigian, Jason Stajich, Peter Schattner, and Lincoln Stein.
Finally, Ewan concluded that another benefit of open source projects like BioPerl is that anyone can get involved, and he encouraged interested members of the audience to join the effort. Judging from the enthusiastic response Ewan received, he'll see some new faces in the community.