The Open Informatics Petition
by Jason E. Stewart and Harry Mangalam
Editor's note: For an opposing viewpoint, read Why I'm Not Supporting the Open Informatics Petition, by Andrew Dalke.
Jason E. Stewart and Harry Mangalam have created an online petition to require that software resulting from publicly funded research be made open source. The authors feel that if the public pays for the research, it should have access to the results of that research. In this article, they present their case, review objections and apparent conflicts with the Bayh-Dole Act, and try to resolve remaining issues. Links to ongoing discussions and other useful online articles are also included.
In writing a paper on analytical software for large-scale gene expression, we were struck by the wide range of licenses used by software packages produced by different academic groups. They ranged from strict GPL to executable-only releases that became available only after signed licenses were faxed back directly from the office of a high university official. As we explored further, we were soon drawn into the vortex of the Bayh-Dole Act [1] and Technology Transfer Offices. It was also at this time that Steven Brenner was discovering that it was illegal for an employee of the University of California to release his or her work as open source without first clearing it with the Technology Transfer Office. This apparent schism between the assumed ability of academics to freely share the results of their work with others and the UC employment contract was eventually settled amicably, but not without effort. This prohibition on the voluntary release of academic software was one of the points that sparked the creation of the OpenInformatics.org petition site as an adjunct to the article [2].
The Open Informatics petition can be viewed online in all its warty splendor, but briefly, it requests that:
When money from public research grants is used to develop software, that software should be published under an open source or a free software license, as a condition of funding. Such licensing is the software equivalent of peer-reviewed publication of research results.
The two key points are that public money should fund public software, and that scientific software should be subject to peer review. In a scientific endeavor, we make a hypothesis and provide evidence to support or discredit it. In reporting the evidence, we publish the data in as raw a form as possible so that others can examine and critique it; this is how science moves forward. We submit that, like other "Materials and Methods," the source code of software involved in arriving at a conclusion should also be published.
Jason E. Stewart will be presenting G2G: A Peer-to-Peer Architecture for Gene Expression Data. He is also leading a Birds-of-a-Feather session on the Open Informatics petition, both at the upcoming O'Reilly Bioinformatics Technology Conference.
The following are some explicit advantages to publishing source code:
- The peer review of software allows bugs or exceptions that the original authors did not account for to be found and fixed by others.
- It provides for a mechanism by which original features can be improved (sometimes quite rapidly) and additional features can be added, without having to rebuild the underlying infrastructure (initialization, input/output, data structures, help and documentation files) from scratch.
- Because of the wider use, it encourages standardization of efficient file or exchange formats, decreasing the time spent on converting one format to another.
- It increases the speed of code development through the methods noted above, which can make research cheaper and more productive, and therefore increase the return on investment in research.
- It can increase cooperation between academic and commercial organizations that share interests in a particular field of research.
- It helps software remain available and usable beyond the life or interest of the original author.
Our goals in writing the petition were two-fold: we wanted to educate researchers and funding agencies about open source, and we wanted to start a broader discussion about public funding, software licensing, and Bayh-Dole. Ultimately, we hope the petition will encourage public-funding agencies to create a joint policy defining how they intend to handle open source software.
The Open Informatics petition gained more visibility when both Science and the Associated Press covered the petition. It was then the subject of some spirited discussion on both Slashdot and the O'Reilly Bioinformatics list. We had naively expected nothing but unswerving support and unbending devotion, but a number of objections were raised, some of which we describe below.
The contention surrounding our petition involves only a few major issues and a number of minor ones. The major objections are:
- The petition is too coercive and thus robs scientists of their freedom of choice; scientists should be able to dispose of their inventions as they see fit.
Our response: We have no problem with such a desire as long as the scientist uses private funds to develop the code. However, we posit that if you use public funds to develop code, you have an obligation to return to the public the results of that research. Researchers publish papers to promote their research and gain the respect of their peers. We believe that the software that results from such efforts falls into the same category.
- It is too large a step and thus leaves too many administrative issues unresolved; smaller, incremental steps should be taken to move toward the overall goals. For example: Who would oversee compliance with this requirement? Where would the code be stored? Would there be any standardization of code formatting? What about mixing proprietary software and open source software, or the use of software that uses incompatible licenses?
Our response: These are important issues but they are also issues that can be resolved reasonably easily. They are discussed in more detail on the OpenInformatics.org Web site.
- It runs counter to current laws, especially the Bayh-Dole Act.
Our response: We do not see the petition as conflicting with the Bayh-Dole Act. There is nothing that prevents source code resulting from research from being licensed under dual or even multiple licenses, of which one is an open source license while others may be proprietary, thus allowing the licensee to privatize any changes and improvements to the code. We see this as being entirely compatible with the Bayh-Dole Act as well as allowing the code to continue to be improved by the community, thereby increasing its value and attractiveness for proprietary licensing.
- Examination of the source code is not equivalent to peer review; there are other ways to validate algorithms or claims of superiority of an analytical approach without using the source code.
Our response: This is formally correct, but we still maintain that the release of the source code is almost always beneficial to debugging the problems related to a questionable computational result.
While we were initially a little surprised at the criticism, we were glad to have the feedback. Some of the criticism was (and still is) deserved. Other objections we think are off the mark, but most of the criticism caused us to re-examine our initial position and make explicit some of the cloudy or implicit language of the original phrasing.
Many of these points and others are addressed in the Petition FAQ, which is continuing to grow and now provides a reasonable overview of some of the issues involved.
The petition is not intended as a final policy document. On the contrary, we raise a number of issues that require discussion within a larger forum. The area that needs the most discussion concerns software license details. For example, the petition specifies only that software be published using either an open source or free software license. Which licenses should be allowed? Should agencies choose a single license for all software, and if so, which one? Or should authors be allowed to choose from a set of approved licenses, and if so, who chooses which licenses get approved, and what criteria should be used to decide? The FSF lists four possible considerations, while the OSI lists nine.
We are not wedded to either open source or free software, but chose them because they each satisfy the four basic criteria listed in the free software definition. Perhaps there are criteria unique to scientific software that the FSF and OSI did not consider. Because of these wide-open issues, we encourage everyone who is interested to participate in the ongoing discussion.
We've only begun to scratch the surface of this important topic. Please visit the OpenInformatics.org Web site to read more about the petition. We also want to hear more discussion of the issues raised, so we encourage you to join the petition discussion list or to go to the upcoming O'Reilly Bioinformatics Technology Conference and attend the panel discussion on public funding and open source, and the Birds-of-a-Feather session for the OpenInformatics.org petition.
- [1] Bayh-Dole Act
- The relationship between public funding of research and the Bayh-Dole Act, which allows the exclusive privatization of publicly funded resources, is too large an issue to cover here in the depth it deserves. We can, however, point to several other online articles that cover it in more detail, both favorable (UC Office of Technology Transfer and 21stC) and unfavorable (AlterNet.org and the Atlantic).
- [2] Salon article
- The situation is not limited to universities. A recent Salon article discusses how it took years for researchers at various national laboratories to obtain permission to release software as open source.