DEODAS: A DEgenerate Oligonucleotide Design and Analysis System
by Karl Diedrich
DEODAS (DEgenerate Oligonucleotide Design and Analysis System) is an end-user computer program for designing, prescreening, and selecting oligonucleotides (oligos) for microarray probes or PCR primers. DEODAS was developed by integrating several open source software tools into an automated system. The probes and primers output are degenerate oligos designed to detect subgroups of gene families instead of individual genes. This allows the oligos to detect related genes in uncharacterized organisms. The oligos are also screened against databases of known gene sequences, to reduce cross-reaction of the oligos with unrelated gene families. The integration and automation of tools in DEODAS greatly decreases the amount of interactive time required to design and screen oligos.
Karl Diedrich will present a lightning talk on DEODAS at the upcoming O'Reilly Bioinformatics Technology Conference.
High-density DNA microarrays for biological research allow for the detection and expression-level measurement of thousands of genes by massively parallel nucleic acid hybridizations. They have tremendous application to medical, agricultural, and environmental research.
For example, DNA microarrays can be used for analyzing genes of soil microorganisms. These microorganisms include bacteria, fungi, and micro-eukaryotes. Degenerate oligos based on known genes are needed to study soil microorganisms because up to 99 percent of them are unculturable and have never been characterized in the laboratory. The genes of unculturable organisms can be studied by direct isolation of their nucleic acids from soil. The degenerate oligos are able to detect related gene sequences from these direct isolates.
Degenerate probes are prone to cross-reaction with unrelated genes and require prescreening. The varying sequences of degenerate probes allow them to detect groups of related genes. But this also increases the chance of interaction with unrelated gene families. This would cause false-positive results in the laboratory.
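To make the idea of a degenerate oligo concrete, the following minimal sketch expands the standard IUPAC ambiguity codes into the concrete sequences a degenerate oligo stands for. The function names and the example oligo are hypothetical and are not taken from DEODAS itself.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences a degenerate oligo represents."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


def expand(oligo):
    """List every concrete sequence encoded by a degenerate oligo."""
    pools = [IUPAC[base] for base in oligo.upper()]
    return ["".join(bases) for bases in product(*pools)]


example = "ATGGCNTAYR"  # hypothetical oligo, for illustration only
print(degeneracy(example))  # N (4) x Y (2) x R (2) = 16
print(expand("AYG"))        # ['ACG', 'ATG']
```

The higher the degeneracy, the more related genes an oligo can detect, and the more unrelated sequences it may also match, which is why prescreening matters.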
Potential false positives can be reduced by electronically screening the degenerate oligos against databases of existing genes. Oligos closely matching unrelated genes should be rejected to reduce the potential for false-positive interactions.
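The screening step can be pictured with a simplified stand-in: turn the degenerate oligo into a regular expression and search both strands of every sequence outside the oligo's own family. This sketch only finds exact matches to the degenerate pattern and does not reproduce the behavior of any particular tool; the toy database and all names in it are assumptions made for illustration.

```python
import re

# IUPAC codes written as regular-expression character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def oligo_to_regex(oligo):
    """Compile a degenerate oligo into a regular expression."""
    return re.compile("".join(IUPAC_REGEX[base] for base in oligo.upper()))


def cross_reactions(oligo, database, own_family):
    """Names of sequences outside own_family matched on either strand."""
    pattern = oligo_to_regex(oligo)
    hits = []
    for name, (family, seq) in database.items():
        if family == own_family:
            continue  # matches inside the target family are expected
        seq = seq.upper()
        if pattern.search(seq) or pattern.search(reverse_complement(seq)):
            hits.append(name)
    return hits


# Hypothetical toy database: name -> (family, sequence).
database = {
    "geneA1": ("famA", "CCATGGCATATGCC"),
    "geneB1": ("famB", "TTATGGCGTACGAA"),
    "geneC1": ("famC", "GGGGCCCCAAAATT"),
}
print(cross_reactions("ATGGCNTAY", database, "famA"))  # ['geneB1']
```

An oligo with any hits outside its own family would be a candidate for rejection.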
DEODAS combines existing programs into an automated batch-design system with a graphical interface. The input to DEODAS is a set of DNA sequences from a protein family.
The Clustalw program performs the multiple alignment and provides a phylogenetic tree. The family is broken into subgroups based on the phylogenetic tree. Clustalw then realigns each subfamily, which yields a higher-quality alignment.
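As a rough illustration of breaking a family into subgroups, the sketch below groups aligned sequences by pairwise identity using single-linkage clustering. This is a swapped-in technique, not the tree-based splitting described above: it does not read a phylogenetic tree, and the 0.8 threshold and the sequences are assumptions chosen for the example.

```python
def identity(a, b):
    """Fraction of aligned, non-gap columns where two sequences agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def subgroups(aligned, threshold=0.8):
    """Single-linkage grouping of sequences with identity >= threshold."""
    names = list(aligned)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if identity(aligned[a], aligned[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical aligned sequences forming two obvious subgroups.
aligned = {
    "seq1": "ATGGCATACG",
    "seq2": "ATGGCTTACG",
    "seq3": "TTGACCGGAA",
    "seq4": "TTGACCGGTA",
}
print(subgroups(aligned))  # [['seq1', 'seq2'], ['seq3', 'seq4']]
```

Each resulting group would then be realigned on its own, since closely related sequences align more cleanly than the whole family does.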
The BLIMPS: CODEHOP program set is then used to find highly conserved blocks in the multiple alignments and to design probes based on these blocks. The probes are then screened against GenBank sequence databases using the EMBOSS: fuzznuc program.
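The general idea of deriving degenerate oligos from conserved regions of an alignment can be sketched as follows. This is a deliberately naive, nucleotide-level illustration and is not the consensus-degenerate hybrid method that CODEHOP implements; the window length, the degeneracy limit, and the sequences are assumptions.

```python
# Map each set of observed bases to its IUPAC ambiguity code.
CODE_FOR = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
DEGENERACY = {code: len(bases) for bases, code in CODE_FOR.items()}


def degenerate_consensus(aligned_seqs):
    """One symbol per column: an IUPAC code, or '-' if the column has a gap."""
    return "".join(
        CODE_FOR.get(frozenset(column), "-") for column in zip(*aligned_seqs)
    )


def candidate_oligos(consensus, length=8, max_degeneracy=8):
    """Gap-free windows whose total degeneracy stays within the limit."""
    candidates = []
    for start in range(len(consensus) - length + 1):
        window = consensus[start:start + length]
        if "-" in window:
            continue
        total = 1
        for code in window:
            total *= DEGENERACY[code]
        if total <= max_degeneracy:
            candidates.append((start, window, total))
    return candidates


# Hypothetical three-sequence alignment of one subgroup.
seqs = [
    "ATGGCATACGTT-A",
    "ATGGCTTACGTTGA",
    "ATGGCGTATGTTGA",
]
consensus = degenerate_consensus(seqs)
print(consensus)  # ATGGCDTAYGTT-A
for start, oligo, total in candidate_oligos(consensus):
    print(start, oligo, total)
```

Lowering `max_degeneracy` keeps only the most conserved windows, at the cost of fewer candidate oligos.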
The results are stored in a database with the PostgreSQL database management system. DEODAS provides a graphical interface for viewing the results and selecting oligos for experiments. By combining these programs in one system, DEODAS greatly reduces the manual effort of running separate programs to select the optimal oligos.
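Storing results in a relational database makes it easy to filter oligos before selecting them. The sketch below uses Python's built-in sqlite3 module as a self-contained stand-in for PostgreSQL; the table layout, column names, and rows are hypothetical and do not describe the actual DEODAS schema.

```python
import sqlite3

# In-memory SQLite database standing in for a PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE oligo (
        id INTEGER PRIMARY KEY,
        subgroup TEXT NOT NULL,
        sequence TEXT NOT NULL,
        degeneracy INTEGER NOT NULL,
        cross_hits INTEGER NOT NULL
    )
    """
)

# Hypothetical design results: (subgroup, sequence, degeneracy, cross_hits).
rows = [
    ("subgroup_1", "ATGGCDTA", 3, 0),
    ("subgroup_1", "TGGCDTAY", 6, 2),
    ("subgroup_2", "TTGACCGG", 1, 0),
]
conn.executemany(
    "INSERT INTO oligo (subgroup, sequence, degeneracy, cross_hits) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def selectable(conn, max_hits=0):
    """Oligos with at most max_hits cross-reactions, least degenerate first."""
    cursor = conn.execute(
        "SELECT subgroup, sequence FROM oligo "
        "WHERE cross_hits <= ? ORDER BY subgroup, degeneracy",
        (max_hits,),
    )
    return cursor.fetchall()


print(selectable(conn))
# [('subgroup_1', 'ATGGCDTA'), ('subgroup_2', 'TTGACCGG')]
```

A graphical front end can then present exactly this kind of filtered list for the user to choose from.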
Here is a flowchart that describes the steps DEODAS uses and the programs it incorporates:
And here is a screen shot that shows DEODAS's graphical user interface:
By combining existing tools, DEODAS provides new features for designing oligos, including breaking gene families into subgroups and automatically screening oligos for cross-reaction.
DEODAS breaks gene families into subgroups. Clustalw can provide better alignments to the closely related sequences in subgroups. This provides for more highly conserved blocks. More oligos can be designed for these conserved blocks. In addition, the oligos target branches of a gene family instead of the entire family. Therefore the probes can be used to detect the diversity inside gene families.
DEODAS also automatically screens oligos for cross-reaction. Sequences matching unrelated genes in GenBank are rejected. Automating the screening process greatly reduces the time and effort that screening each oligo by hand would take.
DEODAS has been developed on Red Hat Linux using C, C++, and Python. Python is the primary development language. Required programs and libraries include EMBOSS, BLIMPS, Clustalw, PostgreSQL, Python, and GNOME. The graphical interface is written for the GNOME desktop. Any Unix-like POSIX system should be capable of running DEODAS.
In the future, DEODAS will be more modular. The oligo design engine is being made into a separate module so that design engines other than BLIMPS: CODEHOP can be used. Sequences will be able to enter DEODAS's automated process at different stages. BLASTP can be integrated so that DEODAS finds the relatives of a single input sequence. Users will also be able to enter existing, high-quality multiple-sequence alignments. DEODAS is being turned into a flexible oligo design tool.
The future development of DEODAS depends on collaborations with laboratories. Laboratory verification of probes is being done at the U.S. Army Corps of Engineers, Environmental Research and Development Center. If you are interested in developing probes with DEODAS, please visit the DEODAS Web site for contact information.
Fredrickson, H., Perkins, E., et al. (2001). Towards environmental toxicogenomics--development of a flow-through, high-density DNA hybridization array and its application to ecotoxicity assessment. The Science of the Total Environment, 274, 137-149.
Rice, P., Longden, I., Bleasby, A. (2000). EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16 (6), 276-277.
Rose, T.M., Schultz, E.R., et al. (1998). Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res., 26 (7), 1628-1635.
Karl Diedrich developed the DEgenerate Oligonucleotide Design and Analysis System (DEODAS) and is currently on the programming staff at Myriad Genetics, Inc., working on the Pronet protein-protein interaction database.