Creating Databases and Tools for Future Research

 


Databases

Tools

 

ZmDB: Maize Genome Database

ZmDB is a maize genome database that collects all maize genome information from GenBank on a regular basis and organizes it for maize researchers. It was created for the MGDP by collaborator Volker Brendel's bioinformatics group at Iowa State University.

GenBank, the federally funded catalog of genetic information for all organisms, stores its data in chronological order. To make these data more useful, researchers create specialized databases containing genetic information about each organism. ZmDB provides that service for maize researchers. It gathers related DNA sequences and provides annotations describing a gene's likely function based on the scientific literature. ZmDB provides additional information not found in GenBank, such as links to the pictures of mutant plants discovered by the MGDP team.

 

PhenotypeDB

PhenotypeDB, which is integrated with ZmDB, describes and catalogs the many mutant plants produced by MGDP. It allows researchers to identify plants with particular mutant traits and to buy seed carrying those traits. The database also links the traits to the RescueMu insertion mutations that might cause them. A PhenotypeDB exercise is available as a walkthrough.

 

Plant Genome Database (PlantGDB) http://www.zmdb.iastate.edu/PlantGDB/

PlantGDB was also created by MGDP collaborator Volker Brendel, although it was funded under a separate grant. This database of plant genomic and EST sequences allows cross-species comparisons among about 20 major crop and model species. The database provides snapshots of the current knowledge of plant gene composition and facilitates researchers' understanding of plant genetics and evolution.

 

ZmDBAssembler

This tool evaluates new ESTs to see whether their sequences overlap with other ESTs already listed in ZmDB. It then assembles matching ESTs into clusters corresponding to unique gene fragments. (learn more about ZmDBAssembler)
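The clustering idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not ZmDBAssembler's actual method: it requires exact overlaps between sequence ends, ignores sequencing errors and reverse-complement matches, and the function names, the `min_overlap` threshold, and the sample sequences are all hypothetical.

```python
def overlap_length(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if too short)."""
    for size in range(min(len(a), len(b)), min_overlap - 1, -1):
        if size > 0 and a.endswith(b[:size]):
            return size
    return 0


def sequences_overlap(a: str, b: str, min_overlap: int) -> bool:
    """True if one sequence contains the other or their ends overlap exactly."""
    shorter, longer = sorted((a, b), key=len)
    if len(shorter) >= min_overlap and shorter in longer:
        return True
    return bool(overlap_length(a, b, min_overlap) or overlap_length(b, a, min_overlap))


def cluster_ests(ests: dict[str, str], min_overlap: int = 8) -> list[set[str]]:
    """Group ESTs so that any two overlapping ESTs end up in the same cluster."""
    parent = {name: name for name in ests}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    names = list(ests)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if sequences_overlap(ests[first], ests[second], min_overlap):
                parent[find(first)] = find(second)

    clusters: dict[str, set[str]] = {}
    for name in names:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Made-up sequences: est1 overlaps est2, est2 overlaps est3, est4 stands alone.
example_ests = {
    "est1": "ACGTACGTGGCCTTAA",
    "est2": "GGCCTTAATTTTCCCC",
    "est3": "TTTTCCCCGAGAGAGA",
    "est4": "CATCATCATCATCATC",
}
example_clusters = cluster_ests(example_ests)
```

Here `est1` and `est3` share no sequence directly, yet they land in the same cluster because both overlap `est2`. That transitive grouping is what lets a chain of short ESTs represent one longer gene fragment.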

 

MuSeqBox: Multi-query Sequence Blast Output Examination

MuSeqBox allows researchers to easily compare many genetic sequences to one another or to their complementary proteins. The data for MuSeqBox come from BLAST, the most widely used genome search program (see link below). BLAST performs the researcher's desired comparison, but produces a complex output that is difficult to read -- especially when more than a handful of sequences are submitted. MuSeqBox converts that output into a user-friendly table containing the most important information.

Thus, for example, a researcher can generate a table comparing all maize ESTs to known proteins. The table will list each EST and the three proteins whose amino acid sequences most closely match the protein that would be produced by the EST's genetic sequence. Indeed, Brendel's group uses this approach to annotate ESTs in ZmDB.

Researchers can also refine the table further by setting their own criteria for what constitutes a close match. Or they can set criteria for determining whether a gene sequence covers the entire protein or only part of it (due to splicing).
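As a rough illustration of this kind of summary table, the Python sketch below keeps the three best-scoring protein hits for each EST and applies user-set thresholds. It assumes BLAST's standard 12-column tab-separated output; the input format MuSeqBox itself reads is not described here. The function name, the thresholds, and every value in the sample rows are hypothetical.

```python
import csv
from collections import defaultdict
from io import StringIO


def top_hits(tabular_text, max_hits=3, min_identity=0.0, max_evalue=10.0):
    """Return the best hits per query from 12-column tab-separated BLAST output.

    Assumed column order: query, subject, percent identity, alignment length,
    mismatches, gap opens, query start, query end, subject start, subject end,
    E-value, bit score.
    """
    hits = defaultdict(list)
    for row in csv.reader(StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, evalue, bitscore = float(row[2]), float(row[10]), float(row[11])
        if identity >= min_identity and evalue <= max_evalue:
            hits[query].append((subject, identity, evalue, bitscore))
    # Rank each query's hits by bit score, highest first, and keep the top few.
    return {
        query: sorted(found, key=lambda hit: hit[3], reverse=True)[:max_hits]
        for query, found in hits.items()
    }


# Made-up rows for two ESTs; none of these values come from real BLAST runs.
sample_rows = [
    ("EST001", "protA", "92.0", "150", "12", "0", "1", "450", "10", "159", "1e-80", "290"),
    ("EST001", "protB", "85.0", "148", "22", "0", "4", "447", "12", "159", "1e-60", "230"),
    ("EST001", "protC", "70.0", "140", "42", "0", "10", "429", "20", "159", "1e-30", "130"),
    ("EST001", "protD", "40.0", "90", "54", "0", "30", "299", "5", "94", "1e-05", "50"),
    ("EST002", "protE", "55.0", "120", "54", "0", "1", "360", "1", "120", "1e-20", "95"),
]
sample_text = "\n".join("\t".join(row) for row in sample_rows)

default_table = top_hits(sample_text)
strict_table = top_hits(sample_text, min_identity=80.0)
```

With the default thresholds, `EST001` keeps its three highest-scoring proteins and drops the fourth. Raising `min_identity` to 80 leaves only two hits for `EST001` and removes `EST002` from the table entirely.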

MuSeqBox can be used to display genetic data about any organism, not only maize. (learn more about MuSeqBox)

 

GeneSeqer and SplicePredictor

MGDP collaborator Volker Brendel's group created GeneSeqer and SplicePredictor to help locate introns -- portions of genes that get spliced out when a gene is transcribed into messenger RNA (mRNA). Intron identification boosts researchers' understanding of gene structure and may have practical value for genetic engineering.

SplicePredictor looks for probable splice sites in genomic DNA using rules about what makes a good splice site. The program further evaluates the context in which sites occur by examining neighboring sequences for known intron/exon characteristics; determining whether the site has a complementary partner such that the two sites define a likely intron or exon; and assessing whether other possible site pairs constitute more likely splice sites than the pair under consideration.
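The pairing step can be illustrated with a deliberately simplified Python sketch. It relies only on the general observation that most introns begin with GT and end with AG, and it lists every donor/acceptor pair whose spacing falls within a chosen length range. It does not score sites or compare competing pairs, so it is not SplicePredictor's actual model; the function name and the length limits are hypothetical.

```python
def candidate_introns(dna, min_len=60, max_len=2000):
    """List (start, end) half-open intervals that begin with GT and end with AG.

    Every GT is treated as a possible donor site and every AG as a possible
    acceptor site; a pair is kept only if the implied intron length is allowed.
    """
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]

    pairs = []
    for donor in donors:
        for acceptor in acceptors:
            end = acceptor + 2  # the intron includes the AG dinucleotide
            if min_len <= end - donor <= max_len:
                pairs.append((donor, end))
    return pairs


# Toy sequence with one GT and one AG; the length limits are lowered to match.
toy_dna = "CCC" + "GT" + "C" * 6 + "AG" + "CCC"
toy_pairs = candidate_introns(toy_dna, min_len=10, max_len=50)
```

On real genomic DNA this enumeration produces far more candidates than true introns, which is why a practical predictor must also weigh the sequence context around each site and choose among competing pairs.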

In addition, SplicePredictor can look for introns by finding stretches of genomic DNA that do not have matching ESTs. ESTs correspond to stretches of mRNA and do not contain the introns spliced out from the direct RNA copy of genomic DNA. This additional search function greatly increases SplicePredictor's accuracy in detecting splice sites.

GeneSeqer is similar to SplicePredictor, but it is designed specifically for longer stretches of genomic DNA, and it aligns only ESTs that closely match the genomic DNA sequence. GeneSeqer displays its output graphically at ZmDB. Gaps in the EST sequence that GeneSeqer fills in with genomic sequence are presumed to be introns. (Learn more about GeneSeqer.)
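The last step, reading introns off an alignment, is simple enough to show directly. The Python sketch below assumes an EST has already been aligned to genomic DNA and that the alignment is given as a list of genomic intervals covered by the EST. The function name, the half-open coordinate convention, and the sample coordinates are assumptions made for illustration, not GeneSeqer's output format.

```python
def introns_from_alignment(exon_blocks):
    """Infer introns as the genomic gaps between consecutive aligned EST blocks.

    exon_blocks holds (start, end) half-open genomic intervals where the EST
    matches the genomic sequence. Each returned interval is a presumed intron.
    """
    blocks = sorted(exon_blocks)
    introns = []
    for (_, previous_end), (next_start, _) in zip(blocks, blocks[1:]):
        if next_start > previous_end:  # adjacent or overlapping blocks leave no gap
            introns.append((previous_end, next_start))
    return introns


# Made-up coordinates: one EST aligned to the genome in three blocks.
aligned_blocks = [(100, 250), (400, 520), (900, 1000)]
presumed_introns = introns_from_alignment(aligned_blocks)
```

For these three blocks the presumed introns are the two genomic stretches that the EST skips, (250, 400) and (520, 900).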

Genomic DNA Assembler (still in progress)

When MGDP finds genes by RescueMu tagging (learn more), the output consists of multiple, overlapping copies of the same stretches of DNA -- called genomic survey sequences, or GSSs. By repeatedly sequencing the same pool of DNA, MGDP researchers make sure they find the genes they are looking for. But this redundancy also fills ZmDB and GenBank with numerous copies of the same or overlapping sequences. Bioinformatics tools can distill this mass of data down to its essence: the gene and all sources of information supporting its existence.

To extract constituent genes from MGDP's GSS data, Brendel's group is building a genomic DNA assembler similar to ZmDBAssembler for ESTs. The output, like that from GeneSeqer, will graphically display the various GSSs and ESTs that make up the gene, along with their GenBank numbers and information about the gene's possible function.

Links:
BLAST tutorial:
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/query_tutorial.html





