Comprehensive characterization of the Drosophila transcriptome
- Susan Celniker (PI), Lawrence Berkeley National Laboratory
- Michael Brent, Washington University
- Steven Brenner, University of California, Berkeley
- Peter Cherbas, Indiana University
- Thomas Gingeras, Affymetrix
- Brenton Graveley, University of Connecticut Health Systems
- Norbert Perrimon, Harvard University
- Roger Hoskins, Lawrence Berkeley National Laboratory
This proposal responds to the modENCODE RFA to identify all of the sequence-based functional elements associated with transcribed sequences "including both protein coding and non-protein coding sequences, specifically characterizing gene structures including transcription start sites (TSS), polyadenylation sites, and alternative transcripts in Drosophila". We propose to generate a comprehensive set of developmentally staged and tissue- and cell-specific RNAs for expression profiling using high-density genome tiling microarrays and 454 pyrosequencing of small RNAs. These expression data will be used for transcript modeling that integrates extant EST and cDNA sequence and comparative data from the 12 sequenced Drosophila genomes. The models will serve as the foundation for experimental validation by sequencing of cDNA and RT-PCR products, by RACE to define TSSs, and, for small non-protein-coding RNAs (ncRNAs), by functional validation in RNAi assays. The experimental validation data will be incorporated into the informatics pipeline to produce improved transcript models. In our final product, these models will include comprehensive annotations of TSSs, exon/intron structures, polyadenylation sites, and the cis-elements required for splicing.

We have assembled a project management team to obtain the necessary infrastructure and expertise to complete the goals of the proposal. The managers and their roles and responsibilities are: Peter Cherbas (IU), RNA samples and evaluation of computational models; Thomas Gingeras (Affymetrix), microarray expression profiling and analysis; Michael Brent (WUSTL), data integration and transcript modeling; Roger Hoskins (LBNL), sequence-based validation; Brenton Graveley (UCHC), alternative splicing; and Norbert Perrimon (Harvard), function of ncRNAs.
Experimental Approaches
SPECIFIC AIM 1: Identify protein coding and noncoding transcribed sequences
We propose to generate over 600 RNA samples in biological triplicate and use them to build expression profile maps detailing the sites of transcription across the fly genome. We will use 1,692 whole-genome tiling arrays at 35 bp resolution as a broad survey of the transcriptome, and 630 arrays at 5 bp resolution to identify transcript ends, splice sites, and small RNAs at high resolution. The same RNA will be used for validation in Aim 2. We also plan to generate first-order transcript connectivity maps by hybridizing 200 pools of 100 RACE products to 35 bp arrays.
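To give a feel for the difference between the 35 bp and 5 bp array designs, the sketch below counts how many probes tile a region at each spacing. It is an illustration only: it assumes that "resolution" means the distance between consecutive probe start positions, and the 25-base probe length, the 1 kb region, and the function name `probe_starts` are hypothetical choices that do not come from the proposal.

```python
def probe_starts(region_start, region_end, spacing_bp, probe_len=25):
    """Return start coordinates of probes tiled every `spacing_bp` bases.

    The region is half-open: [region_start, region_end).
    probe_len=25 is an assumption for illustration, not a value from the proposal.
    """
    # The last probe must fit entirely inside the region.
    last_start = region_end - probe_len
    return list(range(region_start, last_start + 1, spacing_bp))


# Hypothetical 1 kb region, tiled at the two spacings named in Aim 1.
coarse = probe_starts(0, 1000, spacing_bp=35)
fine = probe_starts(0, 1000, spacing_bp=5)

print(len(coarse), "probes at 35 bp spacing")
print(len(fine), "probes at 5 bp spacing")
```

Under these assumptions the finer design places about seven times as many probes over the same region, which is why it is better suited to locating transcript ends and splice sites while the coarser design serves as the broad survey.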
SPECIFIC AIM 2: Map transcript structures
The data produced in Aim 1 will be incorporated into a computational analysis to predict complete transcript structures. Any structural components not yet validated by annotated sequence data will be identified for additional experimental validation and characterization. RLM-RACE will be used to map and characterize 20,000 TSSs; 20,000 RT-PCRs will be used to validate exon/intron junctions; and 6,000 targeted cDNA library screens will be done to validate new transcripts. 454 Life Sciences pyrosequencing will be used to characterize 15 RNA samples, and Northern analysis will be used to validate 1,000 ncRNAs. RNAi knock-downs of 125 RNA-binding protein genes in cell lines will be analyzed by transcript expression profiling to identify alternatively spliced exons and map cis-elements required to regulate alternate transcription.
SPECIFIC AIM 3: Functionally validate ncRNA
We propose to evaluate the biological functions of the newly annotated sequences identified as ncRNAs in Aims 1 and 2, a critical task that has yet to be performed in a systematic fashion. Functional validation of these new ncRNAs will be based on our extensive experience in high-throughput RNA interference (RNAi) and overexpression approaches in cell-based assays.
Information Resources to be Generated
- Sequence from ESTs, cDNA, RT-PCR and RACE products
- Whole-genome expression microarray data
- RNAi tissue culture data
Contact Information
PI: Sue Celniker, celniker@fruitfly.org
- Indiana University
  - PI: Peter Cherbas, cherbas@indiana.edu
  - Wet Lab: Andreas Rechtsteiner, andreas@cgb.indiana.edu
- Affymetrix
  - PI: Thomas Gingeras, tom_gingeras@affymetrix.com
  - Wet Lab: Aarron Willingham, aarron_willingham@affymetrix.com
  - Informatics: Srinka Ghosh, srinka_ghosh@affymetrix.com
- Washington University
  - PI: Michael Brent, brent@cse.wustl.edu
  - Wet Lab: Laura Langton, langton@cse.wustl.edu
  - Informatics: Charles Comstock, cc1@cse.wustl.edu
  - Informatics: Brian Koebbe, koebbe@wustl.edu
  - The entire group can be reached at modencode@mblab.wustl.edu
- LBNL
  - PI: Sue Celniker, celniker@fruitfly.org
  - PI: Roger Hoskins, hoskins@fruitfly.org
  - Informatics: Joe Carlson, joe@fruitfly.org
- UCHC-UCB
  - PI: Brenton Graveley, graveley@neuron.uchc.edu
  - PI: Steven Brenner, brenner@compbio.berkeley.edu
  - Informatics: Angela Brooks, angelabrooks@berkeley.edu
- Harvard
  - PI: Norbert Perrimon, perrimon@receptor.med.harvard.edu
  - Imaging: Ian Flockhart, iflockha@genetics.med.harvard.edu
  - Informatics: Matt Booker, mbooker@genetics.med.harvard.edu



