The National Human Genome Research Institute model organism ENCyclopedia Of DNA Elements

The modENCODE Project will try to identify all of the sequence-based functional elements in the Caenorhabditis elegans and Drosophila melanogaster genomes.

Comprehensive characterization of the Drosophila transcriptome

Susan Celniker (PI) Lawrence Berkeley National Laboratory
Michael Brent Washington University
Steven Brenner University of California, Berkeley
Peter Cherbas Indiana University
Thomas Gingeras Cold Spring Harbor Laboratory
Brenton Graveley University of Connecticut Health Systems
Norbert Perrimon Harvard University
Roger Hoskins Lawrence Berkeley National Laboratory

This proposal responds to the modENCODE RFA to identify all of the sequence-based functional elements associated with transcribed sequences "including both protein coding and non-protein coding sequences, specifically characterizing gene structures including transcription start sites (TSS), polyadenylation sites, and alternative transcripts in Drosophila". We propose to generate a comprehensive set of developmentally staged and tissue- and cell-specific RNAs for expression profiling using high-density genome tiling microarrays and 454 pyrosequencing of small RNAs. These expression data will be used for sophisticated transcript modeling that integrates extant EST and cDNA sequence and comparative data from the 12 sequenced Drosophila genomes. The models will serve as the foundation for experimental validation by sequencing of cDNA and RT-PCR products, by RACE to define TSSs, and, for small non-protein coding RNAs (ncRNAs), by functional validation in RNAi assays. The experimental validation data will be incorporated into the informatics pipeline to produce improved transcript models. In our final product, these models will include comprehensive annotations of TSSs, exon/intron structures, polyadenylation sites and the cis-elements required for splicing. We have assembled a project management team to obtain the necessary infrastructure and expertise to complete the goals of the proposal. The managers and their roles and responsibilities are: Peter Cherbas (IU), RNA samples and evaluation of computational models; Thomas Gingeras (Affymetrix), microarray expression profiling and analysis; Michael Brent (WUSTL), data integration and transcript modeling; Roger Hoskins (LBNL), sequence-based validation; Brenton Graveley (UCHC), alternative splicing; and Norbert Perrimon (Harvard), function of ncRNAs.

Experimental Approaches

SPECIFIC AIM 1: Identify protein coding and noncoding transcribed sequences

We propose to generate over 600 RNA samples in biological triplicates and use them to generate expression profile maps detailing the sites of transcription across the fly genome. We will use 1,692 whole genome tiling arrays at 35 bp resolution as a broad survey of the transcriptome and 630 arrays at 5 bp resolution to identify transcript ends, splice sites, and small RNAs at high resolution. The same RNA will be used for validation in Aim 2. We also plan to generate first-order transcript connectivity maps by hybridizing 200 pools of 100 RACE products to 35 bp arrays.

SPECIFIC AIM 2: Map transcript structures

The data produced in Aim 1 will be incorporated into a computational analysis to predict complete transcript structures. Any structural components not yet validated by annotated sequence data will be identified for additional experimental validation and characterization. RLM-RACE will be used to map and characterize 20,000 TSSs; 20,000 RT-PCRs will be used to validate exon/intron junctions; and 6,000 targeted cDNA library screens will be done to validate new transcripts. 454 Life Sciences pyrosequencing will be used to characterize 15 RNA samples, and Northern analysis will be used to validate 1,000 ncRNAs. RNAi knock-downs of 125 RNA-binding protein genes in cell lines will be analyzed by transcript expression profiling to identify alternatively spliced exons and map cis-elements required to regulate alternate transcription.

SPECIFIC AIM 3: Functionally validate ncRNA

We propose to evaluate the biological functions of the newly annotated sequences identified as ncRNAs in Aims 1 and 2, a critical task that has yet to be performed in a systematic fashion. Functional validation of these new ncRNAs will be based on our extensive experience in high-throughput RNA interference (RNAi) and overexpression approaches in cell-based assays.

Information Resources to be Generated

  1. Sequence from ESTs, cDNA, RT-PCR and RACE products
  2. Whole genome expression microarray data
  3. RNAi tissue culture data

Contact Information

PI: Sue Celniker, celniker@fruitfly.org

  1. Indiana University
  2. Cold Spring Harbor Laboratory
  3. Washington University
  4. LBNL
  6. Harvard