
modENCODE

The National Human Genome Research Institute model organism ENCyclopedia Of DNA Elements

The modENCODE Project will try to identify all of the sequence-based functional elements in the Caenorhabditis elegans and Drosophila melanogaster genomes.

Companion Page for modENCODE Integrative Worm Paper

modENCODE data file formats

A guide to the data file formats. It describes the WIG and GFF3 formats and explains how to upload and view these files in the WormBase, modENCODE, and UCSC genome browsers.
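The guide describes the WIG and GFF3 formats. The following Python sketch shows, for illustration only, how the two formats might be read into plain records. It assumes the UCSC wiggle definitions of `variableStep` and `fixedStep` sections (1-based positions, optional `span`, default 1) and the nine-column, tab-separated GFF3 layout with URL-escaped `tag=value` attributes. The function names `parse_wig` and `parse_gff3_line` are hypothetical and are not part of any modENCODE, WormBase, or UCSC tool. bedGraph-style WIG lines are not handled.

```python
from urllib.parse import unquote


def parse_wig(lines):
    """Yield (chrom, start, end, value) tuples from WIG text (hypothetical helper).

    Supports variableStep and fixedStep sections only. Coordinates are
    1-based and inclusive; each value covers `span` bases (default 1).
    """
    chrom, mode, step, span, pos = None, None, 1, 1, 0
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params["step"])
            continue
        if mode == "variableStep":
            # Data lines are "position value".
            p, v = line.split()
            start = int(p)
            yield chrom, start, start + span - 1, float(v)
        elif mode == "fixedStep":
            # Data lines hold only a value; the position advances by `step`.
            yield chrom, pos, pos + span - 1, float(line)
            pos += step
        else:
            raise ValueError(f"data line before any step declaration: {line!r}")


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    attrs = {}
    if attr_text != ".":
        for pair in attr_text.split(";"):
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # Multiple values are comma-separated; each is percent-decoded.
            attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "score": None if score == "." else float(score),
        "strand": strand,
        "phase": None if phase == "." else int(phase),
        "attributes": attrs,
    }


if __name__ == "__main__":
    # Made-up example input, not real modENCODE data.
    wig_text = """track type=wiggle_0 name=example
variableStep chrom=chrI span=5
101 2.5
201 3.0
fixedStep chrom=chrII start=1000 step=100 span=50
1.0
4.0
"""
    for rec in parse_wig(wig_text.splitlines()):
        print(rec)
    print(parse_gff3_line(
        "I\tmodENCODE\tbinding_site\t11000\t11500\t.\t.\t.\tID=peak1;Note=HOT%20region,example"
    ))
```

Yielding records one at a time keeps memory use low for genome-wide signal tracks. Real files may also contain GFF3 directives such as `##gff-version 3` or a trailing `##FASTA` section, which a full reader would need to skip.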

Figures

Figure 1: Transcriptome Features and Alternative Splicing

Analysis completed using WormBase version WS209 and WS170
Parts A and B
Interpreted data:
Source experimental data
Part C
Interpreted data:
Source experimental data
Part D
Source experimental data

Figure 2: Expression & Binding Dynamics

Analysis completed using WormBase version WS180
Parts A and B
Source experimental data
Part C
Source experimental data

Figure 3: Highly Occupied Target (HOT) Regions

Analysis completed using WormBase version WS190
Parts A and B
Interpreted data
Source experimental data:
Part C
Source experimental data:
Additional Data Sources

Figure 4: Integrated Regulatory Network

Analysis completed using WormBase version WS170
Interpreted data
Source experimental data:

Figure 5: Chromosome-scale Domains of Chromatin Organization

Analysis completed using WormBase version WS170
Source experimental data:

Figure 6: Chromatin Patterns around Genes

Analysis completed using WormBase version WS170
Source experimental data:

Figure 7: Statistical Models Predicting TF-binding and Gene Expression from Chromatin Features

Analysis completed using WormBase version WS190
Aggregated Data
Source experimental data

Figure 8: Relative proportion of annotations among constrained sequences

Analysis completed using WormBase version WS190

Supplementary Figures

Figure S1: ChIP-chip and ChIP-seq comparison

Analysis completed using WormBase version WS190
Source experimental data:

Figure S22: TF binding around non-coding RNAs

Analysis completed using WormBase version WS190
Source experimental data

Figure S23: Co-occurrence of transcription factors

Analysis completed using WormBase version WS190
Interpreted data
Source experimental data:

Figure S24: Comparison of PeakSeq and PeakRanger peak calls

Figure S25: Distribution of TF binding numbers

Analysis completed using WormBase version WS190
Interpreted data

Figure S26: Control experiments for HOT regions

Analysis completed using WormBase version WS190
Part B: DPY-27 only binds to HOT regions on the X chromosome
Analysis completed using WormBase version WS190
Source experimental data:
Interpreted data

Figure S27: HOT region enrichments

Part A
Source experimental data:
Part B
Source experimental data:
Part A
Source experimental data:

Figure S28: Higher gene expression level in HOT regions in RNA-seq

Analysis completed using WormBase version WS190
Source experimental data:
Interpreted data
Part B: Higher gene expression level in HOT regions in tiling arrays
Analysis completed using WormBase version WS190
Source experimental data:

Figure S29: HOT regions are broadly expressed

Analysis completed using WormBase version WS190
Interpreted data
Source experimental data:
Additional Data Sources

Figure S30: Pair-wise correlations of PHA-4 binding signal across different stages

Analysis completed using WormBase version WS190
Source experimental data:

Figure S31: Examples of Pol II binding and expression

Analysis completed using WormBase version WS190
Source experimental data:

Figure S32: Histone marks distribution over repetitive elements

Analysis completed using WormBase version WS190
Source experimental data:

Figure S33: Promoters of chromosome X genes have higher GC content compared to autosomes

Analysis completed using WormBase version WS190
Source experimental data:

Figure S34: Histone marks aggregation around TSSs and TTSs

Source experimental data:

Figure S35: TF sequence motif discovery

Analysis completed using WormBase version WS190

Figure S36: TFs in the larval network

Figure S37: Network motifs

Analysis completed using WormBase version WS170
Interpreted data
Source experimental data:

Figure S38: TF binding and chromatin features

Analysis completed using WormBase version WS182
Part A: Correlations between whole-genome transcription factor binding signals and chromatin features
Analysis completed using WormBase version WS182
Source experimental data:
Part B: Modeling accuracy of models involving either all features or individual features
Analysis completed using WormBase version WS182

Figure S39: Machine learning procedure for modeling TF binding peaks

Graphics only, no data used

Figure S40: TF binding model accuracy

Analysis completed using WormBase version WS182
Source experimental data:

Figure S41: Average signals of some chromatin marks at the binding peaks and non-binding peaks of TFs

Figure S42: Developmental stage-specific models

Analysis completed using WormBase version WS182

Figure S43: Combination of chromatin and sequence features

Analysis completed using WormBase version WS190
Interpreted data:

Figure S44: Coverage of evolutionarily constrained regions by genomic features

Analysis completed using WormBase version WS182

Figure S45: Conservation enrichment analysis

Figure S46: PhastCons score correlation with peak centers in modENCODE peak calls

Figure S47: Saturation of TF binding

Analysis completed using WormBase version WS182

Figure S48: Comparison of coverage between ENCODE pilot and modENCODE C. elegans project

Figure S49: RNAPII ChIP-seq signal aggregation in human and C. elegans

Figure S50: Comparison of histone marks in C. elegans early embryos and human CD4+ T cells

Supplementary Tables

Supplement Table 1: Data overview for RNA sequencing and expression tiling arrays

Interpreted data:
Source experimental data:

Supplement Table 1b: ChIP-chip, ChIP-seq, and other chromatin-characterization experiments

Interpreted data:
Source experimental data:

Supplement Table 1c: Inferred genomic elements

Interpreted data:
Source experimental data:

Supplement Table 2a: Sources of polyA sites in final integrated transcript set

Interpreted data:
Source experimental data:

Supplement Table 2b: C. elegans genes not identified as “transcribed” in 19 polyA RNA-seq samples

Interpreted data:
Source experimental data:

Supplement Table 3: Developmental stages and tissue samples of small RNA-seq and tiling array experiments

Interpreted data:
Source experimental data:

Supplement Table 4: Summary of cell and stage specific tiling array results

Interpreted data:
Source experimental data:

Supplement Table 5: Different types of known ncRNAs

Interpreted data:
Source experimental data:

Supplement Table 6: Annotated regions used to train the machine learning methods (21K-set): tiling array TARs

Interpreted data:
Source experimental data:

Supplement Table 7: Performance of our integrated method (21K-set) on tiling array TARs with three different ways to define element classes in the gold-standard set

Interpreted data:
Source experimental data:

Supplement Table 8: Annotated and novel tiling array TARs going into 21K-set of ncRNAs

Interpreted data:
Source experimental data:

Supplement Table 9: Co-expression clusters of coding transcripts and novel ncRNA candidates (7K-set)

Interpreted data:
Source experimental data:

Supplement Table 10: Total mapped reads and number of ChIP-seq peaks for each of 23 factors (22 TFs and one dosage compensation factor; 28 experiments in total)

Interpreted data:
Corrected Table S10
Source experimental data:

Supplement Table 11: GO analysis of genes associated with HOT regions

Interpreted data:
Source experimental data:

Supplement Table 12: Expression correlation of transcription factors with target genes and non-target genes

Interpreted data:
Source experimental data:

Supplement Table 13: Overview of PicTar-predicted miRNA target sites within 3'UTRs of the aggregated integrated transcript set

Interpreted data:
Source experimental data:

Supplement Table 14: Overlap of 4.1 Mb of residual constrained blocks with various genomic elements

Interpreted data:
Source experimental data:

Supplement Table 15: Sample comparison of C. elegans and human TF binding regions

Interpreted data:
Source experimental data:

Supplement Table 16: Sample comparison of C. elegans and human transcription

Interpreted data:
Source experimental data:

Supplement Table 17: Table of WormBase versions used

Interpreted data:
Source experimental data:

Supplementary Files

Supplement File 1

Genome positions and predicted scores of predicted long ncRNA TARs (set-21K)

Supplement File 2

(Dataset S#) Coordinates for PicTar-predicted miRNA target sites within 3'UTRs of the ag1003 transcript set

Supplement File 3

Coordinates for HOT regions

Supplement File 4

Predicted ncRNAs, 7K set

Analysis Tools

Companion Papers

  • M. Kato, X. Chen, S. Inukai, H. Zao, F. Slack, Age-associated changes in expression of small, non-coding RNAs, including microRNAs, in C. elegans. Genome Res., (submitted).
  • X. Feng, L. Stein, High resolution peak calling for Chromatin IP Sequencing. (submitted).
  • S. Contrino et al., Accessing modENCODE data. Genome Res., (submitted).
  • H. Lei et al., A widespread distribution of genomic CeMyoD binding sites revealed and cross-validated by ChIP-chip and ChIP-Seq techniques. PLoS One, (submitted).
  • M. Friedlander, S. Mackowiak, N. Rajewsky, miRDeep 2.0. (manuscript in preparation).
  • W. Chung et al., Computational and experimental identification of mirtrons in Drosophila melanogaster and Caenorhabditis elegans. Genome Res., (submitted).
  • B. Ewing, A. Leahey, L. Hillier, C. Davis, P. Green, Targeted closure of the C. elegans transcriptome. (manuscript in preparation).
  • N. L. Washington et al., The modENCODE Data Coordination Center: Lessons in harvesting comprehensive experimental details. Genome Res., (submitted).
  • W. C. Spencer et al., A spatial and temporal map of C. elegans gene expression. (manuscript in preparation).
  • W. Niu et al., Systematic dissection of regulatory networks dictated by C. elegans sequence-specific transcription factors. (manuscript in preparation).
  • W. Niu, ..., V. Reinke, Diverse transcription factor binding features revealed by genome-wide ChIP-Seq in C. elegans. Genome Res., (submitted).
  • Z. J. Lu et al., Prediction and characterization of non-coding RNAs in C. elegans by integrating conservation, secondary structure and high throughput sequencing and array data. Genome Res., (submitted).
  • T. Liu et al., Broad chromosomal domains of histone modification patterns in C. elegans. Genome Res., (submitted).
  • C. H. Jan, R. C. Friedman, J. G. Ruby, C. B. Burge, D. P. Bartel, Formation and regulation of 3' untranslated regions in Caenorhabditis elegans. (manuscript in preparation).
  • S. Ercan, Y. Lubling, E. Segal, J. D. Lieb, High nucleosome occupancy is encoded at X-linked gene promoters in C. elegans. Genome Res., (submitted).
  • M. A. Allen, L. W. Hillier, R. H. Waterston, T. Blumenthal, A global analysis of trans-splicing in C. elegans. Genome Res., (submitted).
