
A Data Coordinating Center for modENCODE

Lincoln Stein (PI), Ontario Institute for Cancer Research
James Kent (coPI), University of California, Santa Cruz
Suzanna Lewis (coPI), Lawrence Berkeley National Laboratory
Gos Micklem (coPI), University of Cambridge

Project Objectives

This is the data coordinating center (DCC) for the modENCODE project. Our role is to track the project's data, integrate it with other information sources, and make it available to the research community in a timely and open fashion.

Experimental Approaches

We will assemble a team of three data managers, stationed at CSHL and at Berkeley, with backgrounds in the bioinformatics of C. elegans and/or D. melanogaster. The managers will liaise with their contacts at the data provider sites to determine data file formats, milestones and quality control procedures for their data sets. They will also liaise with representatives from NCBI to coordinate modENCODE activities with the primary data repositories at GenBank and GEO.

Data providers will upload their data sets to a staging server, where they will be able to preview their data on an instance of the GBrowse genome browser. The data managers will perform quality control (QC) on the data before approving its transfer to the production database. Data will be integrated in the production database using InterMine, and from there released to the public on a monthly schedule. Researchers will be able to access the data through the GBrowse genome browser, through bulk downloads, and through complex queries and reports mediated by InterMine and the BioMart data warehousing system.

All major software systems used by the proposed DCC will be based on open source tools from the Generic Model Organism Database (GMOD), human ENCODE, and other sources. Throughout the project, Lewis and Stein will work closely with FlyBase and/or WormBase to ensure that data collected by modENCODE becomes an integral part of the relevant model organism database (MOD). In addition, we will dedicate a significant part of a data manager's effort to transferring data from modENCODE into the MODs during the last year of the project.
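
The submission workflow described in this section (staging upload, QC approval, integration into production, monthly release) can be pictured as a simple state machine. The sketch below is only an illustration under stated assumptions: the stage names, the `Submission` class and the `monthly_release` helper are hypothetical and do not correspond to any actual DCC software.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names chosen to mirror the workflow in the text.
    STAGING = "staging"        # uploaded by the provider, previewable in GBrowse
    APPROVED = "approved"      # passed QC by a data manager
    PRODUCTION = "production"  # integrated into the production database
    RELEASED = "released"      # included in a monthly public release


# Allowed forward transitions, in the order the text describes them.
NEXT_STAGE = {
    Stage.STAGING: Stage.APPROVED,
    Stage.APPROVED: Stage.PRODUCTION,
    Stage.PRODUCTION: Stage.RELEASED,
}


@dataclass
class Submission:
    name: str
    stage: Stage = Stage.STAGING
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage, or raise if already released."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.name} is already released")
        self.history.append(self.stage)
        self.stage = NEXT_STAGE[self.stage]
        return self.stage


def monthly_release(submissions):
    """Release every submission currently in production; return their names."""
    released = []
    for sub in submissions:
        if sub.stage is Stage.PRODUCTION:
            sub.advance()
            released.append(sub.name)
    return released
```

In this sketch a data set that has not yet passed QC stays on staging and is skipped by `monthly_release`, which reflects the gatekeeping role the text assigns to the data managers.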

Information Resources to be Generated

  1. A project portal [www.modencode.org].
  2. A project genome browser which shows all data sets in graphical format.
  3. Downloadable bulk data sets.
  4. A data-mining interface for generating ad hoc queries, performing canned queries, and for making custom reports. [http://intermine.modencode.org]

Contact Information

Data Managers

Marc Perry: ChIP-chip and ChIP-seq (Ontario Institute for Cancer Research)
Peter Ruzanov: ncRNA and DNA replication in D. melanogaster (Ontario Institute for Cancer Research)
Nicole Washington: Transcriptome Analysis (Lawrence Berkeley National Laboratory)

Wiki and Web Site

Sergio Contrino: Website (University of Cambridge)
E. O. Stinson: Wiki (Lawrence Berkeley National Laboratory)
Kim Rutherford: Website and Systems Administration (University of Cambridge)
Peter Ruzanov: Genome Browsers (Ontario Institute for Cancer Research)

Infrastructure and Development

Sergio Contrino: modMine integration (University of Cambridge)
Angie Hinrichs: GBrowse/UCSC Genome Browser integration
Chris Mungall: Chado database development (Lawrence Berkeley National Laboratory)
Kim Rutherford: modMine integration (University of Cambridge)
E. O. Stinson: Chado database development, BIR-TAB, data pipeline (Lawrence Berkeley National Laboratory)
Nicole Washington: BIR-TAB, data upload pipeline (Lawrence Berkeley National Laboratory)
Zheng Zha: MAGE-TAB, Chado database migration (Ontario Institute for Cancer Research)

Web site: http://www.modencode.org