A Data Coordinating Center for modENCODE

Lincoln Stein (PI), Cold Spring Harbor Laboratory
James Kent, University of California, Santa Cruz
Suzanna Lewis, Lawrence Berkeley National Laboratory
Gos Micklem, University of Cambridge

Project Objectives

This is the data coordinating center (DCC) for the modENCODE project. Our role is to track the data, integrate it with other information sources, and make it available to the research community in a timely and open fashion.

Experimental Approaches

We will assemble a team of three data managers, stationed at CSHL and at Berkeley, with backgrounds in the bioinformatics of C. elegans and/or D. melanogaster. The managers will liaise with their contacts at the data provider sites to determine data file formats, milestones and quality control procedures for each dataset. They will also liaise with representatives from NCBI to coordinate modENCODE activities with the primary data repositories at GenBank and GEO.

Data providers will upload their datasets to a staging server, where they will be able to preview their data on an instance of the GBrowse genome browser. The data managers will QC the data before approving its transfer to the production database. Data will be integrated in the production database using InterMine and released from there to the public on a monthly schedule. Researchers will be able to access the data via the GBrowse genome browser, via bulk downloads, and via complex queries and reports mediated by InterMine and the BioMart data warehousing system.

All major software systems used by the proposed DCC will be based on open source tools from the Generic Model Organism Database (GMOD) project, human ENCODE, and other sources. Throughout the project, Lewis and Stein will work closely with FlyBase and/or WormBase to ensure that data collected by modENCODE becomes an integral part of the relevant model organism database (MOD). In addition, we will dedicate a significant part of one data manager's effort to transferring data from modENCODE into the MODs during the last year of the project.

Information Resources to be Generated

  1. A project portal [www.modencode.org].
  2. A project genome browser which shows all data sets in graphical format.
  3. Downloadable bulk data sets.
  4. A data-mining interface for generating ad hoc queries, running canned queries, and making custom reports [http://intermine.modencode.org].

Contact Information

  1. PIs
    1. Lincoln Stein (PI)
    2. Suzi Lewis (co-PI)
    3. Jim Kent (co-PI)
    4. Gos Micklem (co-PI)
  2. Informatics
    1. Sheldon McKay (C. elegans)
    2. Zheng Zha (C. elegans)
    3. Chris Mungall (D. melanogaster)
    4. Nicole Washington (D. melanogaster)
  3. Web site: http://www.modencode.org