A Data Coordinating Center for modENCODE
| Name | Role | Institution |
|---|---|---|
| Lincoln Stein | PI | Ontario Institute for Cancer Research |
| James Kent | coPI | University of California, Santa Cruz |
| Suzanna Lewis | coPI | Lawrence Berkeley National Laboratory |
| Gos Micklem | coPI | University of Cambridge |
Project Objectives
This is the data coordinating center (DCC) for the project. Our role is to track the data, integrate it with other information sources, and make it available to the research community in a timely and open fashion.
Experimental Approaches
We will assemble a team of three data managers stationed at CSHL and at Berkeley, who have a background in the bioinformatics of C. elegans and/or D. melanogaster. The managers will liaise with their contacts at the data provider sites to determine data file formats, milestones and quality control procedures for their datasets. They will also liaise with representatives from NCBI to coordinate modENCODE activities with the primary data repositories at GenBank and GEO.
Data providers will upload their data sets to a staging server, where they will be able to preview their data on an instance of the GBrowse genome browser. The data managers will run quality control (QC) checks on the data before approving its transfer to the production database. Data will be integrated in the production database using InterMine, and from there released to the public on a monthly schedule. Researchers will be able to access the data via the GBrowse genome browser, via bulk downloads, and via complex queries and reports mediated by InterMine and the BioMart data warehousing system. All major software systems used by the proposed DCC will be based on open source tools from the Generic Model Organism Database (GMOD), human ENCODE, and other sources.
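The submission workflow described above can be read as a simple linear state machine. The following is a minimal sketch under stated assumptions, not the DCC's actual pipeline code: the `Stage` names, the `Submission` class, and the `advance` method are all hypothetical and chosen only to mirror the steps in the text (upload to staging, preview, QC approval, load into production, public release).

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names mirroring the workflow in the text.
    UPLOADED = "uploaded to staging server"
    PREVIEWED = "previewed in genome browser"
    APPROVED = "approved by data manager after QC"
    IN_PRODUCTION = "integrated in production database"
    RELEASED = "released to the public"


# Each stage has exactly one successor; RELEASED is terminal.
NEXT_STAGE = {
    Stage.UPLOADED: Stage.PREVIEWED,
    Stage.PREVIEWED: Stage.APPROVED,
    Stage.APPROVED: Stage.IN_PRODUCTION,
    Stage.IN_PRODUCTION: Stage.RELEASED,
}


@dataclass
class Submission:
    """A hypothetical record tracking one data set through the workflow."""
    name: str
    stage: Stage = Stage.UPLOADED

    def advance(self, qc_passed: bool = True) -> Stage:
        """Move to the next stage; QC gates the step out of PREVIEWED."""
        if self.stage is Stage.RELEASED:
            raise ValueError("submission has already been released")
        if self.stage is Stage.PREVIEWED and not qc_passed:
            # A failed QC check keeps the data set on the staging server.
            raise ValueError("QC failed; submission stays on staging")
        self.stage = NEXT_STAGE[self.stage]
        return self.stage
```

In this sketch the only gate is the QC decision; the monthly release schedule would be an external trigger that calls `advance` on every submission currently in `Stage.IN_PRODUCTION`.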
Throughout the project, Lewis and Stein will work closely with FlyBase and/or WormBase to ensure that data collected by modENCODE becomes an integral part of the relevant model organism database (MOD). In addition, we will dedicate a significant part of a data manager's effort to transferring data from modENCODE into the MODs during the last year of the project.
Information Resources to be Generated
- A project portal [www.modencode.org].
- A project genome browser which shows all data sets in graphical format.
- Downloadable bulk data sets.
- A data-mining interface for generating ad hoc queries, performing canned queries, and making custom reports [http://intermine.modencode.org].
Contact Information
| Data Managers | Area | Institution |
|---|---|---|
| Marc Perry | ChIP-chip and ChIP-seq | Ontario Institute for Cancer Research |
| Peter Ruzanov | ncRNA and DNA replication in D. melanogaster | Ontario Institute for Cancer Research |
| Nicole Washington | Transcriptome Analysis | Lawrence Berkeley National Laboratory |
| Wiki and Web Site | Area | Institution |
|---|---|---|
| Sergio Contrino | Website | University of Cambridge |
| E. O. Stinson | Wiki | Lawrence Berkeley National Laboratory |
| Kim Rutherford | Website and Systems Administration | University of Cambridge |
| Peter Ruzanov | Genome Browsers | Ontario Institute for Cancer Research |
| Infrastructure and Development | Area | Institution |
|---|---|---|
| Sergio Contrino | modMine integration | University of Cambridge |
| Angie Hinrichs | GBrowse/UCSC Genome Browser integration | |
| Chris Mungall | Chado database development | Lawrence Berkeley National Laboratory |
| Kim Rutherford | modMine integration | University of Cambridge |
| E. O. Stinson | Chado database development, BIR-TAB, data pipeline | Lawrence Berkeley National Laboratory |
| Nicole Washington | BIR-TAB, data upload pipeline | Lawrence Berkeley National Laboratory |
| Zheng Zha | MAGE-TAB, Chado database migration | Ontario Institute for Cancer Research |
Web site: http://www.modencode.org