The National Human Genome Research Institute model organism ENCyclopedia Of DNA Elements

The modENCODE Project will try to identify all of the sequence-based functional elements in the Caenorhabditis elegans and Drosophila melanogaster genomes.

Frequently Asked Questions

Welcome to the modENCODE FAQ page. This is a collection of questions compiled from the help requests sent to help@modencode.org. If you can't find the information you're looking for, just drop us an email.


Data Access

How do I access modENCODE data?

There are five main ways to access the data:

1. The faceted data browser: This is a user interface for browsing the available datasets. A range of filters is available for selecting your datasets of interest, and there are links for viewing the data in GBrowse and modMine, and for downloading it.

2. Browsing by category: This feature exists both on the modENCODE webpage and in modMine. The datasets are sorted into broad categories – for example ‘Chromatin structure’, and each category has a number of studies associated with it – for example ‘Nucleosome mapping’ and ‘Genome-wide chromatin profiling’. Clicking on a study takes you to a list of all the data submissions associated with it.

3. Keyword search: This feature is available on the front page of modMine, as well as in the top right corner of every modMine page. You can search for your favourite gene, experiment type, PI, etc.

4. FTP download: For easy bulk download of modENCODE data, you can use the FTP interface.

5. On the Amazon cloud: To save you downloading data to a local machine, all of it is available on the Amazon cloud. You can upload your own data too, and do your analysis there. To get started, check out this modENCODE help page.

What is the difference between data at http://data.modencode.org and data in modMine?

Both interfaces link to exactly the same data, but http://data.modencode.org provides a friendlier user interface, so it may be easier to use for finding specific datasets. The two sources should lead to identical submissions, but very occasionally there is a mismatch: a dataset may have been retracted in a new release of modMine before the data browser has been updated with the new information. In these cases, modMine is the definitive source of information on what data exists. We're currently working on improving the modMine interface and synchronising the two. If you have problems or come across errors, please let us know!

What is the replicate and experiment/input structure for the ChIP experiments?

On a submission-by-submission basis, the blue metadata table in the Methods section of each submission page should help. See for example http://intermine.modencode.org/query/portal.do?class=Submission&externalids=E20_24_H3K27me3+ChIP-chip

On the experiment level, if you go to an experiment page you should be able to see the different conditions applied in the different submissions. An example would be http://intermine.modencode.org/query/experiment.do?experiment=Chromatin%20Binding%20Site%20Mapping

Data visualisation and interpretation

How are the 'peaks' calculated for ChIP experiments?

Different labs use different peak calling methods - the specifics of the peak callers used for particular datasets can be found on the individual submission report pages. Very broadly, each peak caller will look at the signal from the IP-ed sample compared to the control signal, and use an algorithm to identify regions where the sample signal is considered to be 'significantly' enriched, rather than just thought to be caused by experimental or technical noise in that area.

How significant are the 'peaks' in ChIP experiments?

A common measure of significance shown for peaks from ChIP experiments is a False Discovery Rate (FDR) value. At FDR 1%, for example, it is expected that 1% of all the peaks found for the experiment will be false positives. The FDR value for the peak files associated with individual submissions can be found in the Methods section of the submission report. Some peak callers also show a p-value.
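As a rough illustration of what an FDR threshold means in practice, the expected number of false positives is the FDR multiplied by the number of peaks called. The sketch below uses a made-up peak count, and the function name is ours; it is not part of any modENCODE tool or peak caller.

```python
def expected_false_positives(n_peaks: int, fdr: float) -> float:
    """Expected number of false-positive peaks at a given FDR threshold."""
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be a fraction between 0 and 1")
    return n_peaks * fdr


# Hypothetical example: a peak file with 5,000 peaks called at FDR 1%.
print(expected_false_positives(5000, 0.01))
```

So a file of 5,000 peaks at FDR 1% would be expected to contain around 50 false positives, although the FDR does not tell you which peaks those are.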

How can I extract genome sequences upstream and/or downstream of particular features?

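One way of doing this is to use the "samtools faidx" command. Running samtools faidx ref.fasta indexes the reference FASTA file; adding a region, for example samtools faidx ref.fasta chr:start-end, prints the sequence for that region. For more information on how to use samtools, see http://samtools.sourceforge.net/samtools.shtml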
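The region to extract depends on the strand of the feature: "upstream" lies at lower coordinates for a + strand feature and at higher coordinates for a - strand feature. Here is a minimal sketch of how the region string could be worked out, assuming 1-based inclusive coordinates as used in samtools region strings. The function name, the chromosome name and the coordinates are illustrative only.

```python
from typing import Optional


def flank_region(chrom: str, start: int, end: int, strand: str,
                 length: int, side: str = "upstream") -> Optional[str]:
    """Build a "chrom:start-end" region string for the bases flanking a feature.

    Coordinates are 1-based and inclusive. Returns None if the flank is empty,
    for example when the feature starts at the first base of the chromosome.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if side not in ("upstream", "downstream"):
        raise ValueError("side must be 'upstream' or 'downstream'")

    # Upstream of a "+" feature and downstream of a "-" feature both lie
    # at lower coordinates than the feature itself.
    lower_side = (side == "upstream") == (strand == "+")
    if lower_side:
        lo, hi = max(1, start - length), start - 1
    else:
        # Not clamped to the chromosome length, which is not known here.
        lo, hi = end + 1, end + length

    if lo > hi:
        return None
    return f"{chrom}:{lo}-{hi}"


# Hypothetical feature on chromosome 2L, 500 bp upstream on each strand.
print(flank_region("2L", 1000, 2000, "+", 500))
print(flank_region("2L", 1000, 2000, "-", 500))
```

The resulting string can be given to samtools faidx after the name of the reference file. For features on the - strand, the extracted sequence still needs to be reverse-complemented to read in the direction of the feature.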

How do I export data to the UCSC browser?

Each submission has a gff3 file which is compatible with the UCSC browser. To view it there, you need to download the gff3 file, then manually upload it to UCSC.

Experiment information

Which antibodies were used for different experiments?

In general, antibody information is available in modMine, in the submission reports associated with different submissions. Following the links from there will give you more information about the antibodies used for specific experiments.

For histone modifications, there is also a publication assessing the quality of the available antibodies - http://www.nature.com/nsmb/journal/v18/n1/full/nsmb.1972.html

How do I obtain an aliquot of the antibodies used?

At the moment there is no centralised repository for the antibodies used by the modENCODE project. The way to get a hold of an aliquot is to get in touch with the group that used it.

Using modMine

How do I cite modMine?

The reference to use is "modMine: flexible access to modENCODE data. Nucleic Acids Res. 2012 January; 40(D1): D1082–D1088".

Can modMine analyse my raw array / sequencing data for me?

modMine currently does not offer tools for raw data analysis. However, there is a wealth of tools available for doing this. If you're looking for a starting point to find out more about the analysis methods used for specific modENCODE datasets, you can have a look at the individual submissions within modMine: the tools used for analysis, along with the experimental protocols, are documented in the 'Methods' section on the Submission Report page.

How do I find out the genes associated with particular genome-wide binding datasets?

On the submission report page for each submission, in the Results section, there is a tool for finding the genes associated with binding intervals. You can select either the default, which finds all the genes directly overlapping the binding intervals, or you can specify a search range (e.g. direct overlaps plus a region of up to 2 kb upstream of each gene). The query result will give you a list of genes, which you can then save as a list and analyse further. At present, this has to be done on a submission-by-submission basis; there is no tool for doing it in bulk.
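If you have downloaded the binding intervals and a set of gene annotations, the same kind of lookup can be approximated locally across several submissions. The sketch below is our own illustration of the idea, not the modMine implementation: each gene is extended by a search range on its upstream side and then tested for overlap with the binding intervals. The gene names and coordinates are made up.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def genes_near_intervals(genes: List[Gene],
                         intervals: List[Tuple[str, int, int]],
                         upstream: int = 0) -> List[str]:
    """Return names of genes overlapping any interval.

    With upstream > 0, each gene is first extended by that many bases on
    its upstream side (lower coordinates for "+", higher for "-").
    """
    hits = []
    for gene in genes:
        if gene.strand == "+":
            lo, hi = gene.start - upstream, gene.end
        else:
            lo, hi = gene.start, gene.end + upstream
        for chrom, i_start, i_end in intervals:
            # Two inclusive ranges overlap if each starts before the other ends.
            if chrom == gene.chrom and i_start <= hi and i_end >= lo:
                hits.append(gene.name)
                break  # report each gene once
    return hits


# Hypothetical genes and binding intervals.
genes = [
    Gene("geneA", "X", 5000, 6000, "+"),
    Gene("geneB", "X", 10000, 11000, "-"),
    Gene("geneC", "I", 5000, 6000, "+"),
]
intervals = [("X", 3500, 3800), ("X", 11500, 11600)]

print(genes_near_intervals(genes, intervals))                 # direct overlap only
print(genes_near_intervals(genes, intervals, upstream=2000))  # plus 2 kb upstream
```

This simple double loop is fine for a single submission; for genome-wide sets with many intervals, a sorted or indexed approach would be faster.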

I have a link to a webpage in an outdated release of modMine. How do I access the same page in the current release?

If you change the 'release-xx' part of the link to 'query', you will automatically be taken to the newest modMine release.
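If you have many old links to update, the substitution can be scripted. This is a minimal sketch that assumes the release appears as its own path segment of the form 'release-xx' in the URL; the release number and the page in the example are hypothetical.

```python
import re


def to_current_release(url: str) -> str:
    """Replace the first 'release-xx' path segment of a modMine link with 'query'."""
    return re.sub(r"/release-[^/]+/", "/query/", url, count=1)


# Hypothetical link to an older release.
old_link = "http://intermine.modencode.org/release-30/experiment.do?experiment=Example"
print(to_current_release(old_link))
```

Links that do not contain a 'release-xx' segment are returned unchanged.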

Using GBrowse

What is the best way to store and visualize Drosophila genome annotations around my genomic locations of interest?

Upload your data to GBrowse (GFF is the format of choice) and create bookmarks for the locations that interest you. You can also download FlyBase annotations directly from GBrowse, using the small floppy disk icon in the upper left corner of the FlyBase genes track. To create a bookmark, use File->Bookmark this in the top-level menu in GBrowse. All uploaded tracks will be stored in your account.

When viewing data, what is the SD (z-score) view? Can you change the scaling of it?

When you look at the data with the default settings (the SD (z-score) view), an average read count is calculated for a certain range and this average is set to 0. Reads significantly enriched in the ChIP samples are plotted as standard deviations above 0 (the average line), and those enriched in the input are plotted as standard deviations below 0. By default, the mean and standard deviation are calculated from the data set as a whole; that is, the entire genome. There are also options to change the scaling to just the chromosome, or just the window that you are viewing.
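The idea behind the z-score scaling can be sketched as follows: each value is expressed as the number of standard deviations it lies above or below the mean of the chosen range. This is only an illustration with made-up signal values; the exact calculation used by GBrowse (for example, which form of standard deviation it uses) may differ.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Rescale values so the mean is 0 and the unit is one standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical signal values. Which values are passed in (whole genome,
# one chromosome, or the visible window) determines the scaling.
signal = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(signal))
```

Changing the scaling option in the browser corresponds to changing the set of values from which the mean and standard deviation are taken, so the same region can look different under each setting.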

Further resources

How can I get more help / training?

If you have a specific question that's not addressed here, feel free to email us at help@modencode.org

We periodically run webinars to train people on different aspects of the modENCODE project and the associated data tools. For a list of the upcoming webinars, check out ...

For modMine tools, there is a quick help page.

Questions about specific submissions

What is the difference between the CTCF_N and CTCF_C datasets?

One of these uses an antibody for the N terminus of CTCF, while the other one uses an antibody for the C terminus.

Which RNA PolII antibodies were used in different projects?

Gary Karpen used these ones:

with the corresponding links to the antibody information:

Kevin White used:

which leads to: http://wiki.modencode.org/project/index.php?title=Ab:PolII:KW:1&oldid=21956

Where can I find a file with the exons transcribed in each developmental state in the worm?

For worm, you can look at this experiment: http://intermine.modencode.org/query/experiment.do?experiment=Reanalysis%20of%20comprehensive%20set%20of%20C.%20elegans%20transcripts%20and%20expression%20for%20various%20stages%20and%20conditions There you can download the exon locations for the various developmental stages.

How do I order clones that contain the coding sequence for a gene included in the MIP cDNA library?

Clones of the MIP cDNA library can be ordered from the Drosophila Genomics Resource Center. An example product page for a clone is https://dgrc.cgb.indiana.edu/product/View?product=1390160

More details about the original cDNA template libraries are available from the Berkeley Drosophila Genome Project page here: http://www.fruitfly.org/EST/SLIPscreen.html