During the first decade of the ENCODE project (2003-2014), UCSC coordinated all project data, hosting genome browser tracks and download files for all Consortium experiments. UCSC also developed tools for locating and accessing ENCODE data as well as outreach and tutorial materials to help the user community. The ENCODE Data at UCSC resources below are those developed during this period. For newer data and outreach materials, consult the ENCODE Project section.
Human: Experiment Matrix, Experiment List, Downloads and Cell Types
Mouse: Experiment Matrix, Experiment List, Downloads and Cell Types
ENCODE Antibodies and ENCODE Registered Experiment Variables
File Formats and Data Standards and Software Tools
UCSC Genome Browser FAQ Page and Searchable Google Groups Mailing List
DATA FILE AND TABLE FORMAT QUESTIONS
How do I extract information about an ENCODE experiment from the filename?
What is the difference between a file xxx and the related file xxxV2?
How do I learn more about different ENCODE file formats?
What does xxx mean in a file in hgdownload/encodeDCC/hg19/wgEncode(track)?
How do I download ENCODE histone data in BED format?
How do I find the meaning of a column of a BED file?
What is the definition of “score” in ENCODE tables?
How are the columns signalValue and peak calculated in narrowPeak files?
What does the name column represent for DNase clustered BED files?
Can I convert WIG files into a variableStep format to use with SitePro?
How do I learn more about peak calling algorithms used to generate narrowPeak and broadPeak files?
What program reads “.bb” TFBS files from ENCODE?

OTHER QUESTIONS
How do I display ENCODE data from GEO in the Genome Browser?
Where can I find ENCODE papers?
Is there a service providing ENCODE data on a hard drive?
May I use the ENCODE figure from your homepage?
Which cell types are used by ENCODE?
Which cell protocols were used in my track of interest?
Where can I find the ENCODE growth protocol for a specific cell type?
Has transcription factor xxx been mapped by ENCODE?
How do I find overlaps between my own ChIP-seq regions and available ENCODE transcription factors?
I am making a public hub for my paper. Is there an example HTML file to use for my data description?

Questions and feedback welcome.
Take note of the GEO sample accession (GSM) number, for example GSM999240, and enter it into the Track Search tool, which you can open by clicking Search on the left side of the ENCODE portal page. Alternatively, use the Advanced Track Search page and select “GEO sample accession” from the pull-down menu that displays “Cell, tissue or DNA sample”. In the search results, check the box next to your track and click the “View in Browser” button.
If your data are not already in the browser, we recommend converting your BED files to bigBed format. You can download our source tools for converting BED to bigBed (as described in the bigBed format documentation) or use the tools on the Galaxy website. For questions about Galaxy, please contact the Galaxy team directly.
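The bedToBigBed utility expects its BED input to be sorted by chromosome and then by start position; the usual recipe is “sort -k1,1 -k2,2n”. If you prefer to prepare files in a script, here is a minimal Python sketch of that sorting step. The function name sort_bed_lines is hypothetical, and the sketch assumes tab-separated BED in which header or comment lines can simply be dropped.

```python
# Minimal sketch: order BED lines by chromosome name (byte order) and then
# numerically by start, the ordering that "sort -k1,1 -k2,2n" produces and that
# bedToBigBed expects. Assumes tab-separated input.


def sort_bed_lines(lines):
    """Return BED data lines sorted by chromosome, then by start coordinate."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank, comment, "track" and "browser" lines are dropped,
        # since only data lines are wanted for the conversion.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), line))
    # Python compares ASCII strings by code point, matching C-locale sort order.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    return [rec[2] for rec in records]


if __name__ == "__main__":
    example = ["chr2\t5\t10\tpeakC", "chr1\t100\t200\tpeakB", "chr1\t20\t30\tpeakA"]
    for bed_line in sort_bed_lines(example):
        print(bed_line)
```

After sorting, the conversion itself is done with the UCSC utility, for example “bedToBigBed sorted.bed hg19.chrom.sizes out.bb”, where the chrom.sizes file lists the chromosome lengths for the assembly.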
Another path to ENCODE protocols is the /ENCODE/protocols/ directory. Navigate into the cell protocols directory and then the human directory to find the same RCC 7860 protocol file that is linked from the Human Cell Types page above.
If you have further questions about a protocol, contact the lab that registered it.
Another option is to use the Track Search or File Search tools and to search the “Antibody or target protein” field to see if the desired transcription factor is listed.
If you are unfamiliar with the Table Browser, please refer to our help page and the section on intersecting data.
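For a quick local check, overlaps between your own regions and a set of ENCODE peaks can also be computed directly from BED coordinates. BED intervals are 0-based and half-open, so two intervals overlap when each one starts before the other ends. The sketch below is a simple stand-in for the Table Browser intersection, not its actual implementation; the function name, the tuple layout (chrom, start, end, name), and the example coordinates and factor names are all made up for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def find_overlaps(regions, features):
    """Return (region, feature) pairs whose half-open intervals overlap.

    regions and features are tuples of (chrom, start, end, ...), as in BED.
    """
    by_chrom = defaultdict(list)
    for feature in features:
        by_chrom[feature[0]].append(feature)
    starts = {}
    for chrom, items in by_chrom.items():
        items.sort(key=lambda f: f[1])
        starts[chrom] = [f[1] for f in items]

    pairs = []
    for region in regions:
        chrom, start, end = region[0], region[1], region[2]
        items = by_chrom.get(chrom, [])
        # Only features that start before the region ends can overlap it...
        limit = bisect_left(starts.get(chrom, []), end)
        for feature in items[:limit]:
            # ...and of those, only features that end after the region starts.
            if feature[2] > start:
                pairs.append((region, feature))
    return pairs


if __name__ == "__main__":
    my_regions = [("chr1", 100, 200, "myPeak1")]
    encode_peaks = [
        ("chr1", 150, 250, "CTCF"),
        ("chr1", 200, 300, "MAX"),  # touches at 200 only: no overlap
        ("chr1", 50, 101, "JUN"),
        ("chr2", 100, 200, "REST"),
    ]
    for region, peak in find_overlaps(my_regions, encode_peaks):
        print(region[3], "overlaps", peak[3])
```

Sorting each chromosome's features once and using a binary search keeps the sketch short while avoiding a scan of features that start after the query region ends.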
The metadata uses a controlled vocabulary (cv.ra), which can be downloaded as a text file.
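Files in .ra format, such as cv.ra, are plain text made of stanzas separated by blank lines, where each line holds a tag followed by its value. A minimal parser is sketched below, assuming that lines starting with “#” are comments. The function name is hypothetical, and the sample stanzas are an illustrative excerpt, not copied from the real cv.ra.

```python
def parse_ra(text):
    """Parse .ra text into a list of dicts, one dict per stanza."""
    stanzas = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            continue  # assumed comment line
        if not line:
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        # The first word is the tag; the rest of the line is its value.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


if __name__ == "__main__":
    # Illustrative stanzas only, not copied from the real cv.ra.
    sample = "term GM12878\ntag GM12878\ntype Cell Line\n\nterm K562\ntag K562\ntype Cell Line\n"
    by_term = {s["term"]: s for s in parse_ra(sample) if "term" in s}
    print(by_term["K562"]["type"])
```

Indexing the stanzas by their term, as in the usage lines, makes it easy to look up what a metadata value in a filename or track setting refers to.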
To find out exactly how the score was calculated for a specific track, contact the lab that created the data. The Credits section at the bottom of each track’s description page often includes several links to the authors’ labs.
chr21 9825311 9827738 . 1000 . 4.51792 256.60845 261.34671 1809
What is the meaning of the information from the fourth field forward?
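The line above is in narrowPeak format (BED6+4). Following the UCSC format description, fields 4 through 10 are: name (“.” when none is assigned), score (0 to 1000, which controls display shading), strand (“.” when none is assigned), signalValue (overall enrichment for the region), pValue and qValue (both as -log10, or -1 if not assigned), and peak (the 0-based offset of the peak summit from chromStart, or -1 if no summit was called). The sketch below labels each field and computes the absolute summit position; the function name and the added “summit” key are hypothetical.

```python
NARROWPEAK_FIELDS = ("chrom", "chromStart", "chromEnd", "name", "score",
                     "strand", "signalValue", "pValue", "qValue", "peak")


def parse_narrowpeak(line):
    """Return a dict of named narrowPeak fields plus the absolute summit."""
    values = line.split()
    if len(values) != len(NARROWPEAK_FIELDS):
        raise ValueError(f"expected 10 fields, got {len(values)}")
    record = dict(zip(NARROWPEAK_FIELDS, values))
    for key in ("chromStart", "chromEnd", "score", "peak"):
        record[key] = int(record[key])
    for key in ("signalValue", "pValue", "qValue"):
        record[key] = float(record[key])
    # peak is an offset from chromStart; -1 means no summit was called.
    if record["peak"] == -1:
        record["summit"] = None
    else:
        record["summit"] = record["chromStart"] + record["peak"]
    return record


if __name__ == "__main__":
    example = "chr21 9825311 9827738 . 1000 . 4.51792 256.60845 261.34671 1809"
    print(parse_narrowpeak(example))
```

For the example line, the summit works out to 9825311 + 1809 = 9827120.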
Here is an example of using UDR, once installed, to download all of the mouse mm9 ENCODE data:
$ udr rsync -avP hgdownload.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/
Read more about the UDR method in its documentation. If you are not downloading large amounts of data, we highly recommend using rsync. For example:
$ rsync -a -P rsync://hgdownload.soe.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeDir/wgEncodeFile ./
An advantage of rsync is that, when run again after a failure, it resumes where it left off.
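If you script repeated downloads, you can take advantage of this resume behavior by simply re-running the same command after a failure. The following sketch wraps the rsync example above in Python. The retry count, wait time, and function names are assumptions, and the track directory used in the usage lines is only an example.

```python
import subprocess
import time

RSYNC_ROOT = "rsync://hgdownload.soe.ucsc.edu/goldenPath"


def build_rsync_command(db, remote_subpath, local_dir):
    """Build the same rsync call as the example above for one encodeDCC path."""
    source = f"{RSYNC_ROOT}/{db}/encodeDCC/{remote_subpath}"
    # -a: archive mode; -P: shorthand for --partial --progress, where --partial
    # keeps partly transferred files so a re-run can pick them up.
    return ["rsync", "-a", "-P", source, local_dir]


def download_with_retries(command, attempts=3, wait_seconds=30):
    """Re-run rsync until it exits with status 0 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if subprocess.run(command).returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(wait_seconds)
    return False


if __name__ == "__main__":
    cmd = build_rsync_command("hg19", "wgEncodeCaltechRnaSeq/", "./")
    print(" ".join(cmd))
    # download_with_retries(cmd)  # uncomment to actually transfer files
```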
chr1 3002700 3002800 0.17
However, SitePro does not accept this BED-like WIG layout. Is there a way to format the WIG files as variableStep instead?
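The sample line is in bedGraph layout: chromosome, 0-based start, end, value. A variableStep WIG section instead begins with a declaration line (variableStep chrom=... span=...) followed by 1-based start positions and values, and every entry in a section shares the same span. The sketch below performs that conversion, opening a new section whenever the chromosome or interval width changes. It assumes that SitePro accepts standard variableStep WIG, which we have not verified, and the function name is hypothetical.

```python
def bedgraph_to_variable_step(lines):
    """Convert bedGraph-style lines (chrom, start, end, value) to variableStep WIG lines."""
    output = []
    section = None  # (chrom, span) of the current variableStep section
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        start, end = int(start), int(end)
        span = end - start
        # Start a new section when the chromosome or interval width changes,
        # because all entries in one variableStep section share a single span.
        if (chrom, span) != section:
            output.append(f"variableStep chrom={chrom} span={span}")
            section = (chrom, span)
        # bedGraph starts are 0-based; variableStep positions are 1-based.
        output.append(f"{start + 1} {value}")
    return output


if __name__ == "__main__":
    for wig_line in bedgraph_to_variable_step(["chr1 3002700 3002800 0.17"]):
        print(wig_line)
```

For the sample line, this produces the header “variableStep chrom=chr1 span=100” followed by the data line “3002701 0.17”.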
For example, the README.txt file displayed at the top of the Caltech RNA-seq directory page contains the following link: “http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeCaltechRnaSeq”
That page, Caltech RNA-seq Downloadable Files, includes the track description: scroll to the bottom (or click the “Description” link in the top right corner) and read the “Methods” section. Its “Data Processing and Analysis” section explains that the numbers in gene_id values such as “GM12878-rep1.####” are de novo identifiers output by the Cufflinks software. At the very bottom of the page, the “Credits” section lists contacts; send any remaining process-specific questions about the data you are investigating to the appropriate contact listed there.
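The hgFileUi link in the README follows a simple pattern: the assembly goes in the db parameter and the composite track name in g. The sketch below builds such links and also splits a Cufflinks-style gene_id into its sample prefix and numeric identifier. Both helpers are hypothetical conveniences; the assumption that other ENCODE composites use the same link pattern, and the example identifier GM12878-rep1.17, are illustrative only.

```python
from urllib.parse import urlencode


def file_ui_url(db, composite_track, host="http://genome.ucsc.edu"):
    """Build an hgFileUi link with the same parameters as the README link."""
    return f"{host}/cgi-bin/hgFileUi?{urlencode({'db': db, 'g': composite_track})}"


def split_gene_id(gene_id):
    """Split an id such as 'GM12878-rep1.17' into ('GM12878-rep1', 17)."""
    prefix, dot, number = gene_id.rpartition(".")
    if not dot or not number.isdigit():
        raise ValueError(f"unexpected gene_id format: {gene_id!r}")
    return prefix, int(number)


if __name__ == "__main__":
    print(file_ui_url("hg19", "wgEncodeCaltechRnaSeq"))
    print(split_gene_id("GM12878-rep1.17"))  # hypothetical identifier
```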
The Table Browser has a “describe table schema” button that gives information similar to that in the File Format FAQ, plus the related track description.
For example, with the settings “group: Regulation”, “track: UW Histone”, and “table: wgEncode…PkRep#”, clicking the “describe table schema” button shows the definitions of signalValue and peak. Scrolling down, you will find the related track description for UW Histone, with the explanation of peak calling under “Methods” and the laboratory contact under “Credits”.
By visiting ENCODE tracks such as HAIB TFBS, SYDH TFBS, or UW Histone, you can learn more about the process each lab used to generate peaks and choose a method suitable for your data. Because these data were not generated by the UCSC Genome Browser group, questions about the methods should be directed to the corresponding lab. The “Credits” section lists a contact for any questions that the descriptions leave unanswered.
However, I do not have a program that can open this file. What is the program for this file and where can I find it?
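Files ending in “.bb” are bigBed files, a compressed, indexed binary form of BED. The UCSC utility bigBedToBed (distributed with the same command-line tools as bedToBigBed) converts them back to plain-text BED, for example “bigBedToBed in.bb out.bed”, and the Genome Browser can display them directly when they are hosted on a web-accessible server. If you are unsure whether a downloaded file is bigBed or bigWig, the sketch below checks the 4-byte signature at the start of the file (0x8789F2EB for bigBed, 0x888FFC26 for bigWig), reading it in either byte order. The function names are hypothetical.

```python
import struct
import sys

BIGBED_MAGIC = 0x8789F2EB
BIGWIG_MAGIC = 0x888FFC26


def bbi_file_type(header):
    """Classify the first four bytes of a file as 'bigBed', 'bigWig' or 'unknown'."""
    if len(header) < 4:
        return "unknown"
    # The signature is written in the byte order of the machine that made the
    # file, so check both little- and big-endian readings.
    for byte_order in ("<I", ">I"):
        (magic,) = struct.unpack(byte_order, header[:4])
        if magic == BIGBED_MAGIC:
            return "bigBed"
        if magic == BIGWIG_MAGIC:
            return "bigWig"
    return "unknown"


def detect_file_type(path):
    """Read the signature from a file on disk and classify it."""
    with open(path, "rb") as handle:
        return bbi_file_type(handle.read(4))


if __name__ == "__main__":
    for file_path in sys.argv[1:]:
        print(file_path, detect_file_type(file_path))
```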
Useful tips when writing your track descriptions: Assume a broad audience of students as well as researchers. Spelling out common acronyms, for example, helps readers who are new to genomics. The paper’s abstract may be a good starting point for your track’s “Description” section. Provide as much detail as possible in the “Methods” section. An email address for questions about the track must be prominently displayed.
Here are a few good examples of hub structure and configuration from the ENCODE Analysis hub:
Note: We recommend keeping the number of tracks visible by default in your trackDb.txt to a minimum, both to speed up hub loading and to avoid overwhelming users. For more suggestions on hub structure, please see our Public Hub Guidelines wiki page. For help defining unfamiliar terms, see the table of contents of the Hub Track Database Definition.
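A hub’s trackDb.txt is a series of stanzas separated by blank lines. The sketch below generates stanzas for a list of tracks and, in line with the note above, hides every track except a small default set. The function name, track names, URLs, labels, and the choice of “dense” for the visible tracks are all hypothetical; adjust the settings to your own data and check them against the Hub Track Database Definition.

```python
def make_trackdb(tracks, default_visible):
    """Return trackDb.txt text with one stanza per track.

    tracks: list of dicts with keys track, bigDataUrl, shortLabel, longLabel, type.
    default_visible: names of the few tracks to show when the hub first loads.
    """
    stanzas = []
    for t in tracks:
        # Everything outside the default set starts hidden to keep loading fast.
        visibility = "dense" if t["track"] in default_visible else "hide"
        stanzas.append("\n".join([
            f"track {t['track']}",
            f"bigDataUrl {t['bigDataUrl']}",
            f"shortLabel {t['shortLabel']}",
            f"longLabel {t['longLabel']}",
            f"type {t['type']}",
            f"visibility {visibility}",
        ]))
    # Stanzas are separated by a blank line.
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    example_tracks = [  # hypothetical tracks
        {"track": "peaksA", "bigDataUrl": "peaksA.bb", "shortLabel": "Peaks A",
         "longLabel": "Example peaks A", "type": "bigBed 6"},
        {"track": "signalA", "bigDataUrl": "signalA.bw", "shortLabel": "Signal A",
         "longLabel": "Example signal A", "type": "bigWig"},
    ]
    print(make_trackdb(example_tracks, {"peaksA"}))
```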
Updated 15 August 2014

