Advanced guide to using data

All CEMT genotype data (aligned BAM files, variant calls, and the underlying raw FASTQ files) are available for download via European Genome-phenome Archive (EGA) study EGAS00001000552. The data are subject to protected access and are embargoed for nine months after submission to the archive. Please consult the data access policy for details.

For offline processing of the processed tracks, we recommend the following resource, which provides the data together with annotated metadata attributes: json data hub. For a current description of the JSON specification, please consult the IHEC GitHub repository, in particular: description. The current official metadata recommendations from the IHEC metadata working group are tracked at: Google Documents.

All files published by CEMT follow the convention that the first prefix in the filename is the library name, which is also the unique experiment identifier used in the JSON hub. Although some details can be extracted from the filename, referring to the JSON document is preferable. The methods page describes the file types and the workflows used to generate the files.
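The filename convention above can be illustrated with a minimal Python sketch that takes the leading prefix of a filename and uses it as a lookup key in a locally saved copy of the JSON hub. Several details here are assumptions rather than documented facts: that the prefix is delimited by a period, that the hub keeps its experiments under a top-level "datasets" object keyed by identifier, and all the names used (library_name, lookup_experiment, LIB0001, SAMPLE_A) are hypothetical. Check the IHEC specification for the actual layout before relying on it.

```python
import json
import os


def load_hub(path):
    """Read a locally saved copy of the JSON hub into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def library_name(path, sep="."):
    """Return the leading prefix of a filename.

    Assumption: the prefix is separated from the rest of the name by `sep`
    (a period by default); adjust `sep` if the files use another delimiter.
    """
    return os.path.basename(path).split(sep, 1)[0]


def lookup_experiment(hub, path, sep="."):
    """Return the hub entry keyed by the file's library name, or None.

    Assumption: experiments live under a top-level "datasets" object
    whose keys are the experiment identifiers.
    """
    return hub.get("datasets", {}).get(library_name(path, sep))


# Made-up hub fragment for illustration only; not real CEMT data.
example_hub = {
    "datasets": {
        "LIB0001": {
            "sample_id": "SAMPLE_A",
            "experiment_attributes": {"experiment_type": "H3K4me3"},
        }
    }
}

entry = lookup_experiment(example_hub, "/data/LIB0001.H3K4me3.signal.bw")
```

With a real hub file, load_hub would replace the inline example_hub dictionary. A None result means the prefix did not match any identifier, which is a cue to check the delimiter assumption.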

In case any clarification is required, please email edcc@bcgsc.ca.