www.epigenomes.ca tracks

The tracks in this set were generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the contribution of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) to the International Human Epigenome Consortium (IHEC).

The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled via the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) as it becomes available. More information about the project is available at www.epigenomes.ca.

Sample metadata is available at: CEMT Samples.

The wet lab protocols used are described in protocols.

The ChIP-Seq assays used paired-end sequencing. The raw reads from the sequencer were first split by index, and adaptors were trimmed. FASTQ files corresponding to the two mates of each pair were then generated for each index. The reads were aligned to the GRCh37-lite reference using Burrows-Wheeler Aligner (BWA) version 0.5.7 and converted to BAM format with SAMtools (version 0.1.13). The BAM files were annotated using in-house tools (including flagging of chastity-failed reads), and duplicates were marked using Picard Tools' MarkDuplicates.jar (version 1.71).

Compressed wig tracks were generated from the BAM files by in-house tools run in ChIP-Seq mode with the SAMtools flags "-F 1028 -q 5", and the GRCh37-lite chromosome names were changed to UCSC chromosome names. The wig files were converted to bigWig files using UCSC tools.
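To make the "-F 1028" filter easier to read, the sketch below splits a SAM FLAG value into the bits it sets. It assumes that the in-house tool interprets "-F" and "-q" the same way `samtools view` does (exclude reads with any of the listed flag bits set, and exclude reads with mapping quality below the given value). The function name `decode_flag` is hypothetical and is not part of the pipeline.

```shell
# Decompose a SAM FLAG value into the names of the bits it sets.
# Bit values follow the SAM specification; the short names are informal labels.
decode_flag() {
    value=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped 16:reverse \
                32:mate_reverse 64:read1 128:read2 256:secondary 512:qc_fail \
                1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( value & bit )) -ne 0 ]; then
            out="${out:+$out,}$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_flag 1028   # prints: unmapped,duplicate
```

Under that assumption, "-F 1028" drops unmapped reads and reads marked as duplicates, and "-q 5" drops reads with a mapping quality below 5.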

The command lines used were:

commands
$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_1_FASTQ > $MATE_1_SAI
$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_2_FASTQ > $MATE_2_SAI
bwa sampe GRCh37-lite.fa $MATE_1_SAI $MATE_2_SAI $MATE_1_FASTQ $MATE_2_FASTQ | samtools view -bt GRCh37-lite.fa.fai - | samtools sort - $SORTED_PREFIX
java -Xms512m -Xmx16g -jar MarkDuplicates.jar I=$ANNOTATED_SORTED_BAM OUTPUT=$DUPLICATES_MARKED METRICS_FILE=$METRICS_FILE ASSUME_SORTED=true READ_NAME_REGEX="[a-zA-Z0-9]+_[0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*" OPTICAL_DUPLICATE_PIXEL_DISTANCE=14 VALIDATION_STRINGENCY=LENIENT
export BAM2WIG_OPTS="-F 1028 -q 5 -cp -n $OUTPUT_PREFIX -chr $CHR_BWA2UCSC_NAMES"
java -jar -Xmx10G $BAM2WIG_PATH/BAM2WIG.jar -bamFile $INFILE $BAM2WIG_OPTS -out $OUTPUT_DIRECTORY
$UCSCTOOLS/wigToBigWig -clip $INFILE $CHROMSIZES/hg19.chrom.sizes $OUTPUT_DIRECTORY/$OUTFILE
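The contents of the name-mapping file passed via `$CHR_BWA2UCSC_NAMES` are not shown here. As an illustration only, the sketch below shows the usual naming convention for the primary sequences (1 to 22, X, Y and MT); the function name `to_ucsc_name` is hypothetical, and unplaced or unlocalized contigs, which would need an explicit lookup table, are passed through unchanged.

```shell
# Map a GRCh37-lite primary sequence name to a UCSC-style name.
# Assumption: only 1-22, X, Y and MT are renamed; all other names pass through.
to_ucsc_name() {
    case "$1" in
        MT) printf 'chrM\n' ;;
        [1-9]|1[0-9]|2[0-2]|X|Y) printf 'chr%s\n' "$1" ;;
        *) printf '%s\n' "$1" ;;
    esac
}

to_ucsc_name 1    # prints: chr1
to_ucsc_name MT   # prints: chrM
```

The renamed sequences need to match the entries in the chromosome sizes file given to wigToBigWig (here hg19.chrom.sizes).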

The various assays have been color coded as follows:

Mark	Colour
ChIP Input
H3K27ac
H3K27me3
H3K36me3
H3K4me1
H3K4me3
H3K9me3
mature miRNA
precursor miRNA
miRNA not mature or precursor
RNA Seq

Where possible, information about the tracks and data is included in the track title. Because the length of these fields is limited, some details may be omitted. The library information for each track is always included as the first field of the colon-delimited title string. Please refer to the metadata table included on this page to look up further details by library.
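Since the library is the first colon-delimited field of the title, it can be extracted with plain shell parameter expansion. This is a minimal sketch: the function name `library_from_title` and the example title are hypothetical and do not correspond to a real track.

```shell
# Return the first colon-delimited field of a track title (the library).
# A title without any colon is returned unchanged.
library_from_title() {
    printf '%s\n' "${1%%:*}"
}

# Hypothetical title string, for illustration only.
library_from_title "LIB0001:H3K4me3:example sample"   # prints: LIB0001
```

The resulting value can then be used to look up the matching row in the sample metadata table.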