www.epigenomes.ca tracks

The tracks in this set were generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the contribution of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) to the International Human Epigenome Consortium (IHEC).

The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled via the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) as they become available. More information about the project is available at www.epigenomes.ca.

Sample metadata is available at: CEMT Samples.

The wet lab protocols used are described in protocols.

The ChIP-Seq assays used a paired-end protocol. The raw reads from sequencing were first split by index and the adaptors were trimmed; fastq files corresponding to the two mates of each pair were then generated for each index. The reads were aligned to the GRCh37-lite reference using Burrows-Wheeler Aligner (version 0.5.7) and converted to bam format with SAMtools (version 0.1.13). The bam files were annotated using in-house tools (including flagging of chastity-failed reads), and duplicates were marked using Picard Tools' MarkDuplicates.jar (version 1.71).

Compressed wig tracks were generated from the bam files by in-house tools run in ChIP-Seq mode with the SAMtools flags "-F 1028 -q 5", and the GRCh37-lite chromosome names were changed to UCSC chromosome names. The wig files were converted to bigWig format using UCSC tools.
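To make the "-F 1028" filter easier to read, the following minimal sketch decodes a SAM flag mask into its component bits. The bit values and meanings follow the SAM format specification; the function name decode_flag_mask and the short bit labels are illustrative and not part of the CEMT pipeline.

```shell
#!/bin/sh
# Decode a SAM flag mask into the names of the bits it sets.
# decode_flag_mask and the labels below are hypothetical helper names.
decode_flag_mask() {
    mask=$1
    out=""
    for pair in 1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
                16:reverse 32:mate_reverse 64:read1 128:read2 \
                256:secondary 512:qc_fail 1024:duplicate 2048:supplementary; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_flag_mask 1028   # prints: unmapped duplicate
```

Under this reading, "-F 1028" excludes reads that are unmapped or marked as duplicates, and "-q 5" is the SAMtools option for a minimum mapping quality of 5.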

The command lines used were:

commands
$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_1_FASTQ > $MATE_1_SAI
$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_2_FASTQ > $MATE_2_SAI
bwa sampe GRCh37-lite.fa $MATE_1_SAI $MATE_2_SAI $MATE_1_FASTQ $MATE_2_FASTQ | samtools view -bt GRCh37-lite.fa.fai - | samtools sort - $SORTED_PREFIX
java -Xms512m -Xmx16g -jar MarkDuplicates.jar I=$ANNOTATED_SORTED_BAM OUTPUT=$DUPLICATES_MARKED METRICS_FILE=$METRICS_FILE ASSUME_SORTED=true READ_NAME_REGEX="[a-zA-Z0-9]+_[0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*" OPTICAL_DUPLICATE_PIXEL_DISTANCE=14 VALIDATION_STRINGENCY=LENIENT
export BAM2WIG_OPTS="-F 1028 -q 5 -cp -n $OUTPUT_PREFIX -chr $CHR_BWA2UCSC_NAMES"
java -jar -Xmx10G $BAM2WIG_PATH/BAM2WIG.jar -bamFile $INFILE $BAM2WIG_OPTS -out $OUTPUT_DIRECTORY;
$UCSCTOOLS/wigToBigWig -clip $INFILE $CHROMSIZES/hg19.chrom.sizes $OUTPUT_DIRECTORY/$OUTFILE
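The READ_NAME_REGEX passed to MarkDuplicates above contains three capture groups, which Picard documents as the tile, x and y coordinates used for optical duplicate detection. The sketch below applies the same expression with sed to show what it captures. The read name in the example is hypothetical and only assumed to resemble the names produced by this pipeline, and extract_tile_xy is an illustrative helper name.

```shell
#!/bin/sh
# Print the three fields captured by the READ_NAME_REGEX shown above
# (tile, x, y), or nothing if the read name does not match.
# extract_tile_xy is a hypothetical helper; the sample name is made up.
extract_tile_xy() {
    printf '%s\n' "$1" |
        sed -nE 's/^[a-zA-Z0-9]+_[0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*$/\1 \2 \3/p'
}

extract_tile_xy 'HS1_123:4:1101:1500:2000'   # prints: 1101 1500 2000
```

Testing a handful of real read names this way is a quick check that the expression fits a given naming scheme before running MarkDuplicates on a full bam file.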

For ChIP-Seq post processing and peak calling, please refer to here

The various assays have been color coded as follows:

Mark	Colour
ChIP Input
H3K27ac
H3K27me3
H3K36me3
H3K4me1
H3K4me3
H3K9me3

Where possible, information about the tracks and data is included in the track title. However, constraints on the length of these fields mean that this is not always possible. The library information for each track is always included as the first field of the colon-delimited title string. Please refer to the metadata table included on this page to look up further details by library.
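As a small illustration of the title convention, the library identifier can be recovered by taking everything before the first colon. The title string and the function name library_from_title below are hypothetical; real titles may carry different fields after the library.

```shell
#!/bin/sh
# Return the first colon-delimited field of a track title (the library).
# library_from_title and the example title are hypothetical.
library_from_title() {
    printf '%s\n' "${1%%:*}"
}

library_from_title 'LIB0001:H3K4me3:example-sample'   # prints: LIB0001
```

The resulting identifier is the key to look up in the metadata table.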