Canadian Epigenetics, Environment and Health Research Consortium

Description

The tracks in this set were generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) contribution to the International Human Epigenome Consortium (IHEC).

The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled through the European Genome-phenome Archive (EGA) at the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) as it becomes available. More information about the project is available at www.epigenomes.ca.

Sample metadata is available at: CEMT Samples.

Methods

The wet-lab protocols used are described in protocols.

ChIP-Seq libraries were sequenced paired-end. Raw reads were first demultiplexed by index and adaptor-trimmed, and FASTQ files for each of the two mates were then generated per index. Reads were aligned to the GRCh37-lite reference with the Burrows-Wheeler Aligner (BWA) version 0.5.7 and converted to BAM format with SAMtools version 0.1.13. The BAM files were annotated with in-house tools (including flagging of reads that failed the Illumina chastity filter), and duplicates were marked with Picard Tools' MarkDuplicates.jar (version 1.71).
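The MarkDuplicates command in this section's command lines passes a READ_NAME_REGEX; Picard uses its three capture groups as the tile, x and y coordinates of each read when estimating optical duplicates, and OPTICAL_DUPLICATE_PIXEL_DISTANCE=14 sets the maximum offset between two clusters for them to count as optical duplicates. The sketch below, a hypothetical helper named parse_read_name, applies the same pattern with sed to show which parts of a read name end up in those groups. The read name used is made up, and the pattern is anchored here so the whole name must match.

```shell
# Hypothetical helper: apply the MarkDuplicates READ_NAME_REGEX to one read
# name and print the tile, x and y values captured by its three groups.
# Anchored with ^...$ so the whole name must match; prints nothing otherwise.
parse_read_name() {
  printf '%s\n' "$1" |
    sed -nE 's/^[a-zA-Z0-9]+_[0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*$/tile=\1 x=\2 y=\3/p'
}

# Made-up read name that fits the pattern
parse_read_name "HS2000_123:4:1101:5678:9012"   # prints: tile=1101 x=5678 y=9012
```

Read names that do not fit the pattern produce no output, which is a quick way to check whether a given naming scheme would be parsed at all.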

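Compressed wig tracks were generated from the BAM files with in-house tools in ChIP-Seq mode, using the SAMtools filter options "-F 1028 -q 5", which exclude reads flagged as unmapped or duplicate (1028 = 4 + 1024) as well as reads with mapping quality below 5. GRCh37-lite chromosome names were changed to UCSC chromosome names, and the wig files were converted to bigWig format with UCSC tools.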
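The filter "-F 1028 -q 5" can be illustrated with plain shell arithmetic. This is a minimal sketch assuming the in-house wig generator applies the options with the same meaning as samtools view (-F drops a read if any of the listed flag bits is set; -q drops reads whose MAPQ is below the threshold). The function names keep_read and filtered_reasons are hypothetical.

```shell
# Hypothetical helpers illustrating the samtools-style read filter "-F 1028 -q 5".
FILTER_FLAGS=1028  # -F: skip reads with any of these bits set (4 = unmapped, 1024 = duplicate)
MIN_MAPQ=5         # -q: skip reads with mapping quality below this value

# keep_read FLAG MAPQ: exit status 0 if the read would be kept, non-zero otherwise
keep_read() {
  [ $(( $1 & FILTER_FLAGS )) -eq 0 ] && [ "$2" -ge "$MIN_MAPQ" ]
}

# filtered_reasons FLAG: list which of the filtered bits are set
filtered_reasons() {
  if [ $(( $1 & 4 )) -ne 0 ]; then echo "unmapped"; fi
  if [ $(( $1 & 1024 )) -ne 0 ]; then echo "duplicate"; fi
}

keep_read 99 60 && echo "flag 99, MAPQ 60: kept"
keep_read 1123 60 || echo "flag 1123, MAPQ 60: dropped ($(filtered_reasons 1123))"
```

Flag 99 (paired, properly paired, mate reverse-strand, first in pair) has neither bit 4 nor bit 1024 set, so it passes; 1123 is the same read with the duplicate bit (1024) added, so it is dropped.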

The command lines used were:

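# Align each mate separately with BWA 0.5.7 (16 threads)
$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_1_FASTQ > $MATE_1_SAI
$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_2_FASTQ > $MATE_2_SAI
# Build paired-end alignments, convert SAM to BAM (-t supplies reference names and lengths from the .fai)
# and coordinate-sort (SAMtools 0.1.x syntax: "sort <in> <out.prefix>")
$BWA_PATH/bwa sampe GRCh37-lite.fa $MATE_1_SAI $MATE_2_SAI $MATE_1_FASTQ $MATE_2_FASTQ | samtools view -bt GRCh37-lite.fa.fai - | samtools sort - $SORTED_PREFIX
# Mark (without removing) PCR and optical duplicates in the annotated, sorted BAM with Picard 1.71
java -Xms512m -Xmx16g -jar MarkDuplicates.jar I=$ANNOTATED_SORTED_BAM OUTPUT=$DUPLICATES_MARKED METRICS_FILE=$METRICS_FILE ASSUME_SORTED=true READ_NAME_REGEX="[a-zA-Z0-9]+_[0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*" OPTICAL_DUPLICATE_PIXEL_DISTANCE=14 VALIDATION_STRINGENCY=LENIENT
# Generate the wig with the in-house BAM2WIG tool, using the ChIP-Seq filters -F 1028 -q 5
export BAM2WIG_OPTS="-F 1028 -q 5 -cp -n $OUTPUT_PREFIX -chr $CHR_BWA2UCSC_NAMES"
java -Xmx10G -jar $BAM2WIG_PATH/BAM2WIG.jar -bamFile $INFILE $BAM2WIG_OPTS -out $OUTPUT_DIRECTORY
# Convert the wig ($INFILE here) to bigWig; -clip warns instead of failing on entries past chromosome ends
$UCSCTOOLS/wigToBigWig -clip $INFILE $CHROMSIZES/hg19.chrom.sizes $OUTPUT_DIRECTORY/$OUTFILE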
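The renaming from GRCh37-lite to UCSC chromosome names was done by the in-house tool (the -chr option in BAM2WIG_OPTS appears to pass the name mapping). As an illustration only, the sketch below shows one way to do the same renaming for the assembled chromosomes, both for a single name and for the chrom= field of wig declaration lines. The functions to_ucsc_chrom and rename_wig_chroms are hypothetical, and the assumption that every contig other than 1-22, X, Y and MT is passed through unchanged is ours, not the pipeline's.

```shell
# Hypothetical helper: map one GRCh37-lite chromosome name to its UCSC hg19 name.
# Assumption: only 1-22, X, Y and MT are renamed; other contigs pass through unchanged.
to_ucsc_chrom() {
  case "$1" in
    MT) echo "chrM" ;;
    [1-9]|1[0-9]|2[0-2]|X|Y) echo "chr$1" ;;
    *) echo "$1" ;;
  esac
}

# Hypothetical filter: rewrite the chrom= field of wig declaration lines on stdin,
# e.g. "variableStep chrom=1 span=20" -> "variableStep chrom=chr1 span=20"
rename_wig_chroms() {
  sed -E -e 's/chrom=MT([[:space:]]|$)/chrom=chrM\1/' \
         -e 's/chrom=([0-9]+|X|Y)([[:space:]]|$)/chrom=chr\1\2/'
}

to_ucsc_chrom 7                                        # prints: chr7
echo "fixedStep chrom=X start=1 step=10" | rename_wig_chroms
```

Note that hg19's chrM is a different mitochondrial reference sequence from GRCh37's MT, so renaming MT to chrM matches the naming convention but does not make the two sequences identical.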

Post-processing

For details of ChIP-Seq post-processing and peak calling, see here.

Display Scheme

The assays are colour-coded as follows:
Mark         Colour
ChIP Input
H3K27ac
H3K27me3
H3K36me3
H3K4me1
H3K4me3
H3K9me3

Where possible, information about the tracks and data is included in the track title. Because these fields are limited in length, however, this is not always possible. The library identifier is always included as the first field of the colon-delimited title string; use it to look up further details in the metadata table on this page.
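Since the library identifier is the first field of the colon-delimited title, it can be pulled out with a POSIX parameter expansion before looking it up in the metadata table. This is a minimal sketch; the function name library_from_title and the example title are hypothetical.

```shell
# Hypothetical helper: return the library identifier, i.e. the first field of a
# colon-delimited track title (everything before the first ':').
library_from_title() {
  printf '%s\n' "${1%%:*}"
}

# Made-up title following the "library:..." layout described above
library_from_title "LIB0001:H3K4me3:example sample"   # prints: LIB0001
```

"${1%%:*}" removes the longest suffix that starts with a colon, so a title without any colon is returned unchanged.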

Note

When comparing data from different sources, note that the underlying data processing and handling procedures may differ.

Contacts

Please direct any questions to: edcc@bcgsc.ca