The tracks available in this set have been generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the contribution of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) to the International Human Epigenome Consortium (IHEC).
The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled via the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) as they become available. More information about the project is available at www.epigenomes.ca.
Sample metadata is available at: CEMT Samples.
The wet lab protocols used are described in protocols.
The ChIP-Seq assays used a paired-end protocol. The raw reads from the sequencing were first split by index and adaptors were trimmed, then FASTQ files corresponding to the two mates of each pair were generated for each index. The reads were aligned to the GRCh37-lite reference using Burrows-Wheeler Aligner (BWA) version 0.5.7 and converted to BAM format with SAMtools (version 0.1.13). The BAM files were annotated using in-house tools (including flagging of chastity-failed reads) and duplicates were marked using Picard Tools' MarkDuplicates.jar (version 1.71).
Compressed wig tracks were generated from the BAM files with in-house tools in ChIP-Seq mode, using the SAMtools flags "-F 1028 -q 5", and GRCh37-lite chromosome names were changed to UCSC chromosome names. The wig files were converted to bigWig files using UCSC tools.
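The filter value 1028 is a sum of SAM FLAG bits. The following minimal sketch decomposes such a value, using the bit meanings from the SAM specification. The function name `decode_flag` is hypothetical and is not part of the CEMT pipeline, and the sketch assumes that the in-house tool interprets "-F" and "-q" the same way `samtools view` does (exclude reads with any of the given bits set; skip alignments with mapping quality below the threshold).

```shell
# Decompose a SAM FLAG filter value into its individual bits.
# decode_flag is a hypothetical helper, not part of the in-house tools.
decode_flag() {
    value=$1
    for entry in \
        "1:paired" "2:proper_pair" "4:unmapped" "8:mate_unmapped" \
        "16:reverse" "32:mate_reverse" "64:first_in_pair" "128:second_in_pair" \
        "256:secondary" "512:qc_fail" "1024:duplicate" "2048:supplementary"
    do
        bit=${entry%%:*}     # numeric bit value before the colon
        name=${entry#*:}     # short label after the colon
        if [ $(( value & bit )) -ne 0 ]; then
            echo "$bit $name"
        fi
    done
}

decode_flag 1028
# prints:
# 4 unmapped
# 1024 duplicate
```

Under that assumption, "-F 1028 -q 5" leaves out unmapped reads, reads marked as duplicates, and alignments with a mapping quality below 5.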
The command lines used were:
|$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_1_FASTQ > $MATE_1_SAI|
|$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $MATE_2_FASTQ > $MATE_2_SAI|
|bwa sampe GRCh37-lite.fa $MATE_1_SAI $MATE_2_SAI $MATE_1_FASTQ $MATE_2_FASTQ | samtools view -bt GRCh37-lite.fa.fai - | samtools sort - $SORTED_PREFIX|
|java -Xms512m -Xmx16g -jar MarkDuplicates.jar I=$ANNOTATED_SORTED_BAM OUTPUT=$DUPLICATES_MARKED METRICS_FILE=$METRICS_FILE ASSUME_SORTED=true READ_NAME_REGEX="[a-zA-Z0-9]+_[0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*" OPTICAL_DUPLICATE_PIXEL_DISTANCE=14 VALIDATION_STRINGENCY=LENIENT|
|export BAM2WIG_OPTS="-F 1028 -q 5 -cp -n $OUTPUT_PREFIX -chr $CHR_BWA2UCSC_NAMES"|
|java -jar -Xmx10G $BAM2WIG_PATH/BAM2WIG.jar -bamFile $INFILE $BAM2WIG_OPTS -out $OUTPUT_DIRECTORY|
|$UCSCTOOLS/wigToBigWig -clip $INFILE $CHROMSIZES/hg19.chrom.sizes $OUTPUT_DIRECTORY/$OUTFILE|
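The READ_NAME_REGEX passed to MarkDuplicates contains three capture groups, which Picard uses as the tile, x coordinate and y coordinate when looking for optical duplicates. The sketch below applies the same pattern with `sed` to show what each group captures. The read name and the helper `parse_read_name` are made up for illustration, and anchoring the pattern to the whole read name is an assumption about how Picard applies it.

```shell
# Pattern copied from the MarkDuplicates command line above.
READ_NAME_REGEX='[a-zA-Z0-9]+_[0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*'

# parse_read_name is a hypothetical helper: it prints the three captured
# fields, or nothing at all if the read name does not match the pattern.
parse_read_name() {
    printf '%s\n' "$1" | sed -n -E "s/^${READ_NAME_REGEX}\$/tile=\1 x=\2 y=\3/p"
}

# Hypothetical read name of the form <instrument>_<run>:<lane>:<tile>:<x>:<y>
parse_read_name "HS12_0123:4:1101:1234:5678"
# prints: tile=1101 x=1234 y=5678
```

Read names that do not follow this layout produce no output here, which is a quick way to check whether the pattern suits a given set of reads.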
For ChIP-Seq post-processing and peak calling, please refer to here.
The various assays have been color coded as follows:
Where possible, information about the tracks and data is included in the title. However, the length limit on these fields means that some details may be left out. The library information for each track is always included as the first field of the colon-delimited title string. Please refer to the metadata table included on this page to look up further details by library.
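Since the library is always the first colon-delimited field, it can be pulled out of a title with `cut`. This is a minimal sketch: the title string and the helper name `library_of` are invented for illustration and do not correspond to an actual track.

```shell
# library_of is a hypothetical helper that returns the first
# colon-delimited field of a track title.
library_of() {
    printf '%s\n' "$1" | cut -d ':' -f 1
}

# Hypothetical title string; real titles may carry different fields.
library_of "A00001:H3K27ac:example_tissue"
# prints: A00001
```

The returned value can then be used as the lookup key in the metadata table.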
When analyzing data from different sources, please note that underlying data processing and handling procedures may be different.
Please direct any questions to: firstname.lastname@example.org