Canadian Epigenetics, Environment and Health Research Consortium

Description

The tracks in this set were generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the contribution of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) to the International Human Epigenome Consortium (IHEC).

The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled via the European Bioinformatics Institute. The data will be submitted to CEMT Reference Epigenomes (Study: EGAS00001000552) as it becomes available. More information about the project is available at www.epigenomes.ca.

Sample metadata is available at: CEMT Samples.

Methods

The wet lab protocols used are described in protocols.

The strand-specific mRNA-seq assays used a paired-end protocol. Fastq files corresponding to the two mates of each pair were generated. The reads were aligned to a genome + transcriptome reference (see JAGuaR: Repositioning of RNA-seq Reads) using Burrows-Wheeler Aligner (version 0.5.7) and SAMtools (version 0.1.13). The resulting BAM files were repositioned to GRCh37-lite using JAGuaR (version 2.0.2). The BAM files were annotated using in-house tools (including flagging of chastity-failed reads), and duplicates were marked using Picard Tools' MarkDuplicates.jar (version 1.71).

Using in-house tools, each BAM file was split by the strand of the originally sequenced cDNA fragment, and gzipped wig files were generated. SAMtools flags "-F 516 -q 0" were used, and GRCh37-lite chromosome names were changed to UCSC chromosome names. An in-house RNA QC and analysis pipeline was used to generate a report containing a normalization constant for computing RPKM values. The constant was inferred from the total number of exonic reads (excluding mitochondrial reads, reads from ribosomal genes, and reads from the highest 0.5% expressed exons). The signal values in the wig files were scaled by this constant. The scaled wig files were converted to bigWig files using UCSC tools.
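As a reading aid for the filter mentioned above: in the SAM specification, the flag value 516 is the sum of 4 (read unmapped) and 512 (read fails platform/vendor quality checks), so "-F 516" drops reads carrying either bit, while "-q 0" imposes no minimum mapping quality. The sketch below reproduces that bit test with shell arithmetic. The function name passes_filter and the example flag values are illustrative assumptions and are not part of the CEMT pipeline.

```shell
#!/bin/sh
# Illustrative only: mimic the effect of "samtools view -F 516" on one FLAG value.
# 516 = 4 (read unmapped) + 512 (failed platform/vendor quality checks).
EXCLUDE_MASK=516

# passes_filter FLAG -> prints "keep" if none of the excluded bits are set, else "drop"
passes_filter() {
    if [ $(( $1 & EXCLUDE_MASK )) -eq 0 ]; then
        echo keep
    else
        echo drop
    fi
}

passes_filter 99     # paired, properly aligned, first in pair: no excluded bit
passes_filter 77     # contains bit 4 (read unmapped)
passes_filter 611    # 99 + 512: aligned, but failed quality checks
passes_filter 1123   # 99 + 1024: marked duplicate, which this mask does not exclude
```

Note that the duplicate bit (1024) is not part of the mask, so with these flags reads marked as duplicates would still contribute to the signal.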

In-house analysis is described in the RNA-Sequencing section of Gascard et al., "Epigenetic and transcriptional determinants of the human breast".

The command lines used were:

commands
$BWA_PATH/bwa aln -t 16 $GENOME_PLUS_JUNCTION_75_BP_REFERENCE $MATE_1_FASTQ > $MATE_1_SAI
$BWA_PATH/bwa aln -t 16 $GENOME_PLUS_JUNCTION_75_BP_REFERENCE $MATE_2_FASTQ > $MATE_2_SAI
$BWA_PATH/bwa sampe -s $GENOME_PLUS_JUNCTION_75_BP_REFERENCE $MATE_1_SAI $MATE_2_SAI $MATE_1_FASTQ $MATE_2_FASTQ | $SAMTOOLS_PATH/samtools view -bt $GENOME_PLUS_JUNCTION_75_BP_REFERENCE_INDEX - | $SAMTOOLS_PATH/samtools sort -n - $SORTED_PREFIX
$PYTHON_PATH/python $JAGUAR_PATH/JAGuaR_v2.0.2/convertJunctions.py -f $BAM -i $GENOME_JUNCTION_75_BP_REFERENCE_INDEX_LOCATION -o $OUTDIR -m 37013 --samtools $SAMTOOLS_PATH
$SAMTOOLS_PATH/samtools view -bS $JAGUAR_OUTPUT_NAME_SORTED_SAM > $NAME_SORTED_BAM
$SAMTOOLS_PATH/samtools sort $NAME_SORTED_BAM $POSITION_SORTED_PREFIX
$PICARD_PATH/MarkDuplicates.jar VALIDATION_STRINGENCY=SILENT I=$POSITION_SORTED_BAM O=$DUPS_FLAGGED_BAM M=$METRICS TMP_DIR=$TEMP ASSUME_SORTED=true QUIET=true CREATE_INDEX=false
export BAM2WIG_OPTS="-s -F 516 -q 0 -n $OUTPUT_PREFIX -chr $CHR_BWA2UCSC_NAMES"
java -jar -Xmx10G $BAM2WIG_PATH/BAM2WIG.jar -bamFile $INFILE $BAM2WIG_OPTS -out $OUTPUT_DIRECTORY;
gunzip --stdout $SIGNAL_WIG | awk -v norm="$NORM" '{ if ($0 ~ /^[-0-9]+$/) print $0 * norm; else print $0 }' | gzip > $RPKM_SIGNAL_WIG;
$WIG2BIGWIG_PATH/wigToBigWig -clip $RPKM_SIGNAL_WIG $WIG2BIGWIG_PATH/hg19.chrom.sizes $RPKM_SIGNAL_BW
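
The scaling step in the commands above multiplies each integer signal line of a wig file by the normalization constant and passes header lines through unchanged. The following is a minimal sketch of that behaviour on a toy fixedStep wig. The constant 0.5, the signal values, and the helper name scale_wig are made up for illustration; the real constant comes from the in-house RNA QC report, and this is not the pipeline's own script.

```shell
#!/bin/sh
# Illustrative only: scale the integer signal lines of a wig stream by a constant.
# NORM is a made-up value standing in for the constant from the RNA QC report.
NORM=0.5

# scale_wig NORM: read wig text on stdin, multiply integer-only lines by NORM,
# and pass header lines (track, fixedStep, variableStep) through unchanged.
scale_wig() {
    awk -v norm="$1" '{ if ($0 ~ /^[-0-9]+$/) print $0 * norm; else print $0 }'
}

# Toy input: one header line followed by three signal values.
printf 'fixedStep chrom=chr1 start=1 step=1\n0\n4\n-7\n' | scale_wig "$NORM"
```

Passing the constant with awk -v keeps the shell variable out of the awk program text, so the single-quoted program does not depend on shell expansion.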

Display Scheme

The various assays have been color coded as follows:
Mark      Colour
RNA Seq

Where possible, information about the tracks and data is included in the title. However, constraints on the length of these fields sometimes make this impossible. The library information for each track is always included as the first field in the colon-delimited title string. Please refer to the metadata table included on this page to look up further details by library.
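
Because the library is the first colon-delimited field of a title, it can be pulled out with standard text tools before looking it up in the metadata table. A small sketch follows; the title string and the library identifier in it are hypothetical placeholders, not real CEMT values, and the helper name library_of is an assumption for illustration.

```shell
#!/bin/sh
# Illustrative only: extract the library field from a colon-delimited track title.
# library_of TITLE -> prints the first colon-delimited field of TITLE
library_of() {
    printf '%s\n' "$1" | cut -d: -f1
}

# Hypothetical title; real titles may carry different fields after the library.
library_of "LIB0001:RNA-Seq:example tissue"
```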

Note

When analyzing data from different sources, please note that underlying data processing and handling procedures may be different.

Contacts

Please direct any questions to: edcc@bcgsc.ca