The tracks in this set were generated by the Centre for Epigenome Mapping Technology (CEMT) at Canada's Michael Smith Genome Sciences Centre (BCGSC) as part of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC) contribution to the International Human Epigenome Consortium (IHEC).
The data tracks represent raw signal generated from aligned reads. Access to the raw data underlying the tracks is controlled through the European Bioinformatics Institute (EBI). The data will be submitted to the CEMT Reference Epigenomes study (EGAS00001000552) as it becomes available. More information about the project is available at www.epigenomes.ca.
Sample metadata is available at: CEMT Samples.
The wet lab protocols used are described in protocols.
miRNA-Seq libraries were sequenced using a single-end protocol. The raw sequence reads were split by index, adapters were trimmed, and a FASTQ file was generated for each index. Reads were aligned to the GRCh37-lite reference using the Burrows-Wheeler Aligner (BWA) version 0.5.7 and converted to BAM format with SAMtools (version 0.1.13). The BAM files were annotated using in-house tools (including flagging of chastity-failed reads), and duplicates were marked using Picard Tools' MarkDuplicates.jar (version 1.71).
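To make the read annotations concrete: the SAM specification reserves FLAG bit 0x200 for reads that fail platform/vendor quality checks and bit 0x400 for PCR or optical duplicates, and Picard's MarkDuplicates sets the latter. The sketch below assumes (this is not stated above) that the in-house chastity flagging uses bit 0x200, and tallies reads from a list of FLAG integers using only the Python standard library. In practice the FLAG values would be read from the BAM files, for example with samtools or pysam.

```python
# SAM FLAG bits, as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_QC_FAIL = 0x200    # Assumption: chastity-failed reads are marked with this bit.
FLAG_DUPLICATE = 0x400  # PCR or optical duplicate (set by MarkDuplicates)


def classify_read(flag: int) -> str:
    """Assign one FLAG value to a single category (QC failure takes priority)."""
    if flag & FLAG_QC_FAIL:
        return "qc_fail"
    if flag & FLAG_DUPLICATE:
        return "duplicate"
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    return "usable"


def summarize_flags(flags) -> dict:
    """Count reads per category for an iterable of FLAG integers."""
    counts = {"qc_fail": 0, "duplicate": 0, "unmapped": 0, "usable": 0}
    for flag in flags:
        counts[classify_read(flag)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical FLAGs: forward hit, QC fail, duplicate, unmapped, reverse-strand hit.
    print(summarize_flags([0, 0x200, 0x400, 0x4, 0x10]))
```

Checking the QC bit first means a read that is both chastity-failed and a duplicate is counted once, as a QC failure.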
The resulting BAM files were analyzed using a miRNA pipeline consisting of BCGSC's in-house miRNA profiling tools (version 0.2.6), SAMtools (version 0.1.7a) and miRBase (version 19). miRDeep2 was used for novel miRNA prediction. The isoform BED files from the pipeline were split into three sets (mature, precursor, and everything not classified as either, labelled "miRNA not mature or precursor"), and each set was saved as a separate bedGraph with the reads_per_million_miRNA_mapped column as the signal value. The bedGraphs were converted to bigWigs using the UCSC tools.
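A minimal sketch of the split into three bedGraph tracks. The input layout (tab-separated chrom, start, end, annotation label, reads_per_million_miRNA_mapped) and the label values `mature` and `precursor` are assumptions for illustration, not the documented format of the BCGSC pipeline output. The `reads_per_million` helper likewise shows the usual definition (reads divided by total miRNA-mapped reads, times one million) rather than the pipeline's confirmed formula.

```python
from collections import defaultdict


def reads_per_million(reads: int, total_mirna_mapped: int) -> float:
    """Usual RPM definition (assumed, not confirmed for this pipeline)."""
    return reads / total_mirna_mapped * 1_000_000


def category(annotation: str) -> str:
    """Map a hypothetical annotation label such as 'mature,ID' to a track name."""
    label = annotation.split(",")[0].strip().lower()
    if label in ("mature", "precursor"):
        return label
    return "not_mature_or_precursor"


def split_isoforms(lines) -> dict:
    """Group isoform records into per-category bedGraph lines.

    Assumed input columns (tab-separated): chrom, start, end,
    annotation, reads_per_million_miRNA_mapped.
    """
    tracks = defaultdict(list)
    for line in lines:
        chrom, start, end, annotation, rpm = line.rstrip("\n").split("\t")
        # The RPM column becomes the bedGraph signal value.
        tracks[category(annotation)].append(f"{chrom}\t{start}\t{end}\t{rpm}")
    return dict(tracks)


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        "chr1\t100\t122\tmature\t12.5",
        "chr1\t300\t370\tprecursor\t3.0",
        "chr1\t500\t520\tunannotated\t0.5",
    ]
    for name, rows in split_isoforms(records).items():
        print(name, rows)
```

Each resulting list would then be written to its own bedGraph file before conversion to bigWig.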
The command lines used were:
|$BWA_PATH/bwa aln -t 16 GRCh37-lite.fa $FASTQ > $SAI|
|$BWA_PATH/bwa samse -n 10 GRCh37-lite.fa $SAI $FASTQ | $SAMTOOLS_PATH/samtools view -bt GRCh37-lite.fa.fai - | $SAMTOOLS_PATH/samtools sort - $SORTED_PREFIX|
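With `bwa samse -n 10`, a read with several hits can carry its alternative alignments in the `XA` tag (up to 10 here; reads with more hits get no `XA` tag). Each entry has the form `chrom,±pos,CIGAR,NM;`, where the sign of the position gives the strand. The hypothetical helper below parses such a tag, which can be useful when checking how many miRNA reads map to multiple loci.

```python
def parse_xa(tag: str) -> list:
    """Parse a BWA XA tag into (chrom, strand, pos, cigar, nm) tuples."""
    prefix = "XA:Z:"
    if tag.startswith(prefix):
        tag = tag[len(prefix):]
    hits = []
    for entry in tag.split(";"):
        if not entry:
            continue  # the tag value ends with a trailing ';'
        chrom, signed_pos, cigar, nm = entry.split(",")
        strand = "-" if signed_pos.startswith("-") else "+"
        hits.append((chrom, strand, abs(int(signed_pos)), cigar, int(nm)))
    return hits


if __name__ == "__main__":
    # Hypothetical tag with two alternative hits.
    for hit in parse_xa("XA:Z:chr1,+100,22M,0;chr5,-200,22M,1;"):
        print(hit)
```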
|perl code/annotation/annotate.pl -m mirna_19 -o hsa -u hg19 -p
-m refers to the miRBase version
-o refers to the miRBase species code
-u refers to the UCSC database
|perl code/library_stats/alignment_stats.pl -p data_directory|
|perl code/library_stats/graph_libs.pl -p data_directory|
|perl code/library_stats/expression_matrix.pl -m mirna_19 -o hsa -p data_directory|
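The expression matrix step is described only by its command line. As a rough illustration of the output shape (one row per miRNA, one column per library, missing values filled with 0), here is a sketch; the input dictionary, the library and miRNA names, and the TSV layout are assumptions, not the actual behaviour of `expression_matrix.pl`.

```python
def expression_matrix(libraries: dict) -> str:
    """Build a miRNA x library count matrix as TSV text.

    `libraries` maps a library name to a {miRNA_name: count} dict;
    miRNAs absent from a library are filled with 0.
    """
    lib_names = sorted(libraries)
    mirnas = sorted({m for counts in libraries.values() for m in counts})
    lines = ["miRNA\t" + "\t".join(lib_names)]
    for mirna in mirnas:
        row = [str(libraries[lib].get(mirna, 0)) for lib in lib_names]
        lines.append(mirna + "\t" + "\t".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical per-library counts, for illustration only.
    print(expression_matrix({
        "LIB_B": {"hsa-miR-A": 5},
        "LIB_A": {"hsa-miR-A": 2, "hsa-miR-B": 7},
    }))
```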
mapper.pl $samplelist -d -b -h -j -k $ADAPTER -l $MIN_READ_LENGTH -m -p $GENOME_DATABASE -s ${outname}_read_collapsed.fa -t ${outname}_hg19a.arf && \
miRDeep2.pl ${outname}_read_collapsed.fa $GENOME_FASTA ${outname}_hg19a.arf $MATURE_MIRNA_HSA $MATURE_MIRNA_CLOSE_SPP $PRECURSORS_HSA -t Human 2>${outname}.mirdeep2_report.log
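mapper.pl's `-m` option collapses identical reads before miRDeep2.pl is run, so one FASTA record can stand for many reads. The sketch below assumes the collapsed headers follow the `>PREFIX_INDEX_xCOUNT` pattern (for example `>seq_0_x100`, with a three-character sample prefix) and sums the counts per prefix; check the header format produced by your miRDeep2 version before relying on it.

```python
import re

# Assumed collapsed-read header layout: >PREFIX_INDEX_xCOUNT
HEADER = re.compile(r"^>(\w{3})_(\d+)_x(\d+)$")


def collapsed_read_totals(fasta_lines) -> dict:
    """Sum the read counts encoded in collapsed-read headers, per sample prefix."""
    totals = {}
    for line in fasta_lines:
        match = HEADER.match(line.strip())
        if match:  # sequence lines and non-matching headers are ignored
            prefix, _index, count = match.groups()
            totals[prefix] = totals.get(prefix, 0) + int(count)
    return totals


if __name__ == "__main__":
    # Hypothetical collapsed FASTA content.
    lines = [">seq_0_x100", "ACGTACGTACGTACGTACGTAC", ">seq_1_x3", "TTGCATTGCATTGCATTGCATT"]
    print(collapsed_read_totals(lines))
```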
|$UCSC_PATH/bedGraphToBigWig $BEDGRAPH $UCSC_PATH/hg19.chrom.sizes $BIGWIG|
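bedGraphToBigWig expects its input sorted by chromosome and start position (UCSC suggests `sort -k1,1 -k2,2n`) with no overlapping intervals, and every chromosome must be listed in the chrom.sizes file. The sketch below performs those checks in Python before conversion; the function name and error handling are illustrative assumptions, not part of the UCSC tools.

```python
def prepare_bedgraph(lines, chrom_sizes: dict) -> list:
    """Sort bedGraph records and validate them against chromosome sizes.

    Raises ValueError on unknown chromosomes, out-of-range coordinates
    or overlapping intervals.
    """
    records = []
    for line in lines:
        chrom, start, end, value = line.split()
        start, end = int(start), int(end)
        if chrom not in chrom_sizes:
            raise ValueError(f"unknown chromosome: {chrom}")
        if not 0 <= start < end <= chrom_sizes[chrom]:
            raise ValueError(f"bad interval: {line.strip()}")
        records.append((chrom, start, end, value))
    # String comparison of ASCII chromosome names mirrors `LC_COLLATE=C sort -k1,1 -k2,2n`.
    records.sort(key=lambda r: (r[0], r[1]))
    for prev, cur in zip(records, records[1:]):
        if prev[0] == cur[0] and cur[1] < prev[2]:
            raise ValueError(f"overlapping intervals at {cur[0]}:{cur[1]}")
    return [f"{c}\t{s}\t{e}\t{v}" for c, s, e, v in records]


if __name__ == "__main__":
    sizes = {"chr1": 1000, "chr2": 1000}  # hypothetical chrom.sizes contents
    for row in prepare_bedgraph(["chr2\t10\t20\t1.5", "chr1\t50\t60\t2.0"], sizes):
        print(row)
```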
The various assays have been color coded as follows:
|miRNA not mature or precursor|
Where possible, information about the tracks and data is included in the title. Because of length limits on these fields, this is not always possible. The library information for each track is, however, always included as the first field of the colon-delimited title string. Please refer to the metadata table on this page to look up further details by library.
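Because the library is always the first colon-delimited field of a track title, the metadata lookup can be scripted. In this sketch the title string and the metadata dictionary are hypothetical stand-ins for a real track title and the metadata table on this page.

```python
from typing import Optional


def library_from_title(title: str) -> str:
    """Return the library ID, i.e. the first field of a colon-delimited title."""
    return title.split(":", 1)[0].strip()


def lookup_metadata(title: str, metadata: dict) -> Optional[dict]:
    """Find the metadata row for a track title, or None if the library is unknown."""
    return metadata.get(library_from_title(title))


if __name__ == "__main__":
    # Hypothetical title and metadata row, for illustration only.
    metadata = {"LIB_A": {"assay": "miRNA-Seq"}}
    print(lookup_metadata("LIB_A:miRNA-Seq:mature", metadata))
```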
When analyzing data from different sources, please note that the underlying data processing and handling procedures may differ.
Please direct any questions to: firstname.lastname@example.org