# pcawg-sanger-cgp-workflow

The PCAWG Sanger variant calling workflow was developed by the Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/). It consists of software components that call somatic substitutions, indels and structural variants from uniformly aligned tumour / normal WGS sequences. The workflow has been dockerized and packaged using the Common Workflow Language (CWL). The source code is available on GitHub at: https://github.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker.

## Run the workflow with your own data

### Prepare compute environment and install software packages

The workflow has been tested in an Ubuntu 16.04 Linux environment with the following hardware and software settings.

#### Hardware requirement (assuming 30X coverage whole genome sequence)

- CPU core: 16
- Memory: 64GB
- Disk space: 1TB

#### Software installation

- Docker (1.12.6): follow the instructions to install Docker at https://docs.docker.com/engine/installation
- CWL tool

```
pip install cwltool==1.0.20170217172322
```

### Prepare input data

#### Input aligned tumor / normal BAM files

The workflow takes a pair of aligned BAM files as input: one BAM for the tumor and one for the normal, both from the same donor. Here we assume the file names are *tumor_sample.bam* and *normal_sample.bam*, and that both files are under the *bams* subfolder.

#### Reference data files

The workflow also takes two precompiled reference files (*GRCh37d5_CGP_refBundle.tar.gz*, *GRCh37d5_battenberg.tar.gz*) as input. They can be downloaded from the ICGC Data Portal under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-sanger. We assume the two reference files have been downloaded and put under the *reference* subfolder.

#### Job JSON file for CWL

Finally, we need to prepare a JSON file that specifies the input, reference and output files. Please replace the *tumor* and *normal* parameters with your real BAM file names. The output parameters are file name suffixes and usually do not need to be changed.

Name the JSON file *pcawg-sanger-variant-caller.job.json*:

```
{
  "tumor": {
    "path": "bams/tumor_sample.bam",
    "class": "File"
  },
  "normal": {
    "path": "bams/normal_sample.bam",
    "class": "File"
  },
  "refFrom": {
    "path": "reference/GRCh37d5_CGP_refBundle.tar.gz",
    "class": "File"
  },
  "bbFrom": {
    "path": "reference/GRCh37d5_battenberg.tar.gz",
    "class": "File"
  },
  "somatic_snv_mnv_tar_gz": {
    "path": "somatic_snv_mnv_tar_gz",
    "class": "File"
  },
  "somatic_cnv_tar_gz": {
    "path": "somatic_cnv_tar_gz",
    "class": "File"
  },
  "somatic_sv_tar_gz": {
    "path": "somatic_sv_tar_gz",
    "class": "File"
  },
  "somatic_indel_tar_gz": {
    "path": "somatic_indel_tar_gz",
    "class": "File"
  },
  "somatic_imputeCounts_tar_gz": {
    "path": "somatic_imputeCounts_tar_gz",
    "class": "File"
  },
  "somatic_genotype_tar_gz": {
    "path": "somatic_genotype_tar_gz",
    "class": "File"
  },
  "somatic_verifyBamId_tar_gz": {
    "path": "somatic_verifyBamId_tar_gz",
    "class": "File"
  }
}
```

### Run the workflow

#### Option 1: Run with CWL tool

- Download the CWL workflow definition file

```
wget -O pcawg-sanger-variant-caller.cwl "https://raw.githubusercontent.com/ICGC-TCGA-PanCancer/CGP-Somatic-Docker/2.0.3/Dockstore.cwl"
```

- Run `cwltool` to execute the workflow

```
nohup cwltool --debug --non-strict pcawg-sanger-variant-caller.cwl pcawg-sanger-variant-caller.job.json > pcawg-sanger-variant-caller.log 2>&1 &
```

#### Option 2: Run with the Dockstore CLI

See the *Launch with* section below for details.
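
#### Optional: generate the job JSON file with a script

When running many tumor / normal pairs, writing the job file by hand is error-prone. The following is a minimal Python sketch, not part of the workflow itself, that builds a job file with the same keys as the example above and reports any input file that is missing before `cwltool` is started. The helper names (`build_job`, `missing_inputs`, `write_job`) are hypothetical and chosen only for illustration. The sketch assumes the *bams* and *reference* subfolder layout described earlier.

```python
import json
from pathlib import Path

# Output parameters as listed in the example job file; each value is a file name suffix.
OUTPUT_KEYS = [
    "somatic_snv_mnv_tar_gz",
    "somatic_cnv_tar_gz",
    "somatic_sv_tar_gz",
    "somatic_indel_tar_gz",
    "somatic_imputeCounts_tar_gz",
    "somatic_genotype_tar_gz",
    "somatic_verifyBamId_tar_gz",
]

# Keys whose paths must exist on disk before the workflow is launched.
INPUT_KEYS = ["tumor", "normal", "refFrom", "bbFrom"]


def file_entry(path):
    """Return a CWL File object (path + class) for the given path."""
    return {"path": Path(path).as_posix(), "class": "File"}


def build_job(tumor_bam, normal_bam, ref_dir="reference"):
    """Assemble a job dictionary with the same keys as the example job file."""
    ref_dir = Path(ref_dir)
    job = {
        "tumor": file_entry(tumor_bam),
        "normal": file_entry(normal_bam),
        "refFrom": file_entry(ref_dir / "GRCh37d5_CGP_refBundle.tar.gz"),
        "bbFrom": file_entry(ref_dir / "GRCh37d5_battenberg.tar.gz"),
    }
    for key in OUTPUT_KEYS:
        job[key] = file_entry(key)
    return job


def missing_inputs(job, base_dir="."):
    """List the input paths (BAMs and reference bundles) not found under base_dir."""
    base_dir = Path(base_dir)
    return [
        job[key]["path"]
        for key in INPUT_KEYS
        if not (base_dir / job[key]["path"]).is_file()
    ]


def write_job(job, out_path="pcawg-sanger-variant-caller.job.json"):
    """Write the job dictionary as indented JSON and return the output path."""
    Path(out_path).write_text(json.dumps(job, indent=2) + "\n")
    return out_path


if __name__ == "__main__":
    job = build_job("bams/tumor_sample.bam", "bams/normal_sample.bam")
    missing = missing_inputs(job)
    if missing:
        # Do not write a job file that cwltool would reject.
        print("Missing input files:", ", ".join(missing))
    else:
        print("Wrote", write_job(job))
```

Run the script from the directory that contains the *bams* and *reference* subfolders. It writes the job file only when all four input files are present, so a mistyped BAM name is caught before a long-running job is submitted.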