{"id":6302,"projects":[69],"description":"The PCAWG GATK Co-cleaning workflow was developed by the Broad Institute\n(https://www.broadinstitute.org). It consists of two pre-processing steps for tumor/normal\nBAM files: indel realignment and base quality score recalibration (BQSR). The workflow\nhas been dockerized and packaged using the CWL workflow language. The source code is available on\nGitHub at: https://github.com/ICGC-TCGA-PanCancer/pcawg-gatk-cocleaning.\n\n\n## Run the workflow with your own data\n\n### Prepare compute environment and install software packages\nThe workflow has been tested in an Ubuntu 16.04 Linux environment with the following hardware and\nsoftware settings.\n\n#### Hardware requirements (assuming 30X coverage whole genome sequence)\n- CPU cores: 16\n- Memory: 64GB\n- Disk space: 1TB\n\n#### Software installation\n- Docker (1.12.6): follow the instructions to install Docker at https://docs.docker.com/engine/installation\n- CWL tool\n```\npip install cwltool==1.0.20170217172322\n```\n\n### Prepare input data\n#### Input aligned tumor / normal BAM files\n\nThe workflow takes a pair of aligned BAM files as input, one for the tumor and one for the normal,\nboth from the same donor. Here we assume the files are named *tumor_sample.bam* and *normal_sample.bam*\nand are in the *bams* subfolder.\n\n#### Reference data files\n\nThe workflow also uses the following reference files, which can be downloaded from the ICGC Data Portal:\n\n- Under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-bwa-mem\n  - genome.fa.gz\n  - genome.dict\n- Under https://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-gatk-cocleaning\n  - 1000G_phase1.indels.hg19.sites.fixed.vcf.gz\n  - Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz\n  - dbsnp_132_b37.leftAligned.vcf.gz\n\nWe assume the reference files are in the *reference* subfolder. The job file below refers to\n*reference/genome.fa*, so *genome.fa.gz* presumably needs to be decompressed first (for example with `gunzip`).\n\n#### Job JSON file for CWL\n\nFinally, we need to prepare a JSON file that specifies the input and reference files. 
Please replace\nthe values of the *tumor_bam* and *normal_bam* parameters with the paths to your own BAM files.\n\nName the JSON file *pcawg-gatk-cocleaning.job.json*:\n```\n{\n    \"tumor_bam\": {\n        \"class\": \"File\",\n        \"location\": \"bams/tumor_sample.bam\"\n    },\n    \"normal_bam\": {\n        \"class\": \"File\",\n        \"location\": \"bams/normal_sample.bam\"\n    },\n    \"reference\": {\n        \"class\": \"File\",\n        \"location\": \"reference/genome.fa\"\n    },\n    \"knownIndels\": [\n        {\n            \"class\": \"File\",\n            \"location\": \"reference/1000G_phase1.indels.hg19.sites.fixed.vcf.gz\"\n        },\n        {\n            \"class\": \"File\",\n            \"location\": \"reference/Mills_and_1000G_gold_standard.indels.hg19.sites.fixed.vcf.gz\"\n        }\n    ],\n    \"knownSites\": [\n        {\n            \"class\": \"File\",\n            \"location\": \"reference/dbsnp_132_b37.leftAligned.vcf.gz\"\n        }\n    ]\n}\n```\n\n### Run the workflow\n#### Option 1: Run with CWL tool\n- Download the CWL workflow definition files, saving the archive under the name that the `tar` command expects\n```\nwget -O pcawg-gatk-cocleaning-0.1.1.tar.gz https://github.com/ICGC-TCGA-PanCancer/pcawg-gatk-cocleaning/archive/0.1.1.tar.gz\ntar xvf pcawg-gatk-cocleaning-0.1.1.tar.gz\n```\n\n- Run `cwltool` to execute the workflow in the background, writing its output to a log file\n```\nnohup cwltool --debug --non-strict pcawg-gatk-cocleaning-0.1.1/gatk-cocleaning-workflow.cwl pcawg-gatk-cocleaning.job.json > pcawg-gatk-cocleaning.log 2>&1 &\n```\n\n#### Option 2: Run with the Dockstore CLI\nSee the *Launch with* section below for details.","image":"","tags":"","type":"","title":"pcawg-gatk-cocleaning","url":"https://dockstore.org/api/api/ga4gh/v2/tools/%23workflow%2Fgithub.com%2FICGC-TCGA-PanCancer%2Fpcawg-gatk-cocleaning","authors":[1],"rubrics":[25]}