{"id":6296,"projects":[69],"description":"The PCAWG uniform alignment workflow uses the popular short-read aligner BWA MEM (https://github.com/lh3/bwa)\nwith BioBAMBAM (https://github.com/gt1/biobambam) for BAM sorting, merging, and duplicate marking.\nThe alignment workflow has been dockerized and packaged using the Common Workflow Language (CWL). The source code\nis available on GitHub at: https://github.com/ICGC-TCGA-PanCancer/Seqware-BWA-Workflow.\n\n## Run the workflow with your own data\n### Prepare compute environment and install software packages\nThe workflow has been tested in an Ubuntu 16.04 Linux environment with the following hardware\nand software settings.\n\n#### Hardware requirements (assuming 30X coverage whole genome sequence)\n- CPU cores: 16\n- Memory: 64GB\n- Disk space: 1TB\n\n#### Software installation\n- Docker (1.12.6): follow the instructions to install Docker at https://docs.docker.com/engine/installation\n- CWL tool\n```\npip install cwltool==1.0.20170217172322\n```\n\n### Prepare input data\n#### Input unaligned BAM files\n\nThe workflow uses lane-level unaligned BAM files as input, one BAM per lane (aka read group).\nPlease ensure the *@RG* field is populated properly in the BAM header. The following is a\nvalid *@RG* entry. The *ID* field has to be unique within your dataset.\n```\n@RG\tID:WTSI:9399_7\tCN:WTSI\tPL:ILLUMINA\tPM:Illumina HiSeq 2000\tLB:WGS:WTSI:28085\tPI:453\tSM:f393ba16-9361-5df4-e040-11ac0d4844e8\tPU:WTSI:9399_7\tDT:2013-03-18T00:00:00+00:00\n```\nMultiple unaligned BAMs from the same sample (with the same *SM* value) should be run together. *SM* is\na globally unique UUID for the sample. Put the input BAM files in a subfolder. In this example,\nwe have two BAMs in a folder named *bams*.\n\n\n#### Reference genome sequence files\n\nThe reference genome files can be downloaded from the ICGC Data Portal at\nhttps://dcc.icgc.org/releases/PCAWG/reference_data/pcawg-bwa-mem. 
Please download all\nreference files and put them under a subfolder called *reference*.\n\n#### Job JSON file for CWL\n\nFinally, prepare a JSON file that specifies the input, reference, and output files. Please\nreplace the paths in the *reads* parameter with your real BAM file names.\n\nName the JSON file: *pcawg-bwa-mem-aligner.job.json*\n```\n{\n  \"reads\": [\n    {\n      \"path\":\"bams/seq_from_normal_sample_A.lane_1.bam\",\n      \"class\":\"File\"\n    },\n    {\n      \"path\":\"bams/seq_from_normal_sample_A.lane_2.bam\",\n      \"class\":\"File\"\n    }\n  ],\n  \"output_dir\": \"datastore\",\n  \"output_file_basename\": \"seq_from_normal_sample_A\",\n  \"reference_gz_amb\": {\n    \"path\": \"reference/genome.fa.gz.64.amb\",\n    \"class\": \"File\"\n  },\n  \"reference_gz_sa\": {\n    \"path\": \"reference/genome.fa.gz.64.sa\",\n    \"class\": \"File\"\n  },\n  \"reference_gz_pac\": {\n    \"path\": \"reference/genome.fa.gz.64.pac\",\n    \"class\": \"File\"\n  },\n  \"reference_gz_ann\": {\n    \"path\": \"reference/genome.fa.gz.64.ann\",\n    \"class\": \"File\"\n  },\n  \"reference_gz_bwt\": {\n    \"path\": \"reference/genome.fa.gz.64.bwt\",\n    \"class\": \"File\"\n  },\n  \"reference_gz_fai\": {\n    \"path\": \"reference/genome.fa.gz.fai\",\n    \"class\": \"File\"\n  },\n  \"reference_gz\": {\n    \"path\": \"reference/genome.fa.gz\",\n    \"class\": \"File\"\n  }\n}\n```\n\n### Run the workflow\n#### Option 1: Run with CWL tool\n- Download the CWL workflow definition file\n```\nwget -O pcawg-bwa-mem-aligner.cwl \"https://raw.githubusercontent.com/ICGC-TCGA-PanCancer/Seqware-BWA-Workflow/2.6.8_1.3/Dockstore.cwl\"\n```\n\n- Run *cwltool* to execute the workflow\n```\nnohup cwltool --debug --non-strict pcawg-bwa-mem-aligner.cwl pcawg-bwa-mem-aligner.job.json > pcawg-bwa-mem-aligner.log 2>&1 &\n```\n\n#### Option 2: Run with the Dockstore CLI\nSee the *Launch with* section below for 
details.","image":"","tags":"","type":"","title":"pcawg-bwa-mem-workflow","url":"https://dockstore.org/api/api/ga4gh/v2/tools/quay.io%2Fpancancer%2Fpcawg-bwa-mem-workflow","authors":[1],"rubrics":[25]}