Reproducible big data science: A case study in continuous FAIRness

project

We present the work that went into creating an atlas of putative transcription factor binding sites from ENCODE DNase I hypersensitive sites sequencing data, in compliance with the ten simple rules for reproducible computational research defined by Sandve et al. We describe the approach we took and the tools we built to organize and analyze big biomedical data in compliance with the FAIR principles. This work was conducted by a multidisciplinary team of scientists in systems biology, genomics, and computer science. We believe that PLOS ONE is an appropriate journal for our paper because the study is interdisciplinary and of interest to a broad audience with diverse expertise.

Associated Digital Objects (13)

BDBag of DNase-seq data from the ENCODE project for 27 tissues (D1)

data

A BDBag of tissue-specific DNase-seq data from ENCODE, for hundreds of biosample replicates and 27 t...

dnase encode raw bdbag

Aligned reads of DNase-seq data from the ENCODE project

data

Aligned reads of DNase-seq data for 27 tissues from the ENCODE project with two alignment seeds (16 a...

dnase bam

Non-redundant motifs for TFBS inference

data

Database file containing the hits produced

Database of hits

data

Database generated from the hits produced by non-redundant motifs (http://minid.bd2k.org/minid/landi...

Transcription Factor Binding Sites for DNase data from 27 tissues in ENCODE

data

BDBag of 54 BDBags containing candidate TFBSs, one BDBag per {tissue, seed}. Each BDBag contains tw...

BED files with footprints of ENCODE DNase Sequence data

data

BDBag of 54 BDBags containing footprints, computed one per {tissue, seed}. Each BDBag contains two B...

ENCODE2BDBag Service

tool

A service that creates a BDBag for a given ENCODE query or metadata file. The resulting BDBag includes ...

ENCODE2BdBag Tool

tool

Utility for converting ENCODE search URLs or metadata files into BDBags

Galaxy workflow for generating Footprints

tool

A Galaxy workflow for generating transcription factor binding sites from ENCODE DNase data

Docker container description for analysis tools used in creating the atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data

tool

Dockerfile for recreating the Docker container with the footprinting tools (HINT, Wellington)

Docker Container for analysis tools used in creating the atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data

tool

Docker Container for analysis tools used in creating the atlas of putative transcription factor bind...