Reproducible big data science: A case study in continuous FAIRness

We present the work that went into creating an atlas of putative transcription factor binding sites from ENCODE DNase I hypersensitive sites sequencing data, in compliance with the ten simple rules for reproducible computational research defined by Sandve et al. We describe the approach we took and the tools we built to organize and analyze big biomedical data in compliance with the FAIR principles. This work was conducted by a multidisciplinary team of scientists in systems biology, genomics, and computer science. We strongly believe that PLOS ONE is an appropriate journal for our paper, as the study is interdisciplinary and of interest to a broad audience with diverse expertise.

Project Assessments (13)


Assessment Metrics

Rubric for all targets: FAIR metrics by fairmetrics.org. Each cell answers one metric, scored yes = 1.00 or no = 0.00.

Metrics (columns 1 to 16):
1. Globally unique identifier
2. Persistent identifier
3. Machine-readable metadata
4. Standardized metadata
5. Resource identifier in metadata
6. Resource discovery through web search
7. Open, Free, Standardized Access protocol
8. Protocol to access restricted content
9. Persistence of resource and metadata
10. Resource uses formal language
11. FAIR vocabulary
12. Linked
13. Digital resource license
14. Metadata license
15. Provenance scheme
16. Certificate of compliance to community standard

| Target | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BDBag of DNase-Seq data from the ENCODE project for 27 tissues(D1) | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Aligned reads of DNase Sequence data from ENCODE project | yes | yes | yes | yes | yes | no | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| Non-redundant motifs for TFBS inference | yes | yes | yes | yes | yes | no | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Database of hits | yes | yes | yes | yes | yes | no | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Transcription Factor Binding Sites for DNAse data from 27 tissues in ENCODE | yes | yes | yes | yes | yes | no | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| BED files with footprints of ENCODE DNase Sequence data | yes | yes | yes | yes | yes | no | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| ENCODE2BDBag Service | yes | yes | yes | yes | yes | no | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| ENCODE2BdBag Tool | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Galaxy workflow for generating Footprints | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Docker container description for analysis tools used in creating the atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| Docker Container for analysis tools used in creating the atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| R Script that is used to generate hits from Motifs database | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
| R Script for generating Transcription Factor Binding Sites | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | no |
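
The assessment rows above can be condensed into one number per target. The sketch below is illustrative only: it assumes an unweighted mean over the 16 metrics, which is not necessarily how the assessment site aggregates scores. The names `METRICS`, `ANSWERS`, and `summarize`, and the shortened target labels, are hypothetical and introduced here. Only three of the thirteen rows are transcribed.

```python
# Metric names in the column order of the assessment table.
METRICS = [
    "Globally unique identifier",
    "Persistent identifier",
    "Machine-readable metadata",
    "Standardized metadata",
    "Resource identifier in metadata",
    "Resource discovery through web search",
    "Open, Free, Standardized Access protocol",
    "Protocol to access restricted content",
    "Persistence of resource and metadata",
    "Resource uses formal language",
    "FAIR vocabulary",
    "Linked",
    "Digital resource license",
    "Metadata license",
    "Provenance scheme",
    "Certificate of compliance to community standard",
]

# Answers transcribed from three table rows: "y" = yes (1.00), "n" = no (0.00).
# Target labels are shortened for this sketch (an assumption, not the site's names).
ANSWERS = {
    "BDBag of DNase-Seq data (D1)": "y" * 15 + "n",
    "Aligned reads": "y" * 5 + "n" + "y" * 10,
    "Non-redundant motifs": "y" * 5 + "n" + "y" * 9 + "n",
}


def summarize(answers):
    """Return (unweighted mean score, names of the metrics answered 'no')."""
    if len(answers) != len(METRICS):
        raise ValueError("expected exactly one answer per metric")
    scores = [1.0 if a == "y" else 0.0 for a in answers]
    unmet = [name for name, s in zip(METRICS, scores) if s == 0.0]
    return sum(scores) / len(scores), unmet


if __name__ == "__main__":
    for target, answers in ANSWERS.items():
        mean, unmet = summarize(answers)
        print(f"{target}: {mean:.4f}; unmet: {', '.join(unmet) or 'none'}")
```

Listing the unmet metrics next to the mean keeps the summary actionable: in the rows transcribed here, the only gaps are web-search discoverability and the certificate of compliance.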