Reproducible big data science: A case study in continuous FAIRness


We present the work that went into creating an atlas of putative transcription factor binding sites from ENCODE DNase I hypersensitive sites sequencing data, in compliance with the ten simple rules for reproducible computational research defined by Sandve et al. We describe the approach we took and the tools we built to organize and analyze big biomedical data in compliance with the FAIR principles. This work was conducted by a multidisciplinary team of scientists in systems biology, genomics, and computer science. We strongly believe that PLOS ONE is an appropriate journal for our paper, as the study is interdisciplinary and of interest to a broad audience with differing and multidisciplinary expertise.


Project Assessments (13)


Each assessment is listed as its target and rubric, followed by the scores for the 16 metrics (in the order below) and the assessment date.

Metrics:
1. Globally unique identifier
2. Persistent identifier
3. Machine-readable metadata
4. Standardized metadata
5. Resource identifier in metadata
6. Resource discovery through web search
7. Open, Free, Standardized Access protocol
8. Protocol to access restricted content
9. Persistence of resource and metadata
10. Resource uses formal language
11. FAIR vocabulary
12. Linked
13. Digital resource license
14. Metadata license
15. Provenance scheme
16. Certificate of compliance to community standard

BDBag of DNase-Seq data from the ENCODE project for 27 tissues (D1) FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 4, 2019
Aligned reads of DNase Sequence data from ENCODE project FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) Jan 4, 2019
Non-redundant motifs for TFBS inference FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 7, 2019
Database of hits FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
Transcription Factor Binding Sites for DNase data from 27 tissues in ENCODE FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) Jan 11, 2019
BED files with footprints of ENCODE DNase Sequence data FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
ENCODE2BDBag Service FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
ENCODE2BdBag Tool FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
Galaxy workflow for generating Footprints FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
Docker container description for analysis tools used in creating the atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
Docker Container for analysis tools used in creating the atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
R Script that is used to generate hits from Motifs database FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
R Script for generating Transcription Factor Binding Sites FAIR metrics by fairmetrics.org
yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) yes (1.00) no (0.00) Jan 11, 2019
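
The score rows above can be read programmatically. The following is a minimal sketch, not part of the assessment tooling itself: it assumes each row holds 16 `yes (1.00)` / `no (0.00)` entries in the metric order given in the table header, followed by the date. The names `METRICS`, `parse_scores`, and `failed_metrics` are hypothetical, and averaging the scores is only an illustrative summary, not an official aggregate.

```python
import re

# Metric names in the column order given by the table header.
METRICS = [
    "Globally unique identifier",
    "Persistent identifier",
    "Machine-readable metadata",
    "Standardized metadata",
    "Resource identifier in metadata",
    "Resource discovery through web search",
    "Open, Free, Standardized Access protocol",
    "Protocol to access restricted content",
    "Persistence of resource and metadata",
    "Resource uses formal language",
    "FAIR vocabulary",
    "Linked",
    "Digital resource license",
    "Metadata license",
    "Provenance scheme",
    "Certificate of compliance to community standard",
]

# Matches one cell such as "yes (1.00)" or "no (0.00)"; the trailing date is ignored.
CELL_PATTERN = re.compile(r"(yes|no) \((\d\.\d{2})\)")


def parse_scores(row):
    """Map each metric name to its numeric score for one assessment row."""
    values = [float(score) for _, score in CELL_PATTERN.findall(row)]
    if len(values) != len(METRICS):
        raise ValueError(f"expected {len(METRICS)} scores, found {len(values)}")
    return dict(zip(METRICS, values))


def failed_metrics(scores):
    """Return the names of metrics scored 0.00, in column order."""
    return [name for name, value in scores.items() if value == 0.0]


# Row for the first assessment (BDBag of DNase-Seq data): 15 "yes", then one "no".
row = " ".join(["yes (1.00)"] * 15 + ["no (0.00)"]) + " Jan 4, 2019"
scores = parse_scores(row)
mean_score = sum(scores.values()) / len(scores)
print(failed_metrics(scores), mean_score)
```

For this row, the only metric scored 0.00 is the last column, the certificate of compliance to a community standard.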