DATA MANAGEMENT PLAN
Data Policy Compliance: This project will comply with the NSF OCE Data and Sample Policy. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) will be the primary data repository. The genomic information will be archived with NCBI and linked with our data at BCO-DMO.
Nature of Data and Collections: This project will generate multiple data types, including amplicon, genome, methylome, and transcriptome data, as well as mass spectrometry and histone post-translational modification data. These will be stored as raw data (long-term storage), as processed, quality-controlled data, and as assembled contigs for analysis. Experimental physical and biological data (e.g., temperature, pH, total alkalinity, water flow, and physiological measures and metadata) will be stored as jpg and csv files, with accompanying R scripts. Analyses of the above datasets will require statistical code (e.g., written in R or MATLAB). After manuscripts have been accepted for publication, this code will be made available either as a supplement or by deposition on the first author’s GitHub page and at BCO-DMO. Formulation of DEB, population, community, and eco-evo models will generate scripts (e.g., in R, MATLAB, and Python) as well as model output (saved in non-proprietary formats such as .csv and .txt). Model code and outputs will be released in the same way once the associated manuscripts have been accepted for publication.
Sample and Active Data Storage: All samples will be collected under appropriate permits and protocols through the government of French Polynesia and the UC Berkeley Gump Station. Genomic DNA, RNA (cDNA), and protein samples that are not fully consumed in analyses will be retained and archived in the Putnam, Roberts, Cunning, and Eirin-Lopez laboratories at -80 °C for long-term storage. Upon acquisition from the sequencing or analytical instruments and facilities, data will be quality controlled and added to project servers at URI, UW, and FIU before being submitted to the appropriate NCBI and BCO-DMO repositories, and will be hosted on our project E5 website for replicated storage with backup. All project data will be accessible to all persons involved in the project. Data will be backed up as follows: the original and one copy will be stored on the hard drives of two desktop personal computers and/or local servers, and an additional copy will be used to share research data with the global scholarly community and to provide public access. Data archiving services are available through the FIU and UW Cores. Finally, the PI team will review these practices annually with team members and the External Advisory Board to ensure compliance with this Data Management Plan.
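One way to confirm that the original and its backup copy remain identical is to compare file checksums. The following is a minimal sketch in Python, offered only as an illustration: the function names and the choice of SHA-256 are assumptions and are not specified in this plan.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(original: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the backup copy."""
    source = build_manifest(original)
    copy = build_manifest(backup)
    return sorted(name for name, checksum in source.items() if copy.get(name) != checksum)
```

Calling `compare_copies` with two hypothetical directories (for example, a local data folder and its mirror on a second machine) returns an empty list when every file in the original has a byte-identical counterpart in the backup.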
Data Archival: Data will be archived permanently in the original data format and also in more common, non-proprietary formats (e.g., tiff, csv, txt, fasta) to facilitate future data usage. Data generated by the research and related metadata will be deposited in and accessible through NCBI (raw data via the NCBI Sequence Read Archive [SRA], assembled transcriptome contigs via the NCBI Transcriptome Shotgun Assembly archive [TSA]), as well as through the iMicrobe project and a link on the project website hosted at each university. Physical, physiological, and modeling data will be archived at BCO-DMO. The project website will centralize all resources and will also provide access to all downstream analyses such as gene annotations (e.g., Blast2GO, NCBI) and modeling output. Data and metadata associated with chromatin structure and histone PTMs resulting from mass spectrometry and antibody-based techniques will be classified, archived, and integrated with genomic data resulting from ATAC-seq. In both cases, these will be assembled and subsequently classified using gene ontology (GO) analyses with the corresponding reference genomes for the coral species indicated in the proposal. Within one month of acquisition, the raw genomic data (RNA-seq, ATAC-seq, BS-seq) will be uploaded to the SRA at NCBI and linked to BCO-DMO as the data become accessible there. SQL databases will be used to organize the phenotyping and ocean chemistry data, which will also be deposited at BCO-DMO. For each publication, raw data and reproducible pipelines (including scripts used to create publication figures) will be compressed and archived permanently at BCO-DMO. Within two years of data collection, the primary data will be made publicly available on BCO-DMO following the Division of Ocean Sciences Sample and Data Policy.
Documentation and Metadata: Documentation for this project will include the development of written methodologies for sample collection and processing, made openly available on the project E5 website (designed by a project-supported web developer) and published as peer-reviewed or online repository methodologies where applicable (e.g., Molecular Ecology Resources, Protocols.io, GitHub). Quality control will be conducted at each stage of data acquisition, processing, and analysis, including the development of metadata forms detailing the outline of the project, the instrumentation used, the format of the data, QA/QC standards and controls, and the funding source, among other details. Metadata forms will be used to create and organize the project database for local use and, upon publication, to ease data dissemination (see “Publication and Presentation” section below). Analyses will be scripted to facilitate reproducible science. Metadata associated with this proposed research, including information on sites, experiments, and data collected (e.g., date, time, location, experimental treatments and maintenance, and environmental variables measured), will be documented for all data following BCO-DMO recommendations. While there is no single common standard for short-read sequence data, essential information, including a description of the sample, library, and sequencing method, will be included in the SRA repository. Data tags will make the data easy to retrieve at NCBI.
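As an illustration of how a metadata form of this kind could be captured in a scripted, non-proprietary form, the sketch below defines a record in Python, checks it for blank entries, and writes it as tab-delimited text. The field names, the `SampleMetadata` class, and the helper functions are hypothetical and chosen only to mirror the items listed above; they are not a BCO-DMO or NCBI schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class SampleMetadata:
    """Hypothetical metadata form; field names are illustrative assumptions."""
    sample_id: str
    collection_date: str  # ISO 8601 date, e.g. "YYYY-MM-DD"
    site: str
    treatment: str
    instrument: str
    data_format: str
    qc_status: str
    funding_source: str


def missing_fields(record: SampleMetadata) -> list[str]:
    """List the names of fields left blank, as a basic completeness check."""
    return [f.name for f in fields(record) if not str(getattr(record, f.name)).strip()]


def to_tsv(records: list[SampleMetadata]) -> str:
    """Serialize records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(SampleMetadata)],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

A record with every field filled yields an empty list from `missing_fields`, while a record with a blank entry returns that field's name, so incomplete forms can be flagged before data are shared.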
Policies for Data Sharing and Public Access: Policies for access and sharing will include provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements. All raw and processed data generated in this study will be made publicly available after quality control. Quality-controlled data will be organized on a publicly accessible network-attached storage (NAS) device with RAID redundancy. To ease searching and discovery, we will also maintain a separate database including metadata and direct links to files. Raw data from secondary procedures, including mapping and genome feature analysis, will be available in real time via online lab notebooks. Data will be in non-proprietary formats such as tab-delimited text files. Limited analyzed data and workflows will also be made available via Galaxy and CyVerse, as some analysis will take place on these platforms. Physical samples will be made available upon request where not consumed by analyses.
To ensure accuracy and data tracking, the project will have a specific data use policy including:
- User requests require current and valid contact information that will be used by the PI for tracking and documenting data usage.
- Users are required to cite the project publications and acknowledge the original funding source, NSF.
- Users have the final responsibility for any errors in their external and secondary analyses, while the PI and project participants will conduct quality control on the primary data and ensure the accuracy of the primary data to the best of their abilities.
- The PI and project participants will not release any private or confidential information to the public, and in-house databases will be password protected.
- The PI and project participants will retain intellectual property rights, except where explicitly released for publication and documentation.
Publication and Presentation: Our results will be disseminated in presentations at scientific meetings and in peer-reviewed journal articles. All significant findings from the proposed research will be promptly prepared and submitted for publication with authorship that accurately reflects the contributions of those involved. Data and products from this project will be used in courses, and course syllabi will be posted on the PIs’ websites as well as on the project E5 website. The URL for the project website, the NCBI BioProject ID, and the BCO-DMO DOI for each sample or study will be provided in all publications generated by the proposed work.
Participant Roles: The PIs are responsible for supervising all data management in cooperation with the project participants. All participants are responsible for data collection, quality control, internal database management/curation, and data publication as applicable to their research responsibilities within the project. The graduate and undergraduate students will be trained in and involved in data collection and quality control across the process from collection to publication.