Commit

Making the headings consistent in 'Your Problem' (#474)
Co-authored-by: Bert Droesbeke <[email protected]>
Martin Cook and bedroesb authored Mar 27, 2021
1 parent 836b618 commit 4c4a1b9
Showing 11 changed files with 81 additions and 84 deletions.
4 changes: 2 additions & 2 deletions pages/your_problem/compliance_monitoring.md
@@ -6,7 +6,7 @@ Markus Englund, Vera Ortseifen]
description: measure compliance to data management regulations and standards.
---

## How can you measure and document data management capabilities?

### Description

@@ -32,7 +32,7 @@ By knowing their capabilities institutions can spot areas of improvement and dir
* Information Security, Data Protection, Accountability
* [21 CFR part 11](https://www.fda.gov/regulatory-information/search-fda-guidance-documents/part-11-electronic-records-electronic-signatures-scope-and-application) is a standard that outlines the criteria under which electronic records in an IT system are as valid as signed paper records. It is widely adopted in lab information systems and applications used in clinical trials and medical research.
* [ISO 27001](https://www.iso.org/isoiec-27001-information-security.html) is an international standard for the management of information security. It is adopted by some universities and research institutes to certify their data centres.
* [ISO/IEC 27018](http://data-reuse.eu/wp-content/uploads/2017/02/ISO-Standards.pdf) is a standard aimed to be a code of practice for protection of personally identifiable information (PII) in public clouds.

## How can I ethically access genetic resources of another country?
20 changes: 10 additions & 10 deletions pages/your_problem/data_analysis.md
@@ -6,12 +6,12 @@ description: how to make data analysis FAIR.
---

## What are the best practices for data analysis?

### Description

When carrying out your analysis, keep in mind that all of it has to be reproducible. This complements your research data management approach, since not only your data but also your tools and analysis environments will be FAIR compliant. In other words, you should be able to tell what data and what code or tools were used to generate your results.

This will help to tackle reproducibility problems and will also improve the impact of your research through collaborations with scientists who reproduce your in silico experiments.

### Considerations

@@ -26,25 +26,25 @@ There are many ways that will bring reproducibility to your data analysis. You c
* Make your code available. If you have to develop software for your data analysis, it is always a good idea to publish your code. Git offers both a versioning system and, through hosting platforms, a way to release your code and interact with your software users. Be sure to specify a license for your code (see the [licensing section](../licensing.md)).
* Use package and environment management systems. By using package and environment management systems like [Conda](https://anaconda.org/) and its bioinformatics specialized channel [Bioconda](https://bioconda.github.io/), researchers with access to your code will be able to easily install specific versions of tools, even older ones, in an isolated environment. They will be able to compile/run your code in an equivalent computational environment, including any dependencies such as the correct version of R or the particular libraries and command-line tools your code uses. You can also share and preserve your setup by specifying in an [environment file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) which tools you installed.
* Use container environments. As an alternative to package management systems you can consider _container environments_ like [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/docs/).
* Use workflow management systems. [Scientific workflow management systems](https://en.wikipedia.org/wiki/Scientific_workflow_system) will help you organize and automate how computational tools are to be executed. Compared to composing tools using a standalone script, workflow systems also help document the different computational analyses applied to your data, and can help with scalability, such as cloud execution. Many workflow management systems are available, such as [Galaxy](https://galaxyproject.org/), [Nextflow](https://www.nextflow.io/) and [Snakemake](https://snakemake.readthedocs.io/) (see also the table below). Workflows also enhance reproducibility, as they typically have bindings for specifying software packages or containers for the tools they use, allowing others to re-run your workflow without needing to pre-install every piece of software it needs. It is a flourishing field and [many other workflow management systems](https://s.apache.org/existing-workflow-systems) are available, some of which are general-purpose (e.g. any command-line tool), while others are domain-specific and have tighter tool integration.
* Use notebooks. Using notebooks, you will be able to create reproducible documents mixing text and code, which can help explain your analysis choices but can also be used as an exploratory method to examine data in detail. Notebooks can be used in conjunction with the other solutions mentioned above, as typically a notebook can be converted to a script. Some of the most well-known notebook systems are: [Jupyter](https://jupyter.org/), with built-in support for code in Python, R and Julia, and many other [kernels](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels); [RStudio](https://rstudio.com/products/rstudio/#rstudio-desktop), based on R. See the table below for additional tools.
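
The workflow-management bullet above can be made concrete with a minimal sketch. The following hypothetical Snakemake `Snakefile` (rule names, file paths and scripts are all invented for illustration, not taken from the text) chains two analysis steps so the whole pipeline can be re-run with a single `snakemake --use-conda` call:

```
# Hypothetical Snakefile: two chained rules; all names are illustrative.
rule all:
    input: "results/plot.png"            # the final target of the analysis

rule filter:
    input: "data/raw.csv"
    output: "results/filtered.csv"
    conda: "envs/analysis.yml"           # per-rule software environment
    shell: "python scripts/filter.py {input} {output}"

rule plot:
    input: "results/filtered.csv"
    output: "results/plot.png"
    conda: "envs/analysis.yml"
    shell: "python scripts/plot.py {input} {output}"
```

Because each rule declares its inputs, outputs and software environment, the workflow itself documents the analysis and lets others re-run it without manual setup.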


## How can you use package and environment management systems?

### Description
By using package and environment management systems like [Conda](https://anaconda.org/) and its bioinformatics specialized channel [Bioconda](https://bioconda.github.io/), you will be able to easily install specific versions of tools, even older ones, in an isolated environment. You can also share and preserve your setup by specifying in an [environment file](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) which tools you installed.
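
As a minimal sketch of the environment file mentioned above (the package names and pinned versions are examples, not prescribed by the text), a Conda `environment.yml` could look like:

```yaml
# Hypothetical environment.yml; channels and pinned versions are examples.
name: my-analysis
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.9
  - samtools=1.12
  - r-base=4.0
```

The same environment can then be recreated elsewhere with `conda env create -f environment.yml`.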

### Considerations
Conda works by making a nested folder containing the traditional UNIX directory structure (`bin/`, `lib/`), but installed from Conda's repositories instead of from a Linux distribution.
* As such, Conda enables consistent installation of computational tools independent of your distribution or operating system version. Conda is available for Linux, macOS and Windows, giving a consistent experience across operating systems (although not all software is available for all OSes).
* Package management systems work particularly well for installing free and open source software, but can also be useful for creating an isolated environment for installing commercial software packages; for instance, if one requires an older Python version than you have pre-installed.
* Conda is one example of a generic package management system, but individual programming languages typically have their own environment management systems and package repositories.
* You may want to consider submitting a release of your own code, or at least the general bits of it, to the package repositories for your programming language.

### Solutions
* macOS-specific package management systems: [Homebrew](https://brew.sh/), [MacPorts](https://www.macports.org/).
* Windows-specific package management systems: [Chocolatey](https://chocolatey.org/) and [Windows Package Manager](https://docs.microsoft.com/en-us/windows/package-manager/) `winget`.
* Linux distributions also have their own package management systems (`rpm`/`yum`/`dnf`, `deb`/`apt`) with a wide variety of tools available, but at the cost of less flexibility in tool versions, to ensure they can be co-installed.
* Language-specific virtual environments and repositories: [rvm](https://rvm.io/) and [RubyGems](https://rubygems.org/) for Ruby, [pip](https://docs.python.org/3/installing/index.html) and [venv](https://docs.python.org/3/tutorial/venv.html) for Python, [npm](https://www.npmjs.com/) for NodeJS/Javascript, [renv](https://rstudio.github.io/renv/) and [CRAN](https://cran.r-project.org/) for R, [Apache Maven](https://maven.apache.org/) or [Gradle](https://gradle.org/) for Java etc.
* Tips and tricks to navigate the landscape of software package management solutions:
@@ -53,14 +53,14 @@ Conda works by making a nested folder containing the traditional UNIX directory
* If you need a few open source libraries for your Python script, none of which require compilation, make a `requirements.txt` and reference `pip` packages.
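
A minimal sketch of the `requirements.txt` approach from the last bullet (the package names and versions are illustrative only):

```
# Hypothetical requirements.txt: pin exact versions for reproducibility.
numpy==1.21.0
pandas==1.3.0
```

Collaborators can then install the same versions with `pip install -r requirements.txt`, ideally inside an isolated [venv](https://docs.python.org/3/tutorial/venv.html) environment.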


## How can you use container environments?

### Description
Container environments like [Docker](https://www.docker.com/) or [Singularity](https://sylabs.io/docs/) allow you to easily install specific versions of tools, even older ones, in an isolated environment.

### Considerations
In short, containers work almost like virtual machines (VMs), in that they re-create a whole Linux distribution with separation of processes, files and network.
* Containers are more lightweight than VMs since they don't virtualize hardware. This allows a container to run with a fixed version of the distribution independent of the host, and have just the right, minimal dependencies installed.
* Containers also add a level of _isolation_ which, although not as secure as that of VMs, can reduce attack vectors. For instance, if a database container were compromised by unwelcome visitors, they would not have access to modify the web server configuration, and the container would not be able to expose additional services to the Internet.
* A big advantage of containers is that there are large registries of community-provided container images.
* Note that modifying things inside a container is harder than on a usual machine, as changes from the image are lost when a container is recreated.
@@ -72,9 +72,9 @@ In short containers works almost like a virtual machine (VMs), in that it re-cre
* Large registries of community-provided container images are [Docker Hub](https://hub.docker.com/) and [RedHat Quay.io](https://quay.io/search). These are often ready-to-go, not requiring any additional configuration or installations, allowing your application to quickly have access to open source server solutions.
* [Biocontainers](https://biocontainers.pro/) have a large selection of bioinformatics tools.
* To customize a Docker image, it is possible to use techniques such as [volumes](https://docs.docker.com/storage/volumes/) to store data and a [Dockerfile](https://docs.docker.com/engine/reference/builder/). This is useful for installing your own application inside a new container image, based on a suitable _base image_ where you can do your `apt install` and software setup in a reproducible fashion, and share your own application as an image on Docker Hub.
* Container linkage can be done by _container composition_ using tools like [Docker Compose](https://docs.docker.com/compose/).
* More advanced container deployment solutions like [Kubernetes](https://kubernetes.io/) and [Computational Workflow Management systems](#workflows-for-reproducibility) can also manage cloud instances and handle analytical usage.
* Tips and tricks to navigate the landscape of container solutions:
* If you just need to run a database server, describe how to run it as a Docker/Singularity container.
* If you need several servers running, connected together, set up containers in Docker Compose.
* If you need to install many things, some of which are not available as packages, make a new `Dockerfile` recipe to build a container image.
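
The `Dockerfile` technique from the last bullet can be sketched as follows (the base image, packages and paths are illustrative assumptions, not from the text):

```
# Hypothetical Dockerfile: reproducible setup on a fixed base image.
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
COPY analysis/ /opt/analysis/
RUN pip3 install -r /opt/analysis/requirements.txt
ENTRYPOINT ["python3", "/opt/analysis/run.py"]
```

Build it with `docker build -t my-analysis .`; the resulting image can then be shared, for example on Docker Hub.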
14 changes: 7 additions & 7 deletions pages/your_problem/data_management_plan.md
@@ -1,14 +1,14 @@
---
title: Data management plan
keywords:
contributors: [Flora D'Anna, Daniel Faria]
tags: [plan, researcher, data manager, policy officer]
description: how to write a Data Management Plan (DMP).
---


## What template should you use to draft your Data Management Plan (DMP)?

### Description

A number of DMP templates are currently available, originating from different funding agencies or institutions.
@@ -18,16 +18,16 @@ Moreover, there are ongoing efforts to develop templates for machine-actionable

* Each funding agency could require or recommend a specific DMP template.
* Your institution could require and recommend a DMP template.
* Templates could be presented as a list of questions in text format or in a machine-actionable format.

### Solutions
* Consult the documentation of your funding agency or institution, or contact them to figure out if they require or recommend a DMP template.
* A core DMP template has been provided by [Science Europe](https://www.scienceeurope.org/our-priorities/research-data/research-data-management/).
* Consider adopting the [DMP Common Standard](https://www.rd-alliance.org/group/dmp-common-standards-wg/outcomes/rda-dmp-common-standard-machine-actionable-data-management) model from the Research Data Alliance if you want to produce a machine-actionable DMP template.
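
As a hedged illustration of the machine-actionable approach, a fragment of a DMP in the RDA DMP Common Standard's JSON serialization might look like the sketch below (only a handful of the standard's fields are shown, and every value is invented):

```json
{
  "dmp": {
    "title": "Example project DMP",
    "language": "eng",
    "created": "2021-03-27T00:00:00Z",
    "dataset": [
      {
        "title": "Sequencing reads",
        "personal_data": "no",
        "sensitive_data": "no"
      }
    ]
  }
}
```

Because the fields are machine-readable, such a DMP can be validated, exchanged between tools and updated programmatically rather than re-written by hand.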


## What tool should you use to write your DMP?

### Description
DMPs can be written offline by using the downloaded template in a text document format.
However, a number of web-based DMP tools are currently available that greatly facilitate the process, as they usually contain several DMP templates and provide guidance in interpreting and answering the questions.
@@ -50,7 +50,7 @@ However, a number of web-based DMP tools are currently available that greatly fa
* Additional tools for creating a DMP are listed in the table below.


## What should you write in a DMP?

### Description
A DMP should address a broad range of data management aspects, regardless of template. It is important to be aware of the current best practices in DMPs before starting one.