From ec5c39edf0797523ac84d48e45184dfedf84e3d9 Mon Sep 17 00:00:00 2001
From: floradanna <51486716+floradanna@users.noreply.github.com>
Date: Mon, 15 Mar 2021 18:17:24 +0100
Subject: [PATCH] Update analysing.md

---
 pages/rdm_cycle/analysing.md | 32 ++++++++++++++------------------
 1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/pages/rdm_cycle/analysing.md b/pages/rdm_cycle/analysing.md
index 5db5bc167..ddbc39058 100644
--- a/pages/rdm_cycle/analysing.md
+++ b/pages/rdm_cycle/analysing.md
@@ -1,35 +1,31 @@
 ---
 title: Analysing
 keywords: [Data analysis, Computing, Collaboration]
-contributors: [Rob Hooft, Olivier Collin, Munazah Andrabi]
+contributors: [Rob Hooft, Olivier Collin, Munazah Andrabi, Flora D'Anna]
 ---

 ## What is data analysis?

-Data analysis encompasses all the different data manipulation and transformations that will help scientists to discover information or generate new knowledge.
-This is the step where the actual work on the data towards the goal of a research project takes place.
-The steps of the workflow in the analysis phase of a project will often be repeated several times to explore the data as well as to optimize the process.
-According to the different types of data (quantitative or qualitative) the methods will differ.
-
-Data analysis follows the (often automated, batch) processing in the Processing stage.
+Data analysis consists in exploring the collected data to begin understanding the messages contained in a dataset and/or in applying mathematical formulas (or models) to identify relationships between variables.
+The steps of the workflow in the analysis phase will often be repeated several times to explore the data as well as to optimize the workflow itself.
+According to the different types of data (quantitative or qualitative) the data analysis methods will differ. Data analysis follows the (often automated, batch) data processing stage.

 ## Why is data analysis important?

-Since Data Analysis is the stage where new knowledge and information are generated, it can be considered as central in the research process.
-With many disciplines becoming data-oriented, more and more data intensive projects will occur and will involve experts from many thematic fields.
+Since data analysis is the stage where new knowledge and information are generated, it can be considered as central in the research process. Because of the relevance of the data analysis stage in research findings, it is essential that the analysis workflow applied to a dataset complies with the FAIR principles. Moreover, it is extremely important that the analysis workflow is reproducible by other researchers and scientists.
+With many disciplines becoming data-oriented, more and more data-intensive projects will occur and will require experts from many thematic fields.

 ## What should be considered for data analysis?
+Because of the diversity of domains and technologies in Life Sciences, data can be either "small" or "big data". As a consequence, the methods and technical solutions used for data analysis might differ. The characteristics of "big data" are often summarized by a growing list of ["V's" properties: Volume, Velocity, Variety, Variability, Veracity, Visualization and Value](https://bigdatapath.wordpress.com/2019/11/13/understanding-the-7-vs-of-big-data/).

-Because of their nature, data in Life Sciences are now considered as Big Data. These characteristics of Big Data, often summarized by a growing list of "V" properties (Volume, Velocity, Variety, Veracity, Value, etc.), impact strongly the methods and technical solutions used for Data Analysis.
-
-* At the storage level, including the transfer of the data from the data production facility to the computing facility for its analysis, it is worthwhile to compare the cost of the transfer of massive amounts of data compared to the transfer of virtual images of machines for the analysis.
-* The variety of the data poses an integration challenge that can only adressed with the help of best practices that make data interoperable and reusable.
-* The Data Analysis phase relies on the previous steps (collection, processing) that will lay the foundations for the generation of new knowledge by providing acurate and trustworthy data.
-* For the analysis of data, you will first have to consider the computing environment and choose between several computing infrastructure types for e.g. cluster, cloud. You will also need to select the appropriate work environment according to your needs and expertise (command line, web portal).
-* The location of your data is important because of the needed need of proximity with computing resources. This can impact data transfer across the different infrastructures.
-* You will have to select the tools best suited for the analysis of your data. Resources such as [bio.tools](https://bio.tools) can be very helpful.
+* The data analysis stage relies on the previous stages (collection, processing) that will lay the foundations for the generation of new knowledge by providing accurate and trustworthy data.
+* The variety of the data poses an integration challenge that can only be addressed with the help of best practices that make data interoperable and reusable during the collection and/or processing stage.
+* The location of your data is important because of the need for proximity to computing resources. This can impact data transfer across the different infrastructures. It is worthwhile to compare the cost of transferring massive amounts of data with the cost of transferring virtual machine images for the analysis.
+* For the analysis of data, you will first have to consider the computing environment and choose between several computing infrastructure types, e.g. cluster, cloud. You will also need to select the appropriate work environment according to your needs and expertise (command line, web portal).
+* You will have to select the tools best suited for the analysis of your data.
 * It is important to document the exact steps used for data analysis. This includes the version of the software used, as well as the parameters used, as well as the computing environment. Manual "manipulation" of the data may complicate this documentation process.
-* In the case of collaborative data analysis, you will have to ensure access to the data and tools for all collaborators. This can be achieved by setting up virtual research environments (VRE).
+* In the case of collaborative data analysis, you will have to ensure access to the data and tools for all collaborators. This can be achieved by setting up virtual research environments.
+* Consider publishing your analysis workflow, as well as your datasets, according to the FAIR principles.

 ## Problems to be addressed at this stage