diff --git a/docs/dataset-qc/Example1.Rmd b/docs/dataset-qc/Example1.Rmd index 76b1bd2..9c4a860 100644 --- a/docs/dataset-qc/Example1.Rmd +++ b/docs/dataset-qc/Example1.Rmd @@ -17,22 +17,24 @@ Repetitive, boring, and/or error-prone tasks should be scripted if possible; don Scripting can reduce how many errors occur and how quickly they're found (and corrected). For example, if a data file has a typo in the name a script will fail to find it while a person may click the file without noticing the typo. Scripts can also make it faster and easier to rerun an analysis or conversion if needed, such as to change a parameter. -When possible, have scripts read and write files directly from the intended permanent storage locations. For example, we store some files in [box](box.com). These files can be transferred by clicking options in the box web GUI, but this is slow it is easy to accidentally select the wrong file. Instead, we use [boxr](https://github.com/r-box/boxr) functions to retrieve and upload files from box. Box integration examples are not included here, but see [this DMCC example](https://github.com/ccplabwustl/dualmechanisms/blob/master/preparationsAndConversions/eprime/TEMPLATE_convertEprime.R). +When possible, have scripts read and write files directly from the intended permanent storage locations. For example, we store some files in [box](box.com). These files can be transferred by clicking options in the box web GUI, but this is slow and it is easy to accidentally select the wrong file. Instead, we use [boxr](https://github.com/r-box/boxr) functions to retrieve and upload files from box. Box integration is not included here, but see [this DMCC example](https://github.com/ccplabwustl/dualmechanisms/blob/master/preparationsAndConversions/eprime/TEMPLATE_convertEprime.R). ## Tutorial: converting eprime files to csv. -### Background -- [Eprime](https://pstnet.com/products/e-prime/) saves its files in a proprietary format and non-human-readable plain text.
We convert these to csv as quickly as possible after data collection. (Something I suggest doing for all non-standard file formats, not just eprime; store data in formats like nifti and text whenever possible for long-term accessibility.) -- This task is a prime target for scripting: the conversion must be done often and exactly, and accuracy can be tested algorithmically (e.g., by counting trial types). -- The tutorial eprime files are from a heartbeat-counting task like Pollatos, Traut-Mattausch, and Schandry ([2009](https://doi.org/10.1002/da.20504)). The task starts with a five-minute baseline period, followed by three trials during which the participant is asked to count their heart beats. After each trial participants verbally report how many beats they counted and their confidence in the count. The same trial order is used for all participants: the first is 25 seconds, second 35 seconds, and third 45 seconds. -- For **Dataset QC** we will verify that the three trials, initial baseline, and final rest periods are present, and in the expected order and durations. +[Eprime](https://pstnet.com/products/e-prime/) saves its files in a proprietary format and non-human-readable plain text. We convert these to csv as quickly as possible after data collection. (Something I suggest doing for all non-standard file formats, not just eprime; store data in formats like nifti and text whenever possible for long-term accessibility.) This task is a prime target for scripting: the conversion must be done often and exactly, and accuracy can be tested algorithmically (e.g., by counting trial types). + +The tutorial eprime files are from a heartbeat-counting task like Pollatos, Traut-Mattausch, and Schandry ([2009](https://doi.org/10.1002/da.20504)). The task starts with a five-minute baseline period, followed by three trials during which the participant is asked to count their heartbeats.
After each trial, participants verbally report how many beats they counted and their confidence in the count. The same trial order is used for all participants: the first is 25 seconds, the second 35 seconds, and the third 45 seconds. + +In this tutorial we will convert the eprime text recovery files to csv. The code also checks that the three trials, initial baseline, and final rest periods are present, and in the expected order and durations. ```{r} # setwd("d:/maile/svnFiles/plein/conferences/ISMRM2022/onlineExample1"); # for Jo's local testing - test <- readLines("interoception_demoSub1.txt", warn=FALSE); + test <- readLines("example1files/interoception_demoSub1.txt", warn=FALSE); print(length(test)); # should be 315 ``` This script uses the [eMergeR](https://github.com/AWKruijt/eMergeR) R library's functions for parsing information out of the eprime text recovery file. I generally suggest starting each script by loading any needed libraries, clearing R's memory, setting options, and defining needed variables. This first code block loads eMergeR, clears R's workspace, and sets the input and output paths. diff --git a/docs/dataset-qc/Example2.Rmd b/docs/dataset-qc/Example2.Rmd index f698bad..9a9f03a 100644 --- a/docs/dataset-qc/Example2.Rmd +++ b/docs/dataset-qc/Example2.Rmd @@ -12,15 +12,12 @@ jupyter: name: ir --- -# Example 2: Highlight important and diagnostic features as efficiently as possible -## QC reports won't be used if they are too long, ugly, or annoying. -- Aim for short, aesthetically pleasing files that emphasize easy-to-check diagnostic features (e.g., that the volumes "look like brains" and the surfaces have "tiger stripes"). -- It often works well to arrange images so can visually survey ("which one of these things is not like the others") and judge typical variability. -- Collect training examples of what the diagnostic features should (or not) look like.
-- It is often more effective to investigate oddities with separate, more detailed files and reports when needed, rather than trying to fit all possibly-useful images and statistics into one document. - -# Tutorial: fMRI volume (nifti) image plotting -## Background +# Dataset QC Example 2 +Highlight important and diagnostic features as efficiently as possible. QC reports won't be used if they are too long, ugly, or annoying. Aim for short, aesthetically pleasing files that emphasize easy-to-check diagnostic features (e.g., that the volumes "look like brains" and the surfaces have "tiger stripes"). It is often more effective to investigate oddities with separate, more detailed files and reports when needed, rather than trying to fit all possibly-useful images and statistics into one document. + +It often works well to arrange images so you can visually survey ("which one of these things is not like the others") and judge typical variability. Collect examples of what the diagnostic features should (or not) look like. + +## Tutorial: fMRI volume (nifti) image plotting Efficiently displaying QC images depends upon being able to easily access and plot the source files. The tutorial includes basic image reading and plotting, the key foundation skill upon which the more complex files (see links at the end of this page) are built.
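+The plotting code below assumes a 3d array named `img` is already in the workspace. As a sketch only (it assumes the [RNifti](https://github.com/jonclayden/RNifti) package, and `demo.nii.gz` is a placeholder name, not one of the tutorial files), the reading step can look like: + +```{r} +# library(RNifti);   # one of several R packages that can read nifti images +# img <- RNifti::readNifti("demo.nii.gz");   # placeholder file name; returns an array-like image +# dim(img);   # check the voxel dimensions before indexing and plotting slices +```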
@@ -38,7 +35,8 @@ dim(img); # [1] 81 96 81 max(img); # [1] 1374.128 img[30,20,50]; # value of this voxel -layout(matrix(1:3, c(1,3))); # hopefully have three images in one row +options(repr.plot.width = 8, repr.plot.height = 3); # specify size in jupyter +layout(matrix(1:3, 1, 3)); # three images in one row image(img[15,,], col=gray(0:64/64), xlab="", ylab="", axes=FALSE, useRaster=TRUE); # plot slice i=15 image(img[,20,], col=gray(0:64/64), xlab="", ylab="", axes=FALSE, useRaster=TRUE); # plot slice j=20 image(img[,,50], col=gray(0:64/64), xlab="", ylab="", axes=FALSE, useRaster=TRUE); # plot slice k=50 diff --git a/docs/dataset-qc/Example3.Rmd b/docs/dataset-qc/Example3.Rmd index 1f33a25..ac5bf0c 100644 --- a/docs/dataset-qc/Example3.Rmd +++ b/docs/dataset-qc/Example3.Rmd @@ -12,9 +12,9 @@ jupyter: name: ir --- +# Dataset QC Example 3 +Establish and continually perform control analyses. -# Example 3: Establish and continually perform control analyses -## Control analyses Control analyses are for dataset QC; they must be separate from the experimental questions and target analyses. **Positive control analyses** check for the existence of effects that **must** be present if the dataset is valid. If the effects are not detected, we know that something is wrong in the dataset or analysis, and work should not proceed until the issues are resolved. One of my favorite positive control analyses is **button pressing**: diff --git a/docs/dataset-qc/Example4.Rmd b/docs/dataset-qc/Example4.Rmd index 4bd772c..64b859e 100644 --- a/docs/dataset-qc/Example4.Rmd +++ b/docs/dataset-qc/Example4.Rmd @@ -12,7 +12,8 @@ jupyter: name: ir --- -# Example 4: Make use of automated dynamic reports +# Dataset QC Example 4 +Make use of automated dynamic reports. ```{r}