Merge pull request #133 from tsalo/doc-pipeline
[DOC] Improve documentation for pipeline
emdupre authored Oct 31, 2018
2 parents a7f468e + 2cc8680 commit 42b5bad
Showing 34 changed files with 1,502 additions and 192 deletions.
File renamed without changes.
76 changes: 36 additions & 40 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
# tedana
tedana: TE Dependent ANAlysis
=============================

`TE`-`de`pendent `ana`lysis (_tedana_) is a Python module for denoising multi-echo functional magnetic resonance imaging (fMRI) data.
The ``tedana`` package is part of the ME-ICA pipeline, performing TE-dependent
analysis of multi-echo functional magnetic resonance imaging (fMRI) data.
``TE``-``de``pendent ``ana``lysis (``tedana``) is a Python module for denoising
multi-echo functional magnetic resonance imaging (fMRI) data.

[![Latest Version](https://img.shields.io/pypi/v/tedana.svg)](https://pypi.python.org/pypi/tedana/)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/tedana.svg)](https://pypi.python.org/pypi/tedana/)
@@ -11,55 +15,47 @@
[![Codecov](https://codecov.io/gh/me-ica/tedana/branch/master/graph/badge.svg)](https://codecov.io/gh/me-ica/tedana)
[![Join the chat at https://gitter.im/ME-ICA/tedana](https://badges.gitter.im/ME-ICA/tedana.svg)](https://gitter.im/ME-ICA/tedana?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

![](https://user-images.githubusercontent.com/7406227/40031156-57b7cbb8-57bc-11e8-8c51-5b29f2e86a48.png)
About
-----

``tedana`` originally came about as a part of the [ME-ICA](https://github.com/me-ica/me-ica) pipeline.
The ME-ICA pipeline originally performed both pre-processing and TE-dependent
analysis of multi-echo fMRI data; however, ``tedana`` now assumes that you're
working with data which has been previously preprocessed.
If you're in need of a preprocessing pipeline, we recommend
[fmriprep](https://github.com/poldracklab/fmriprep/), which has been tested
for compatibility with multi-echo fMRI data and ``tedana``.

## About
![http://tedana.readthedocs.io/](https://user-images.githubusercontent.com/7406227/40031156-57b7cbb8-57bc-11e8-8c51-5b29f2e86a48.png)

`tedana` originally came about as a part of the [`ME-ICA`](https://github.com/me-ica/me-ica) pipeline.
The ME-ICA pipeline originally performed both pre-processing and TE-dependent analysis of multi-echo fMRI data; however, `tedana` now assumes that you're working with data which has been previously preprocessed.
If you're in need of a pre-processing pipeline, we recommend [`fmriprep`](https://github.com/poldracklab/fmriprep/) which has been tested for compatibility with multi-echo fMRI data and `tedana`.
Installation
------------

### Why Multi-Echo?
You'll need to set up a working development environment to use ``tedana``.
To set up a local environment, you will need Python >=3.6 and the following
packages installed:

Multi-echo fMRI data is obtained by acquiring multiple TEs (commonly called [echo times](http://mriquestions.com/tr-and-te.html)) for each MRI volume during data collection.
While fMRI signal contains important neural information (termed the blood oxygen-level dependent, or [BOLD signal](http://www.fil.ion.ucl.ac.uk/spm/course/slides10-zurich/Kerstin_BOLD.pdf)), it also contains "noise" (termed non-BOLD signal) caused by things like participant motion and changes in breathing.
Because the BOLD signal is known to decay at a set rate, collecting multiple echos allows us to assess whether components of the fMRI signal are BOLD- or non-BOLD.
For a comprehensive review, see [Kundu et al. (2017), _NeuroImage_](https://paperpile.com/shared/eH3PPu).
- mdp
- nilearn
- nibabel>=2.1.0
- numpy
- scikit-learn
- scipy

In `tedana`, we take the time series from all the collected TEs, combine them, and decompose the resulting data into components that can be classified as BOLD or non-BOLD. This is performed in a series of steps including:
You can then install ``tedana`` with:

* Principal components analysis
* Independent components analysis
* Component classification

More information and documentation can be found at https://tedana.readthedocs.io/.

## Installation

You'll need to set up a working development environment to use `tedana`.
To set up a local environment, you will need Python >=3.6 and the following packages will need to be installed:

mdp
nilearn
nibabel>=2.1.0
numpy
scikit-learn
scipy

You can then install `tedana` with

```
```bash
pip install tedana
```

## Getting involved
Getting involved
----------------

We :yellow_heart: new contributors !
We :yellow_heart: new contributors!
To get started, check out [our contributing guidelines](https://github.com/ME-ICA/tedana/blob/master/CONTRIBUTING.md).

Want to learn more about our plans for developing `tedana` ?
Have a question, comment, or suggestion ?
Open or comment on one of [our issues](https://github.com/ME-ICA/tedana/issues) !
Want to learn more about our plans for developing ``tedana``?
Have a question, comment, or suggestion?
Open or comment on one of [our issues](https://github.com/ME-ICA/tedana/issues)!

We ask that all contributions to `tedana` respect our [code of conduct](https://github.com/ME-ICA/tedana/blob/master/Code_of_Conduct.md).
We ask that all contributions to ``tedana`` respect our [code of conduct](https://github.com/ME-ICA/tedana/blob/master/CODE_OF_CONDUCT.md).
Binary file added docs/_static/01_echo_timeseries.png
Binary file added docs/_static/02_echo_value_distributions.png
Binary file added docs/_static/03_adaptive_mask.png
Binary file added docs/_static/04_echo_log_value_distributions.png
Binary file added docs/_static/05_loglinear_regression.png
Binary file added docs/_static/06_monoexponential_decay_model.png
Binary file added docs/_static/11_pca_component_timeseries.png
Binary file added docs/_static/12_pca_whitened_data.png
Binary file added docs/_static/13_ica_component_timeseries.png
Binary file added docs/_static/15_denoised_data_timeseries.png
Binary file added docs/_static/16_t1c_denoised_data_timeseries.png
557 changes: 557 additions & 0 deletions docs/_static/optimal_combination_workflow_plots.ipynb

Large diffs are not rendered by default.

Binary file added docs/_static/tedana-poster.png
Binary file added docs/_static/tedana-workflow.png
449 changes: 449 additions & 0 deletions docs/_static/tedana_workflow_plots.ipynb

Large diffs are not rendered by default.

230 changes: 205 additions & 25 deletions docs/approach.rst
Original file line number Diff line number Diff line change
@@ -1,32 +1,212 @@
tedana's approach
=================
Processing pipeline details
===========================

``tedana`` works by decomposing multi-echo BOLD data via PCA and ICA.
These components are then analyzed to determine whether they are TE-dependent
or -independent. TE-dependent components are classified as BOLD, while
TE-independent components are classified as non-BOLD, and are discarded as part
of data cleaning.

Derivatives
-----------

* ``medn``
'Denoised' BOLD time series after: basic preprocessing,
T2* weighted averaging of echoes (i.e. 'optimal combination'),
ICA denoising.
Use this dataset for task analysis and resting state time series correlation
analysis.
* ``tsoc``
'Raw' BOLD time series dataset after: basic preprocessing
and T2* weighted averaging of echoes (i.e. 'optimal combination').
'Standard' denoising or task analyses can be assessed on this dataset
(e.g. motion regression, physio correction, scrubbing, etc.)
for comparison to ME-ICA denoising.
* ``*mefc``
Component maps (in units of \delta S) of accepted BOLD ICA components.
Use this dataset for ME-ICR seed-based connectivity analysis.
* ``mefl``
Component maps (in units of \delta S) of ALL ICA components.
* ``ctab``
Table of component Kappa, Rho, and variance explained values, plus listing
of component classifications.
In ``tedana``, we take the time series from all the collected TEs, combine them,
and decompose the resulting data into components that can be classified as BOLD
or non-BOLD. This is performed in a series of steps, including:

* Principal components analysis
* Independent components analysis
* Component classification

.. image:: /_static/tedana-workflow.png
:align: center

Multi-echo data
```````````````

Here are the echo-specific time series for a single voxel in an example
resting-state scan with 5 echoes.

.. image:: /_static/01_echo_timeseries.png
:align: center

The values across volumes for this voxel scale with echo time in a predictable
manner.

.. image:: /_static/02_echo_value_distributions.png
:width: 400 px
:align: center

Adaptive mask generation
````````````````````````
Longer echo times are more susceptible to signal dropout, which means that
certain brain regions (e.g., orbitofrontal cortex, temporal poles) will only
have good signal for some echoes. In order to avoid using bad signal from
affected echoes in calculating :math:`T_{2}^*` and :math:`S_{0}` for a given voxel,
``tedana`` generates an adaptive mask, where the value for each voxel is the
number of echoes with "good" signal. When :math:`T_{2}^*` and :math:`S_{0}` are
calculated below, each voxel's values are only calculated from the first :math:`n`
echoes, where :math:`n` is the value for that voxel in the adaptive mask.

.. image:: /_static/03_adaptive_mask.png
:width: 600 px
:align: center
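The adaptive-mask idea can be sketched in a few lines of NumPy. Everything below is illustrative: the toy signal values and the fixed threshold are assumptions for this sketch, not ``tedana``'s actual masking criteria (which are derived from the data).

.. code-block:: python

    import numpy as np

    # Toy data: 3 voxels x 5 echoes. A voxel's adaptive-mask value is the
    # number of leading echoes with "good" signal; here "good" is
    # approximated by a fixed threshold (an assumption for illustration).
    data = np.array([
        [5000., 3500., 2400., 1700., 1200.],  # good signal at all 5 echoes
        [4000., 2100.,  900.,   90.,   40.],  # dropout after the 3rd echo
        [ 800.,   70.,   30.,   10.,    5.],  # dropout after the 1st echo
    ])
    threshold = 100.0

    good = data > threshold                           # echo has usable signal
    adaptive_mask = good.cumprod(axis=1).sum(axis=1)  # count leading good echoes
    print(adaptive_mask)  # → [5 3 1]

When :math:`T_{2}^*` and :math:`S_{0}` are fit below, only the first ``adaptive_mask[v]`` echoes would contribute for voxel ``v``.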

Monoexponential decay model fit
```````````````````````````````
The next step is to fit a monoexponential decay model to the data in order to
estimate voxel-wise :math:`T_{2}^*` and :math:`S_0`.

In order to make it easier to fit the decay model to the data, ``tedana``
transforms the data. The BOLD data are transformed as :math:`log(|S|+1)`, where
:math:`S` is the BOLD signal. The echo times are also multiplied by -1.

.. image:: /_static/04_echo_log_value_distributions.png
:width: 400 px
:align: center

A simple line can then be fit to the transformed data with linear regression.
For the sake of this introduction, we can assume that the example voxel has
good signal in all five echoes (i.e., the adaptive mask has a value of 5 at
this voxel), so the line is fit to all available data.

.. note::
``tedana`` actually performs and uses two sets of :math:`T_{2}^*`/:math:`S_0` model fits.
In one case, ``tedana`` estimates :math:`T_{2}^*` and :math:`S_0` for voxels with good signal in at
least two echoes. The resulting "limited" :math:`T_{2}^*` and :math:`S_0` maps are used throughout
most of the pipeline. In the other case, ``tedana`` estimates :math:`T_{2}^*` and :math:`S_0` for voxels
with good data in only one echo as well, but uses the first two echoes for
those voxels. The resulting "full" :math:`T_{2}^*` and :math:`S_0` maps are used to generate the
optimally combined data.

.. image:: /_static/05_loglinear_regression.png
:width: 400 px
:align: center

The values of interest for the decay model, :math:`S_0` and :math:`T_{2}^*`,
are then simple transformations of the line's intercept (:math:`B_{0}`) and
slope (:math:`B_{1}`), respectively:

.. math:: S_{0} = e^{B_{0}}

.. math:: T_{2}^{*} = \frac{1}{B_{1}}
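Under this transform, the fit reduces to ordinary linear regression. A minimal sketch, in which the echo times and the true :math:`S_0` and :math:`T_{2}^*` values are made up for illustration:

.. code-block:: python

    import numpy as np

    # Simulated single-voxel decay: S = S0 * exp(-TE / T2*), with assumed
    # S0 = 12000 and T2* = 30 ms. Echo times (ms) are illustrative.
    tes = np.array([15.4, 29.7, 44.0, 58.3, 72.6])
    signal = 12000 * np.exp(-tes / 30.0)

    # Transform as described above: log(|S| + 1) against -TE.
    y = np.log(np.abs(signal) + 1)
    x = -tes

    # Fit a line y = B1 * x + B0 with least squares.
    B1, B0 = np.polyfit(x, y, deg=1)

    s0 = np.exp(B0)   # S0 = e^{B0}
    t2s = 1.0 / B1    # T2* = 1 / B1
    # t2s recovers ~30 ms, up to the small bias introduced by the +1.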

The resulting values can be used to show the fitted monoexponential decay model
on the original data.

.. image:: /_static/06_monoexponential_decay_model.png
:width: 400 px
:align: center

We can also see where :math:`T_{2}^*` lands on this curve.

.. image:: /_static/07_monoexponential_decay_model_with_t2.png
:width: 400 px
:align: center

Optimal combination
```````````````````
Using the :math:`T_{2}^*` estimates, ``tedana`` combines signal across echoes using a
weighted average. The echoes are weighted according to the formula

.. math:: w_{TE} = TE * e^{\frac{-TE}{T_{2}^*}}

The weights are then normalized across echoes. For the example voxel, the
resulting weights are:

.. image:: /_static/08_optimal_combination_echo_weights.png
:width: 400 px
:align: center

The distribution of values for the optimally combined data lands somewhere
between the distributions for other echoes.

.. image:: /_static/09_optimal_combination_value_distributions.png
:width: 400 px
:align: center

The time series for the optimally combined data also looks like a combination
of the other echoes (which it is).

.. image:: /_static/10_optimal_combination_timeseries.png
:align: center
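The weighting scheme follows directly from the formula above. The echo times, the :math:`T_{2}^*` value, and the per-echo signal values in this sketch are all assumed for illustration:

.. code-block:: python

    import numpy as np

    tes = np.array([15.4, 29.7, 44.0, 58.3, 72.6])        # TE in ms (assumed)
    t2s = 30.0                                            # T2* for this voxel (assumed)
    data = np.array([7180., 4460., 2770., 1720., 1070.])  # signal at each echo (assumed)

    w = tes * np.exp(-tes / t2s)      # w_TE = TE * exp(-TE / T2*)
    w = w / w.sum()                   # normalize across echoes
    optcom = float((w * data).sum())  # optimally combined value

The combined value lands between the individual echoes' values, mirroring the distributions shown above.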

TEDPCA
``````
The next step is to identify and temporarily remove Gaussian (thermal) noise
with TE-dependent principal components analysis (PCA). TEDPCA applies PCA to
the optimally combined data in order to decompose it into component maps and
time series. Here we can see time series for some example components (we don't
really care about the maps):

.. image:: /_static/11_pca_component_timeseries.png

These components are subjected to component selection, the
specifics of which vary according to algorithm.

In the simplest approach, ``tedana`` uses Minka’s MLE to estimate the
dimensionality of the data, which disregards low-variance components.

A more complicated approach involves applying a decision tree to identify and
discard PCA components which, in addition to not explaining much variance,
are also not significantly TE-dependent (i.e., have low Kappa) or
TE-independent (i.e., have low Rho).

After component selection is performed, the retained components and their
associated betas are used to reconstruct the optimally combined data, resulting
in a dimensionally reduced (i.e., whitened) version of the dataset.

.. image:: /_static/12_pca_whitened_data.png
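The MLE-based variant of this step can be sketched with scikit-learn. The low-rank toy data below stand in for optimally combined BOLD data and are purely illustrative:

.. code-block:: python

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Toy stand-in for optimally combined data (volumes x voxels):
    # a rank-5 signal plus a little Gaussian (thermal) noise.
    lowrank = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
    data = lowrank + 0.1 * rng.standard_normal((200, 50))

    # Minka's MLE chooses how many components to retain.
    pca = PCA(n_components="mle", svd_solver="full")
    comp_ts = pca.fit_transform(data)          # component time series
    whitened = pca.inverse_transform(comp_ts)  # dimensionally reduced dataset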

TEDICA
``````
Next, ``tedana`` applies TE-dependent independent components analysis (ICA) in
order to identify and remove TE-independent (i.e., non-BOLD noise) components.
The dimensionally reduced optimally combined data are first subjected to ICA in
order to fit a mixing matrix to the whitened data.

.. image:: /_static/13_ica_component_timeseries.png

Linear regression is used to fit the component time series to each voxel in each
echo from the original, echo-specific data. This way, the thermal noise is
retained in the data, but is ignored by the TEDICA process. This results in
echo- and voxel-specific betas for each of the components.
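This regression step can be sketched as follows; the array shapes, random stand-in data, and the use of scikit-learn's ``FastICA`` are illustrative assumptions, not ``tedana``'s exact implementation:

.. code-block:: python

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    reduced = rng.standard_normal((200, 20))        # whitened data: volumes x voxels (toy)
    echo_data = rng.standard_normal((3, 200, 500))  # echoes x volumes x voxels (toy)

    # Fit a mixing matrix (component time series) to the whitened data.
    ica = FastICA(n_components=10, random_state=0, max_iter=1000)
    mmix = ica.fit_transform(reduced)               # volumes x components

    # Least-squares fit of the component time series to each voxel in each
    # echo of the original data: echo- and voxel-specific betas.
    betas = np.stack([
        np.linalg.lstsq(mmix, echo_data[echo], rcond=None)[0]
        for echo in range(echo_data.shape[0])
    ])
    # betas.shape == (n_echoes, n_components, n_voxels)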

TE-dependence (:math:`R_2`) and TE-independence (:math:`S_0`) models can then
be fit to these betas. These models allow calculation of F-statistics for the
:math:`R_2` and :math:`S_0` models (referred to as :math:`\kappa` and
:math:`\rho`, respectively).

.. image:: /_static/14_te_dependence_models_component_0.png
:width: 400 px
:align: center

.. image:: /_static/14_te_dependence_models_component_1.png
:width: 400 px
:align: center

.. image:: /_static/14_te_dependence_models_component_2.png
:width: 400 px
:align: center

A decision tree is applied to :math:`\kappa`, :math:`\rho`, and other metrics in order to
classify ICA components as TE-dependent (BOLD signal), TE-independent
(non-BOLD noise), or neither (to be ignored). The actual decision tree is
dependent on the component selection algorithm employed. ``tedana`` includes
two options: `kundu_v2_5` (which uses hardcoded thresholds applied to each of
the metrics) and `kundu_v3_2` (which trains a classifier to select components).

.. image:: /_static/15_denoised_data_timeseries.png

Removal of spatially diffuse noise (optional)
`````````````````````````````````````````````
Due to the constraints of ICA, ME-ICA is able to identify and remove spatially
localized noise components, but it cannot identify components that are spread
throughout the whole brain. See `Power et al. (2018)`_ for more information
about this issue.
One of several post-processing strategies may be applied to the ME-DN or ME-HK
datasets in order to remove spatially diffuse (ostensibly respiration-related)
noise. Methods which have been employed in the past include global signal
regression (GSR), T1c-GSR, anatomical CompCor, Go Decomposition (GODEC), and
robust PCA.

.. image:: /_static/16_t1c_denoised_data_timeseries.png

.. _Power et al. (2018): http://www.pnas.org/content/early/2018/02/07/1720985115.short
6 changes: 3 additions & 3 deletions docs/contributing.rst
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ For a more general guide to the tedana development, please see our
`contributing guide`_. Please also follow our `code of conduct`_.

.. _contributing guide: https://github.com/ME-ICA/tedana/blob/master/CONTRIBUTING.md
.. _code of conduct: https://github.com/ME-ICA/tedana/blob/master/Code_of_Conduct.md
.. _code of conduct: https://github.com/ME-ICA/tedana/blob/master/CODE_OF_CONDUCT.md


Style Guide
@@ -44,7 +44,7 @@ This tells the development team that your pull request is a "work-in-progress",
and that you plan to continue working on it.

Release Checklist
`````````````````
-----------------

This is the checklist of items that must be completed when cutting a new release of tedana.
These steps can only be completed by a project maintainer, but they are a good resource for
@@ -55,7 +55,7 @@ releasing your own Python projects!
`Release-drafter`_ should have already drafted release notes listing all
changes since the last release; check to make sure these are correct.
#. Pulling from the ``master`` branch, locally build a new copy of tedana and
`upload it to PyPi`_.
`upload it to PyPi`_.

We have set up tedana so that releases automatically mint a new DOI with Zenodo;
a guide for doing this integration is available `here`_.