
[ENH] Improve convergence between ALE null methods #411

Merged: 11 commits merged into neurostuff:master from improve-ale-null-convergence on Dec 8, 2020

Conversation

@tsalo (Member) commented Nov 28, 2020

Closes #396.

Changes proposed in this pull request:

  • Increase the resolution of the ALE analytic null histogram from bins of 0.0001 to bins of 0.00001.
  • Round maximum MA values up to the target resolution during ALE histogram generation, so that we don't need an arbitrary buffer.
  • Simplify histogram generation in ALE.
    • This involves cropping histogram bins when shifting from edges to centers.
  • Also increase the resolution of the SCALE analytic null histogram by the same factor. Reverted, since it was too memory-intensive for CI.
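The "round up to the target resolution" idea can be sketched as follows (a minimal illustration; the function and variable names here are assumed, not NiMARE's actual API):

```python
import numpy as np

step_size = 0.00001  # new analytic-null bin width (was 0.0001)

def round_up_to_resolution(max_ma_value, step=step_size):
    """Round the maximum MA value up to a multiple of the bin width,
    so the histogram bins exactly cover the attainable range with no buffer."""
    return np.ceil(max_ma_value / step) * step

rounded = round_up_to_resolution(0.123456789)
# rounded is ~0.12346, the smallest multiple of step_size >= the input
```

Because the rounded maximum is itself a multiple of the bin width, the bin edges can be built directly from it without padding the histogram with an arbitrary buffer.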

@codecov codecov bot commented Nov 28, 2020

Codecov Report

Merging #411 (824007f) into master (8dc6c55) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #411   +/-   ##
=======================================
  Coverage   82.32%   82.32%           
=======================================
  Files          40       40           
  Lines        3848     3848           
=======================================
  Hits         3168     3168           
  Misses        680      680           
Impacted Files            Coverage  Δ
nimare/meta/cbma/ale.py   96.27%    <100.00%>  (ø)
nimare/meta/cbma/base.py  94.31%    <100.00%>  (ø)


By building our bins from the MA values after rounding up, we should no longer need a buffer.

```python
# create bin centers, then shift them into bin edges
# NOTE: the subtracted shift term was truncated in the rendered diff;
# step_size / 2 is assumed here as the natural center-to-edge shift
hist_bins = np.round(np.arange(0, max_poss_ale + (1.5 * step_size), step_size), 5) - (
    step_size / 2
)
ma_hists = np.zeros((ma_values.shape[0], hist_bins.shape[0]))
```

The 1.5 should result in the maximum possible statistic value falling within the last bin, with no trailing bins after that.
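That claim can be checked with toy numbers (illustrative values only; the shift term subtracted from the centers, truncated in the rendered diff, is assumed here to be `step_size / 2`):

```python
import numpy as np

# Toy values; NiMARE uses step_size = 1e-5 and a max_poss_ale rounded up to it.
step_size = 0.1
max_poss_ale = 0.5

# bin centers 0.0 .. 0.6, then shifted down half a step into bin edges
centers = np.round(np.arange(0, max_poss_ale + (1.5 * step_size), step_size), 5)
edges = centers - (step_size / 2)
# edges: [-0.05, 0.05, ..., 0.55]; the last bin [0.45, 0.55] contains
# max_poss_ale, and no empty bins trail it
```

The 1.5 factor makes `arange` emit exactly one center past `max_poss_ale`, which after the half-step shift becomes the right edge of the bin holding the maximum value.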

@tsalo (Member, Author) commented Nov 30, 2020

The only failing test is test_corr_transform_smoke[fwe_montecarlo-normal_data-ale+analytic-mkda_kernel]. I'm not sure what the problem is.

@jdkent (Member) commented Nov 30, 2020

The error is coming from this line:

```python
ss_idx = np.maximum(0, np.where(hist_weights <= p)[0][0] - 1)
```

The error occurs when p is smaller than every value in hist_weights. If p = 0.05 and hist_weights = array([1.0, 0.5, 0.2]), then np.where(hist_weights <= p) returns (array([], dtype=int64),). Since the array is empty, it does not have an index [0].
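This failure mode can be reproduced in isolation (a minimal sketch, not NiMARE's actual test):

```python
import numpy as np

p = 0.05
hist_weights = np.array([1.0, 0.5, 0.2])  # no weight is <= p

idx = np.where(hist_weights <= p)[0]
# idx is empty, so indexing idx[0] raises IndexError
try:
    ss_idx = np.maximum(0, idx[0] - 1)
except IndexError:
    ss_idx = None  # this branch is taken
```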

This code fixes the immediate problem; I'll take a look at the code to see if there is something else strange going on:

```python
ss_idx = np.maximum(
    0,
    np.where(
        hist_weights <= p,
        np.arange(hist_weights.size),
        hist_weights.size,
    ).min() - 1,
)
```
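For the same degenerate inputs, the three-argument `np.where` substitutes `hist_weights.size` wherever the condition fails, so `.min()` is always defined and the expression falls back to the last bin instead of raising:

```python
import numpy as np

p = 0.05
hist_weights = np.array([1.0, 0.5, 0.2])  # no weight is <= p

# condition is False everywhere, so where() yields [3, 3, 3];
# min() - 1 then selects the last valid bin index
ss_idx = np.maximum(
    0,
    np.where(
        hist_weights <= p,
        np.arange(hist_weights.size),
        hist_weights.size,
    ).min() - 1,
)
# ss_idx == 2
```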

EDIT

I think the current behavior in master is strange, and this pull request fixes it. Currently, when using ALE with the MKDA kernel, hist_weights contains weights for bins corresponding to values other than 0 or 1. Since the MKDA kernel can only produce values of 0 or 1, hist_weights should only have weights for the bins containing 0 or 1.

The bin containing 1 has a relatively high probability (compared to an ALE kernel), meaning that if we apply a threshold of 0.05, hist_weights is unlikely to have any values less than 0.05.

I still have some thinking to do on how to handle this scenario.

@tsalo (Member, Author) commented Dec 2, 2020

Thank you for digging into this! It seems like the issue, then, is that the maximum possible value for an MKDA kernel + the ALE summary statistic formula (i.e., 1) is fairly probable, while any summary statistic value above that is impossible, so the p-values associated with those higher values drop to zero. By not including impossibly high values in the histogram, I was removing these p = 0 locations.

One solution that seems to work is to simply include one extra bin when I create the histogram bins. Thus, the last bin's p-value will always be zero (since it exceeds the maximum possible summary statistic value), and when we grab the bin before the identified one, we will grab the maximum possible value's bin.

So for ALE + MKDA, we might have the following histogram bins and associated weights:

```python
bins = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
null = [1.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.0]
```

If that extra bin (1.1) isn't included, then we have a minimum p-value of 0.3, which will trigger the bug, as you discovered. The extra bin gets us our "low" p-value, although we end up with an associated summary statistic of 1.0 instead of 1.1, based on how _p_to_summarystat is written.
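With the extra bin in place, the lookup described earlier in the thread behaves as intended for this degenerate ALE + MKDA null (a sketch using the example arrays above, with the same indexing pattern jdkent quoted; not NiMARE's literal `_p_to_summarystat` code):

```python
import numpy as np

bins = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1])
null = np.array([1.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.0])
p = 0.05

# the first bin whose weight is <= p is the impossible 1.1 bin (index 11);
# stepping back one bin yields the maximum attainable summary statistic, 1.0
ss_idx = np.maximum(0, np.where(null <= p)[0][0] - 1)
threshold = bins[ss_idx]
# threshold == 1.0
```

Without the trailing 1.1 bin, `np.where(null <= p)[0]` would be empty and the original expression would raise the IndexError discussed above.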

@tsalo (Member, Author) commented Dec 8, 2020

I think the extra bin solution works and makes sense, so I'm going to merge this.

@tsalo tsalo merged commit cd7cb86 into neurostuff:master Dec 8, 2020
@tsalo tsalo deleted the improve-ale-null-convergence branch December 8, 2020 16:13