
[ENH] Improve convergence between ALE null methods #411

Merged: 11 commits merged into neurostuff:master from improve-ale-null-convergence on Dec 8, 2020

Conversation

@tsalo (Member) commented Nov 28, 2020

Closes #396.

Changes proposed in this pull request:

  • Increase the resolution of the ALE analytic null histogram from bins of 0.0001 to bins of 0.00001.
  • Round maximum MA values up to the target resolution during ALE histogram generation, so that we don't need an arbitrary buffer.
  • Simplify histogram generation in ALE.
    • This involves cropping histogram bins when shifting from edges to centers.
  • Also increase the resolution of the SCALE analytic null histogram by the same factor. Reverted, since it was too memory-intensive for CI.
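The "round up to the target resolution" idea can be sketched as follows (a minimal illustration; the function and variable names here are assumed, not NiMARE's actual API):

```python
import numpy as np

step_size = 0.00001  # new analytic-null bin width (was 0.0001)

def round_up_to_resolution(max_ma_value, step=step_size):
    """Round the maximum MA value up to a multiple of the bin width,
    so the histogram bins exactly cover the attainable range with no buffer."""
    return np.ceil(max_ma_value / step) * step

rounded = round_up_to_resolution(0.123456789)
# rounded is ~0.12346, the smallest multiple of step_size >= the input
```

Because the rounded maximum is itself a multiple of the bin width, the bin edges can be built directly from it without padding the histogram with an arbitrary buffer.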

@codecov codecov bot commented Nov 28, 2020

Codecov Report

Merging #411 (824007f) into master (8dc6c55) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #411   +/-   ##
=======================================
  Coverage   82.32%   82.32%           
=======================================
  Files          40       40           
  Lines        3848     3848           
=======================================
  Hits         3168     3168           
  Misses        680      680           
Impacted Files            Coverage  Δ
nimare/meta/cbma/ale.py   96.27%    <100.00%>  (ø)
nimare/meta/cbma/base.py  94.31%    <100.00%>  (ø)


By building our bins from the MA values after rounding up, we should no longer need a buffer.

```python
# create bin centers, then shift them into bin edges
# NOTE: the subtracted shift term was truncated in the rendered diff;
# step_size / 2 is assumed here as the natural center-to-edge shift
hist_bins = np.round(np.arange(0, max_poss_ale + (1.5 * step_size), step_size), 5) - (
    step_size / 2
)
ma_hists = np.zeros((ma_values.shape[0], hist_bins.shape[0]))
```

The 1.5 should result in the maximum possible statistic value falling within the last bin, with no trailing bins after that.
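That claim can be checked with toy numbers (illustrative values only; the shift term subtracted from the centers, truncated in the rendered diff, is assumed here to be `step_size / 2`):

```python
import numpy as np

# Toy values; NiMARE uses step_size = 1e-5 and a max_poss_ale rounded up to it.
step_size = 0.1
max_poss_ale = 0.5

# bin centers 0.0 .. 0.6, then shifted down half a step into bin edges
centers = np.round(np.arange(0, max_poss_ale + (1.5 * step_size), step_size), 5)
edges = centers - (step_size / 2)
# edges: [-0.05, 0.05, ..., 0.55]; the last bin [0.45, 0.55] contains
# max_poss_ale, and no empty bins trail it
```

The 1.5 factor makes `arange` emit exactly one center past `max_poss_ale`, which after the half-step shift becomes the right edge of the bin holding the maximum value.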

@tsalo (Member, Author) commented Nov 30, 2020

The only failing test is test_corr_transform_smoke[fwe_montecarlo-normal_data-ale+analytic-mkda_kernel]. I'm not sure what the problem is.

@jdkent (Member) commented Nov 30, 2020

The error is coming from this line:

```python
ss_idx = np.maximum(0, np.where(hist_weights <= p)[0][0] - 1)
```

The error occurs when p is smaller than every value in hist_weights. If p = 0.05 and hist_weights = array([1.0, 0.5, 0.2]), then np.where(hist_weights <= p) returns (array([], dtype=int64),). Since the array is empty, it does not have an index [0].
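This failure mode can be reproduced in isolation (a minimal sketch, not NiMARE's actual test):

```python
import numpy as np

p = 0.05
hist_weights = np.array([1.0, 0.5, 0.2])  # no weight is <= p

idx = np.where(hist_weights <= p)[0]
# idx is empty, so indexing idx[0] raises IndexError
try:
    ss_idx = np.maximum(0, idx[0] - 1)
except IndexError:
    ss_idx = None  # this branch is taken
```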

This code fixes the immediate problem; I'll take a look at the code to see if there is something else strange going on:

```python
ss_idx = np.maximum(
    0,
    np.where(
        hist_weights <= p,
        np.arange(hist_weights.size),
        hist_weights.size,
    ).min() - 1,
)
```
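For the same degenerate inputs, the three-argument `np.where` substitutes `hist_weights.size` wherever the condition fails, so `.min()` is always defined and the expression falls back to the last bin instead of raising:

```python
import numpy as np

p = 0.05
hist_weights = np.array([1.0, 0.5, 0.2])  # no weight is <= p

# condition is False everywhere, so where() yields [3, 3, 3];
# min() - 1 then selects the last valid bin index
ss_idx = np.maximum(
    0,
    np.where(
        hist_weights <= p,
        np.arange(hist_weights.size),
        hist_weights.size,
    ).min() - 1,
)
# ss_idx == 2
```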

EDIT

I think the current behavior in master is strange, and this pull request fixes it. Currently, when using ALE with the MKDA kernel, hist_weights contains weights for bins corresponding to values other than 0 or 1. Since the MKDA kernel can only produce values of 0 or 1, hist_weights should only have weights for the bins containing 0 or 1.

The bin containing 1 has a relatively high probability (compared to an ALE kernel), meaning that if we apply a threshold of 0.05, hist_weights is unlikely to have any values less than 0.05.

I still have some thinking to do on how to handle this scenario.

@tsalo (Member, Author) commented Dec 2, 2020

Thank you for digging into this! It seems like the issue, then, is that the maximum possible value for an MKDA kernel + the ALE summary statistic formula (i.e., 1) is fairly probable, while any summary statistic value above that is impossible, so the p-values associated with those higher values drop to zero. By not including impossibly high values in the histogram, I was removing these p = 0 locations.

One solution that seems to work is to simply include one extra bin when I create the histogram bins. Thus, the last bin's p-value will always be zero (since it exceeds the maximum possible summary statistic value), and when we grab the bin before the identified one, we will grab the maximum possible value's bin.

So for ALE + MKDA, we might have the following histogram bins and associated weights:

```python
bins = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1]
null = [1.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.0]
```

If that extra bin (1.1) isn't included, then we have a minimum p-value of 0.3, which will trigger the bug, as you discovered. The extra bin gets us our "low" p-value, although we end up with an associated summary statistic of 1.0 instead of 1.1, based on how _p_to_summarystat is written.
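With the extra bin in place, the lookup described earlier in the thread behaves as intended for this degenerate ALE + MKDA null (a sketch using the example arrays above, with the same indexing pattern jdkent quoted; not NiMARE's literal `_p_to_summarystat` code):

```python
import numpy as np

bins = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1])
null = np.array([1.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.0])
p = 0.05

# the first bin whose weight is <= p is the impossible 1.1 bin (index 11);
# stepping back one bin yields the maximum attainable summary statistic, 1.0
ss_idx = np.maximum(0, np.where(null <= p)[0][0] - 1)
threshold = bins[ss_idx]
# threshold == 1.0
```

Without the trailing 1.1 bin, `np.where(null <= p)[0]` would be empty and the original expression would raise the IndexError discussed above.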

@tsalo (Member, Author) commented Dec 8, 2020

I think the extra bin solution works and makes sense, so I'm going to merge this.

@tsalo tsalo merged commit cd7cb86 into neurostuff:master Dec 8, 2020
@tsalo tsalo deleted the improve-ale-null-convergence branch December 8, 2020 16:13