
[FIX] Restructure Peaks2MapsKernel to operate like other kernels #410

Merged
@tsalo merged 13 commits into neurostuff:master from fix-peaks2maps on Dec 10, 2020

Conversation

@tsalo (Member) commented Nov 26, 2020

Closes #347.

Changes proposed in this pull request:

  • Restructure Peaks2MapsKernel to follow the KernelTransformer convention.
  • Rename compute_ma to compute_ale_ma to match convention.
  • Rename peaks2maps to compute_p2m_ma to match convention.
  • Do not test Peaks2MapsKernel in test_estimator_performance.py (see #417: Peaks2MapsKernel+ALE is too memory-intensive for tests).

@tsalo (Member Author) commented Nov 30, 2020

The jobs are being killed, so I need to take a look at them and see whether I can reduce the computational load.

@jdkent (Member) commented Dec 1, 2020

Looks like _compute_null_analytic for the ALE estimator is a source of the increased memory consumption.
Specifically here:

ale_idx = np.where(ale_hist > 0)[0]
exp_idx = np.where(exp_hist > 0)[0]
# Compute output MA values, ale_hist indices, and probabilities
# The outer product allocates a len(exp_idx) * len(ale_idx) array.
ale_scores = 1 - np.outer(1 - hist_bins[exp_idx], 1 - hist_bins[ale_idx]).ravel()

ALE has hundreds of non-zero histogram bins, whereas p2m has thousands, so the variables ale_scores, score_idx, and probabilities end up with tens of millions of entries; see the rough estimate below.
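As a rough back-of-the-envelope check (the bin counts here are assumptions chosen to be consistent with the ~38.5-million-entry array reported below, not values from the actual run), the allocation grows with the product of the two non-zero-bin counts:

import numpy as np

# Assumed non-zero bin counts for a p2m-sized histogram (hypothetical values).
n_exp_bins = 6000
n_ale_bins = 6500

n_entries = n_exp_bins * n_ale_bins
bytes_per_array = n_entries * np.dtype(np.float64).itemsize

print(f"{n_entries:,} entries")                     # 39,000,000 entries
print(f"{bytes_per_array / 1e9:.2f} GB per array")  # ~0.31 GB each for ale_scores, score_idx, and probabilities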

A potential solution could be to save those variables in single precision (i.e., float32) rather than double precision (float64) when there are more than, say, 2000 entries in exp_idx or ale_idx. However, there appears to be meaningful precision loss when computing score_idx from ale_scores as float32 versus float64 in my example:

ale_scores32 = 1 - np.outer(1 - hist_bins[exp_idx], 1 - hist_bins[ale_idx]).ravel().astype(np.float32)
ale_scores64 = 1 - np.outer(1 - hist_bins[exp_idx], 1 - hist_bins[ale_idx]).ravel()

ale_scores32.size  # 38529432

score_idx32 = np.floor(ale_scores32 * inv_step_size).astype(int)
score_idx64 = np.floor(ale_scores64 * inv_step_size).astype(int)

(score_idx64 == score_idx32).sum()  # 38506719
(score_idx64 != score_idx32).sum()  # 22713

so this may not be a good solution.

Another potential solution is to turn off/skip the tests for ALE (analytic null) + Peaks2MapsKernel, since they take too much RAM.

@tsalo (Member Author) commented Dec 2, 2020

I think the precision of ale_scores should be determined by the precision of our histogram bins. The smallest difference we need to be able to track in ale_scores corresponds to our histogram bin resolution (1e-5 in #411) squared, i.e., 1e-10. We should be able to determine the lowest-resolution float we can use from our histogram bins with numpy.min_scalar_type, and then convert our values ahead of time!
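A minimal sketch of the idea (the 1e-5 step size comes from #411; the dtype-selection usage is an assumption for illustration, not code from this PR):

import numpy as np

# Assumed histogram bin resolution (see #411).
step_size = 1e-5
smallest_diff = step_size ** 2  # smallest difference we need to track: 1e-10

# Ask numpy for the narrowest scalar type that can hold that value,
# then cast the large intermediate arrays to it up front, e.g.:
target_dtype = np.min_scalar_type(smallest_diff)
# ale_scores = (1 - np.outer(1 - hist_bins[exp_idx], 1 - hist_bins[ale_idx])).ravel().astype(target_dtype)
#
# NOTE: min_scalar_type only guards against overflow, not precision loss,
# which is exactly the problem described in the EDIT below.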

I can't fully explain the differences, but if anything I think that float32 is more accurate. If we have a step_size of 1e-5, then np.float64(step_size ** 2) is 1.0000000000000002e-10, while np.float32(step_size ** 2) is 1e-10. Basically, I'm thinking the differences could come down to a Python precision issue that shows itself more in float64 than float32.
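For example, reproducing those values in a quick session (assuming step_size = 1e-5):

import numpy as np

step_size = 1e-5
print(np.float64(step_size ** 2))  # 1.0000000000000002e-10
print(np.float32(step_size ** 2))  # 1e-10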

EDIT: Unfortunately, numpy's float classes have precision issues that throw the results of numpy.min_scalar_type into question.

import numpy as np

np.min_scalar_type(0.9999)
# dtype('float16')

1 - 0.9999 == 0
# False (correct)

np.float16(1 - 0.9999) == 0
# False (correct)

0.9999 == 1
# False (correct)

np.float16(0.9999) == 1
# True (wrong)

Since numpy.float16 can represent 1 - 0.9999 as nonzero but rounds 0.9999 up to 1, even though numpy.min_scalar_type says that 0.9999 should work with numpy.float16, we can't use it.

@tsalo (Member Author) commented Dec 8, 2020

I'm just going to excise all precision-related changes and drop Peaks2MapsKernel from the performance tests.

We should open a new issue about the problem, though.

@tsalo (Member Author) commented Dec 10, 2020

The docs build is failing, but I don't think that's tied to changes in this PR (see #418), so I am going to merge if all other tests pass.

@tsalo marked this pull request as ready for review on December 10, 2020 16:14
@@ -384,6 +384,7 @@ def download_peaks2maps_model(data_dir=None, overwrite=False, verbose=1):
url = "https://zenodo.org/record/1257721/files/ohbm2018_model.tar.xz?download=1"

temp_dataset_name = "peaks2maps_model_ohbm2018__temp"
data_dir = _get_dataset_dir("", data_dir=data_dir, verbose=verbose)
@tsalo (Member Author) commented on this diff:
This will probably help with #367 and #368.

@tsalo merged commit 10ff10b into neurostuff:master on Dec 10, 2020
@tsalo deleted the fix-peaks2maps branch on December 10, 2020 16:18
Successfully merging this pull request may close these issues: Peaks2MapsKernel does not work with any CBMA algorithms.