
[WIP] possible 10x speed improvement #231

Closed
wants to merge 13 commits

Conversation

lukasheinrich
Contributor

@lukasheinrich commented Sep 5, 2018

Description

The hardest bottleneck is that we compute the expected Poisson rate for each sample in each channel separately (these can have many bins and that computation is vectorized, but especially in SUSY analyses it's often only 1 bin anyway, so that doesn't help much).

The solution is to vectorize the computation across channels and samples. But the problem is that the number of samples in each channel is not the same (somewhat similar to @jpivarski's awkward-arrays).

But we can still vectorize this computation by adding some padding and constructing a cube of shape (nchannels, nsamples, nbins)

where nsamples and nbins are the maximum number of samples and bins observed in the spec.

Then the approach is to (a minimal sketch follows the list):

  1. create the cube and, for each modifier, an index that keeps track of which cells in the cube it affects (via multi-indices)
  2. loop over all modifiers and apply each one to just the cells indicated by its multi-index
    each modifier creates a "factor field" of the same shape as the cube (for histosys we need a special case)
  3. sum over all samples (now the cube has shape (nchannels, nbins))
  4. linearize the data via .ravel(), so the expected_actualdata has shape (nchannels*nbins,)
  5. remove the padding fields (via a multi-index as well)
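
A minimal NumPy-only sketch of the idea (sizes, indices, and variable names are illustrative, not the actual pyhf implementation):

import numpy as np

# illustrative sizes: 2 channels, at most 3 samples, at most 2 bins
nchannels, nsamples, nbins = 2, 3, 2

# padded cube of expected rates; padded cells stay at zero so they
# drop out of the per-channel sums
cube = np.zeros((nchannels, nsamples, nbins))
cube[0, :3, :1] = [[5.0], [2.0], [1.0]]       # channel 0: 3 samples, 1 bin
cube[1, :2, :2] = [[4.0, 3.0], [0.5, 0.5]]    # channel 1: 2 samples, 2 bins

# a modifier only touches the cells listed in its multi-index
mod_index = (np.array([0, 0]), np.array([0, 1]), np.array([0, 0]))  # (channel, sample, bin)
factor_field = np.ones_like(cube)
factor_field[mod_index] = 1.1                 # e.g. a 10% normalisation factor

cube = cube * factor_field                    # apply the modifier in one vectorized op
rates = cube.sum(axis=1)                      # sum samples -> shape (nchannels, nbins)
flat = rates.ravel()                          # shape (nchannels * nbins,)

real_bins = np.array([0, 2, 3])               # drop the padded bins via another index
expected_actualdata = flat[real_bins]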

Some simple benchmarking shows some promise:

[screenshot: benchmark timings]

@kratsg @matthewfeickert this is not yet passing all tests, and step 4 is missing. Right now I have only benchmarked this on a test case where no padding is needed.

  • Tests are passing
  • "WIP" removed from the title of the pull request

@lukasheinrich
Contributor Author

lukasheinrich commented Sep 5, 2018

@kratsg can you confirm that this is the scale of the MBJ workspace (30 channels, 7 samples per channel, 5 modifiers per sample, 1 bin per sample)?

[screenshot]

@matthewfeickert
Member

@lukasheinrich This is fantastic news! Looking forward to checking this out more tonight.

@kratsg
Contributor

kratsg commented Sep 5, 2018

I have some code changes locally to do something like this, but I never got it to work/pass tests -- the math confused me a little bit for combining everything. I'll need some time to look through this and see if I understand it.

@jpivarski
Member

Just to consider it as an option, you could use awkward-array itself.

awkward-array will always be minimal-dependency, as it is intended as a layer under packages like this, which can use it for basic problems like "sum within groups." While the base library only uses Numpy, optimizations for specialized hardware will be implemented as external add-ons. If pyhf depends on awkward-array, then it could benefit from the add-ons.

I just finished a study of the problem you describe: adding items in variable-sized groups (as a function of Poisson average group size). The vectorized algorithm Jaydeep developed this summer is a clear improvement over the sequential for loop you get from Numpy:
[plot: sum_rates_logy]
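
For illustration, a basic NumPy way to sum within variable-sized groups without an explicit Python loop (just the generic idea, not the optimized awkward-array kernel from the plot):

import numpy as np

# values belonging to variable-sized groups, laid out contiguously,
# plus the offset at which each group starts
values = np.array([5.0, 2.0, 1.0, 4.0, 0.5, 3.0])
starts = np.array([0, 3, 5])          # group sizes 3, 2, 1

group_sums = np.add.reduceat(values, starts)
print(group_sums)                     # [8.  4.5 3. ]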

@lukasheinrich
Contributor Author

lukasheinrich commented Sep 5, 2018

@jpivarski yes, this was my first instinct. However, in pyhf we want to support multiple tensor backends such as TensorFlow and PyTorch that all more or less implement the NumPy tensor ops. This way we can make easy use of hardware acceleration.

Would this also be a goal of awkward-array, or would you want to keep it NumPy-only?

@lukasheinrich
Contributor Author

This is the old way of computing this, where we (see the sketch after the list):

  1. first compute each sample in each channel (leftmost column)
  2. sum up the samples for each channel (2nd column)
  3. concatenate the channels (3rd column)
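
Roughly, in NumPy-flavoured pseudocode (illustrative shapes and names only):

import numpy as np

# ragged structure: per-channel lists of per-sample rate arrays
channels = {
    'SR': [np.array([5.0]), np.array([2.0]), np.array([1.0])],
    'CR': [np.array([4.0, 3.0]), np.array([0.5, 0.5])],
}

# 2. sum the samples within each channel (one call per channel)
per_channel = [np.sum(samples, axis=0) for samples in channels.values()]
# 3. concatenate the channels into the expected data
expected_actualdata = np.concatenate(per_channel)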

[screenshot]

@jpivarski
Member

@lukasheinrich The base awkward-array package will support any library that has a Numpy interface. All access to Numpy is passed through awkward.util.numpy, which will allow one library named "numpy" to be swapped out for another with the same functions. The main one I have in mind is CuPy. I know that PyTorch wraps arrays that can also be wrapped by numpy.frombuffer, but TensorFlow keeps its arrays hidden somehow.

As long as the library either implements the Numpy API or can be made to do so (e.g. with shims translating every TensorFlow function into its Numpy equivalent), then awkward-array supports that. This is an assumption that's broken by the optimizers (like in that last plot); it's why we always have to have a basic version that only asks for the arrays to quack like Numpy.
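
Schematically, that indirection could look like the following (a sketch of the idea, not awkward-array's actual code; the names np_backend, use, and sum_rates are made up here):

import numpy

np_backend = numpy            # the default "numpy"

def use(module):
    # swap in any module that quacks like NumPy (e.g. cupy)
    global np_backend
    np_backend = module

# library code never imports numpy directly; it goes through the shim
def sum_rates(x):
    return np_backend.sum(x, axis=-1)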

@jpivarski
Member

@lukasheinrich TensorFlow eager tensors have a .numpy() function to convert to Numpy. I don't know if that's a view or a copy.

@lukasheinrich
Contributor Author

@jpivarski yeah we are more or less trying to have these shims here

https://github.com/diana-hep/pyhf/tree/master/pyhf/tensor

and there is https://github.com/tensorly/tensorly, which seems to try to do something similar
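
The pattern is roughly one small class per backend exposing the same handful of tensor ops (an illustrative sketch, not the actual pyhf backend API):

import numpy as np

class numpy_shim:
    # minimal illustration of a backend shim; a torch/tensorflow version
    # would implement the same method names on top of its own ops
    def astensor(self, data):
        return np.asarray(data, dtype=np.float64)

    def sum(self, tensor, axis=None):
        return np.sum(tensor, axis=axis)

    def reshape(self, tensor, shape):
        return np.reshape(tensor, shape)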

@kratsg
Contributor

kratsg commented Sep 5, 2018

@lukasheinrich TensorFlow eager tensors have a .numpy() function to convert to Numpy. I don't know if that's a view or a copy.

Isn't it a numpy array under the hood - by view?

print(type(tf.Session().run(tf.constant([1, 2, 3]))))  # <class 'numpy.ndarray'>

@jpivarski
Member

If so, that would be good. I think that ML frameworks like PyTorch and TensorFlow implement their own array classes so that they can freely move them from CPU to GPU and/or ignore the distinction between eager and lazy evaluation. Surely they all have methods to move them to the CPU and eagerly evaluate, the corner of these four options where Numpy lives.
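
For reference, the usual "move to the CPU and evaluate eagerly" calls (assuming a recent PyTorch, and TensorFlow with eager execution enabled; whether the TF result is a view or a copy is the open question above):

import torch
t = torch.tensor([1.0, 2.0, 3.0])
a = t.cpu().numpy()            # PyTorch: NumPy view of the CPU tensor's memory

import tensorflow as tf        # with eager execution enabled
e = tf.constant([1.0, 2.0, 3.0])
b = e.numpy()                  # TensorFlow eager tensor -> NumPy array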

@kratsg
Contributor

kratsg commented Sep 5, 2018

Hrmm, this is generating a cube and vectorizing that portion of it -- but we'd still want a step before this that deals with meta-modifiers first, because that reduces the dimensionality of the cube we need at the end. No?

In most cases, the largest dimensionality is the number of modifiers (unless we're CMS).

@lukasheinrich
Contributor Author

@kratsg the meta-modifiers touch a different portion of the code (the computation of the constraint term in the pdf). Actually I started with this as well, and it's still in this PR but commented out:

https://github.com/diana-hep/pyhf/pull/231/files#diff-0e8e9106451dbaea56a5ff43a27335edR287

but I didn't really see any improvement

@kratsg
Contributor

kratsg commented Sep 5, 2018

but I didn't really see any improvement

Damn. Separate idea: I should update the _ModelConfig to provide channels, samples, and modifiers options which are just lists of strings. Then you can generate an appropriate index by calling something like _ModelConfig.samples.index('ttbar') to get one dimension of that index efficiently(?).

See diana-hep/pyhf@5d7e4a8 as an example.

>>> spec = {  ... }
>>> model = pyhf.Model(spec)
>>> model.channels
['firstchannel']
>>> model.modifiers
['mu', 'stat_firstchannel']
>>> model.samples
['mu', 'bkg1', 'bkg2', 'bkg3']
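
A hypothetical follow-on, continuing the session above, to turn names into cube coordinates (indices taken from the example lists above):

>>> channel_idx = model.channels.index('firstchannel')
>>> sample_idx = model.samples.index('bkg2')
>>> (channel_idx, sample_idx)
(0, 2)

Together with a bin index, this would address one slice of the (nchannels, nsamples, nbins) cube, e.g. cube[channel_idx, sample_idx, :].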

@kratsg mentioned this pull request Sep 5, 2018
@lukasheinrich
Contributor Author

I'll rebase after #236.

@lukasheinrich force-pushed the performance/factorfields branch from 9f0e61c to edf0cc9 on September 6, 2018 00:30
@lukasheinrich
Contributor Author

OK, got this to pass the tests for the numpy backend. For the ML backends the issue is that tensor assignment doesn't work, but at least for TF there is tf.assign -- need to check.

Also @kratsg, I actually added the op_codes to the modifiers; I missed that in the review of #236.

@lukasheinrich
Contributor Author

So, good and bad news. These are the profiles of the MBJ execution:

prof2_fields.txt
prof2_master.txt

Note that both show roughly the same total on the top-level line (which is the bad news).

Most of the time is spent here, in the interpolation (as we knew):

     1574    0.015    0.000    0.101    0.000 interpolate.py:19(_hfinterp_code1)
     1574    0.014    0.000    0.098    0.000 interpolate.py:19(_hfinterp_code1)

But the new cube version should allow us to vectorize that computation more easily.

Here are the 1574 interpolations:

>>> sum([len(v['indices']) for k, v in p.modindex.items() if 'normsys' in k])
1574

But the number of cubes that are actually computed is only 89, so if we can vectorize the interpolation such that it interpolates multiple slices of the cube at once, we can improve.

@lukasheinrich
Contributor Author

I.e., instead of this Python loop

https://github.com/diana-hep/pyhf/pull/231/files#diff-0e8e9106451dbaea56a5ff43a27335edR148

we would want something tensor-native
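
A hedged sketch of what a batched, code1-style (exponential) interpolation could look like, evaluating all affected slices in one call instead of once per modifier/sample (NumPy only; illustrative, not the actual pyhf interpolator signature):

import numpy as np

def batched_code1(down, nom, up, alphas):
    # down/nom/up: shape (nslices, nbins); alphas: shape (nslices, 1)
    ratio = np.where(alphas >= 0, up / nom, down / nom)
    return np.power(ratio, np.abs(alphas))

down = np.array([[0.9], [0.8]])
nom = np.array([[1.0], [1.0]])
up = np.array([[1.1], [1.25]])
alphas = np.array([[0.5], [-1.0]])
print(batched_code1(down, nom, up, alphas))   # one vectorized call for all slices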

@kratsg force-pushed the performance/factorfields branch from 311d664 to 1aec654 on September 7, 2018 20:56
@kratsg
Contributor

kratsg commented Sep 7, 2018

Looking at this comment, https://github.com/diana-hep/pyhf/pull/231#issuecomment-418817390 -- I tried to reimplement the same thing (using the makespec definition in #219) and I do not see an improvement in the loops.

[screenshot 2018-09-07 14.14.52]

@lukasheinrich
Contributor Author

@kratsg did you push your code to a separate branch? Is there a diff between the first commit in this branch and yours? Maybe we can figure out what the bottleneck is.

@kratsg
Contributor

kratsg commented Sep 7, 2018

@kratsg did you push your code to a separate branch? Is there a diff between the first commit in this branch and yours? Maybe we can figure out what the bottleneck is.

I can push the notebook and the utility function into this branch if you're OK with it. I already rebased your branch.

@lukasheinrich
Contributor Author

@kratsg do you still see the ~1k s on the MBJ example after the rebase? I'm fine with working together on this branch, but we should make sure we don't regress.

@kratsg
Contributor

kratsg commented Sep 8, 2018

@kratsg do you still see the ~1k s on the MBJ example after the rebase? I'm fine with working together on this branch, but we should make sure we don't regress.

Ok, I'll re-run it on the full MBJ with the default run.

@kratsg
Contributor

kratsg commented Sep 9, 2018

New interpolation codes will be added in #251.

@kratsg
Contributor

kratsg commented Sep 28, 2018

Closing in favor of #285.

@kratsg closed this Sep 28, 2018
@kratsg deleted the performance/factorfields branch October 7, 2018 21:25
Labels: feat/enhancement (New feature or request)