Budget optimizer refactor #1357

ricardoV94 · 2025-01-08T18:32:32Z

This PR refactors the budget optimizer to extract the response graph directly from the MMM model. Users no longer have to pass the arguments needed to rebuild the response function, which simplifies things quite a lot and makes the optimizer able to handle arbitrarily complex models [citation needed].

There are some lingering issues. When transformations are done outside the model (as is the case for the channel variables), we need to find them from the model. These would ideally be part of the model, so the optimizer doesn't have to worry about this.

Breaking changes:

The budget optimizer API has changed completely. You only need to specify mmm_model and num_periods.
No more automatic summation of the response variable, so objective functions and constraints may need to be rewritten. This should be less messy after "Define a total_channel_contributions deterministic in MMM models" #1387.
In anticipation of models with more dimensions (such as hierarchical MMM), inputs and outputs of the optimizer are now xarray DataArray objects instead of dictionaries, as was the case before. For backward compatibility, inputs can still be dictionaries, but outputs cannot.
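
To illustrate the second point: since the response is no longer summed automatically, an objective has to reduce over the dims it cares about itself. A minimal NumPy sketch, assuming a made-up (sample, date, channel) layout and a hypothetical function name; this is not the actual pymc-marketing objective signature:

```python
import numpy as np


def mean_total_response(samples: np.ndarray) -> float:
    """Hypothetical objective: mean over samples of the total response.

    `samples` is assumed to have shape (sample, date, channel).
    """
    # Sum explicitly over the date and channel axes; this reduction is
    # the step that is no longer done automatically.
    total_per_sample = samples.sum(axis=(1, 2))
    return float(total_per_sample.mean())


# Made-up draws standing in for response samples
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=(100, 4, 2))
value = mean_total_response(samples)
```

An objective written against the old behaviour would likely need a reduction of this kind added, over whichever dims the model actually has.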

TODO:

Tests of the BudgetOptimizer will need to be updated to have an mmm_model
Update NBs (?)
Update docs

Closes #1331

📚 Documentation preview 📚: https://pymc-marketing--1357.org.readthedocs.build/en/1357/

codecov · 2025-01-08T18:59:07Z

Codecov Report

Attention: Patch coverage is 86.17886% with 17 lines in your changes missing coverage. Please review.

Project coverage is 93.80%. Comparing base (32f44c7) to head (dc6a63c).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
pymc_marketing/mmm/budget_optimizer.py	90.47%	8 Missing ⚠️
pymc_marketing/mmm/mmm.py	66.66%	8 Missing ⚠️
pymc_marketing/mmm/utility.py	93.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1357      +/-   ##
==========================================
- Coverage   93.94%   93.80%   -0.15%     
==========================================
  Files          48       48              
  Lines        5137     5198      +61     
==========================================
+ Hits         4826     4876      +50     
- Misses        311      322      +11

juanitorduz · 2025-01-08T19:01:08Z

yay! 🚀 Let me know if you need any support!

ColtAllen · 2025-01-12T14:39:15Z

Why does this have a CLV label?

wd60622 · 2025-01-12T15:19:56Z

> Why does this have a CLV label?

It seems to have come from the file-change bot. The force push then left the CLV files with no changes, so I removed the CLV label.

ricardoV94 · 2025-01-15T14:19:05Z

Fixed the tests, NBs next

ricardoV94 · 2025-01-15T15:19:07Z

@cetagostini do I need to worry about other functions in https://github.com/pymc-labs/pymc-marketing/blob/7bed11a61b2310835863a7d092b9b63a2ac29081/pymc_marketing/mmm/utility.py?

Asking because we are no longer raveling samples / automatically summing budget dims inside the optimization code. Not sure if that could have any overlap with this change.

ricardoV94 · 2025-01-15T16:58:20Z

The notebook job stops at the first failure. Is there a way to have it try all the notebooks like pytest does with tests?

I thought I was done when I fixed the last failure but now it went and failed in the next one.
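
Something like this collect-then-report pattern is what I have in mind. A rough sketch only: run_all and fake_run are hypothetical names, and fake_run is a stand-in for however the job actually executes a notebook:

```python
from collections.abc import Callable, Iterable


def run_all(names: Iterable[str], run_one: Callable[[str], None]) -> dict[str, str]:
    """Run every item, collecting failures instead of stopping at the first."""
    failures: dict[str, str] = {}
    for name in names:
        try:
            run_one(name)
        except Exception as exc:
            # Record the error and keep going with the remaining items
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_run(name: str) -> None:
    """Stand-in for executing one notebook (assumption, not the real job)."""
    if "broken" in name:
        raise RuntimeError(f"{name} failed")


notebooks = ["a.ipynb", "broken_b.ipynb", "c.ipynb", "broken_d.ipynb"]
failures = run_all(notebooks, fake_run)
for name, err in failures.items():
    print(f"FAILED {name}: {err}")
```

The job would then exit with a non-zero status if failures is non-empty, so CI still goes red but reports every failing notebook in one run.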

juanitorduz · 2025-01-15T19:05:28Z

We will work on it 🙇!

In the meantime, I will review this one either tonight or tomorrow 🙏

juanitorduz

Thanks @ricardoV94 ! This looks great! I left some small comments.

I also see the budget notebook is failing, maybe due to an error that the tests are not catching?

There is another relevant notebook which uses the budget optimization: https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_case_study.html I can work on it in a different PR :)

juanitorduz · 2025-01-16T13:05:49Z

Thanks for the changes @ricardoV94 !

I think the only things missing are the notebooks. Do you want to work on that, or do you want support so that you can focus on doing your magic ;) ?

ricardoV94 · 2025-01-16T13:07:24Z

I'm on it! I will push soon (and revert the merge @cetagostini just did). I'd appreciate another deep review, as I changed a couple of things to make it cleaner

juanitorduz · 2025-01-16T13:28:43Z

> I'm on it! I will push soon (and revert the merge @cetagostini just did). I'd appreciate another deep review, as I changed a couple of things to make it cleaner

Absolutely 💪 !

ricardoV94 · 2025-01-16T13:42:37Z

The logic used here to extract the response_distribution could easily be repurposed to simplify and speed up get_channel_contributions_forward_pass_grid.

I noticed this was rather slow while checking if the mmm case study notebook is now passing :D

cetagostini · 2025-01-16T13:58:46Z

Hey, the push was trying to rebase onto my local branch; I hope it didn't take away anything important. I was working on solving those issues:

I came up with a quick helper function to create allocations and test. A few functions, such as _create_synth_dataset, now receive an array where before they used a dict, so a function like this should be helpful.

import numpy as np
import xarray as xr


def create_allocation(value, **kwargs):
    """
    Create an xarray.DataArray with flexible dimensions and coordinates.

    Parameters:
    - value (array-like): The data values for the DataArray. Shape must match the dimensions implied by the kwargs.
    - **kwargs: Key-value pairs representing dimension names and their corresponding coordinates.

    Returns:
    - xarray.DataArray: The resulting DataArray with the specified dimensions and values.

    Raises:
    - ValueError: If the shape of `value` doesn't match the lengths of the specified coordinates.
    """
    # Dimension names come from the keyword names, coordinates from their values
    dims = list(kwargs)
    coords = {dim: kwargs[dim] for dim in dims}

    # Validate the shape of `value` against the coordinate lengths
    expected_shape = tuple(len(coords[dim]) for dim in dims)
    if np.shape(value) != expected_shape:
        raise ValueError(
            f"The shape of 'value' {np.shape(value)} does not match the expected "
            f"shape {expected_shape} based on the provided dimensions."
        )

    # Create the DataArray
    return xr.DataArray(value, coords=coords, dims=dims)

Users could then call it as follows, even for models with more dims.

# Example 1: Single dimension (channel)
da1 = create_allocation(value=[1.56753944, 1.43246056], channel=["x1", "x2"])

# Example 2: Two dimensions (channel and hierarchy)
da2 = create_allocation(
    value=[[1.5, 2.0], [2.5, 3.0]], 
    channel=["x1", "x2"], 
    hierarchy=["a", "b"]
)

How does that sound to you?

ricardoV94 · 2025-01-16T14:04:23Z

@cetagostini I did something simple for the dict->DataArray, for sample_response_distribution which is user facing, so it still accepts a dict:

        if isinstance(allocation_strategy, dict):
            # For backward compatibility
            allocation_strategy = DataArray(
                pd.Series(allocation_strategy), dims=("channel",)
            )

It seems like you are reinventing a way to define DataArray. Perhaps it's just better to push users to be comfortable with them, since those are the objects we work with outside of PyMC/PyTensor models?
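
For reference, a self-contained version of that conversion, plus one possible way back to a dict for anyone who still needs the old output format. The channel names and values are made up, and the dict round trip is only a suggestion, not something this PR provides:

```python
import pandas as pd
import xarray as xr

# Made-up budgets per channel, in the old dict format
allocation_dict = {"x1": 1.5, "x2": 2.5}

# dict -> DataArray: the Series index becomes the "channel" coordinate
allocation = xr.DataArray(pd.Series(allocation_dict), dims=("channel",))

# Label-based selection works on the result
x2_budget = float(allocation.sel(channel="x2"))

# DataArray -> dict, going through a pandas Series
as_dict = allocation.to_series().to_dict()
```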

ricardoV94 · 2025-01-16T14:08:08Z

@cetagostini I'm not against it though. That can possibly be done as a separate PR at this point, I guess?

cetagostini · 2025-01-16T14:11:49Z

I think the approach you added works, and it pushes users to get familiar with DataArray. I'm not against it. I usually try to create helpers anyway, and it's up to the users whether they use them or not. But I'm okay with addressing this in another PR, maybe, or in #1538

ricardoV94 · 2025-01-16T14:40:55Z

Tests are passing!

juanitorduz · 2025-01-16T14:43:21Z

Wow! That was fast!!! Thanks!

github-actions bot added docs Improvements or additions to documentation CLV MMM tests mlflow labels Jan 8, 2025

ricardoV94 force-pushed the budget_optimizer_refactor branch from 0196b3a to 6c946f2 Compare January 8, 2025 18:33

github-actions bot added enhancement New feature or request optimizer priority: high labels Jan 8, 2025

ricardoV94 force-pushed the budget_optimizer_refactor branch 2 times, most recently from 3a348d2 to 10b0783 Compare January 8, 2025 18:35

cetagostini mentioned this pull request Jan 8, 2025

Custom optimizer constraints #1358

Merged

15 tasks

ricardoV94 force-pushed the budget_optimizer_refactor branch from 10b0783 to 36985c4 Compare January 9, 2025 19:28

wd60622 removed CLV mlflow labels Jan 12, 2025

ricardoV94 force-pushed the budget_optimizer_refactor branch from 36985c4 to 7bed11a Compare January 15, 2025 14:18

ricardoV94 force-pushed the budget_optimizer_refactor branch from 7bed11a to 0e42f11 Compare January 15, 2025 15:19