Budget optimizer refactor #1357

Merged
juanitorduz merged 2 commits into pymc-labs:main from budget_optimizer_refactor
Jan 16, 2025

Conversation

ricardoV94
Contributor

@ricardoV94 ricardoV94 commented Jan 8, 2025

This PR refactors the budget optimizer to extract the response graph from the MMM model directly. Users no longer have to pass the arguments needed to rebuild the response function, which simplifies things considerably and lets the optimizer handle arbitrarily complex models [citation needed].

There are some lingering issues. When transformations are done outside the model (as is the case for the channel variables), the optimizer has to locate them from the model. Ideally these would be part of the model, so the optimizer wouldn't have to worry about this.

Breaking changes:

  1. The budget optimizer API has changed completely. You only need to specify mmm_model and num_periods.
  2. The response variable is no longer summed automatically, so objective functions and constraints may need to be rewritten. This should be less messy after Define a total_channel_contributions deterministic in MMM models #1387
  3. In anticipation of models with more dimensions (such as hierarchical MMM), inputs and outputs of the optimizer are now xarray DataArrays instead of dictionaries, as was the case before. For backward compatibility, inputs can still be dictionaries, but outputs cannot.
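
To illustrate breaking change 2, here is a minimal sketch of an objective that performs the reduction itself. It uses plain NumPy rather than the pymc-marketing or PyTensor API; the array layout `(sample, date, channel)` and the name `mean_total_response` are assumptions made only for this illustration.

```python
import numpy as np


def mean_total_response(response: np.ndarray) -> float:
    """Hypothetical objective: mean over samples of the total response.

    Assumes `response` has shape (sample, date, channel). The date and
    channel axes are summed explicitly because nothing sums them for us.
    """
    total_per_sample = response.sum(axis=(1, 2))  # shape: (sample,)
    return float(total_per_sample.mean())


# Toy data: 2 samples, 3 dates, 2 channels, every entry equal to 1.0
response = np.ones((2, 3, 2))
value = mean_total_response(response)
```

The point is only that the axes to reduce over are now chosen by whoever writes the objective or constraint, not by the optimizer.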

TODO:

  • Tests of the BudgetOptimizer will need to be updated to have an mmm_model
  • Update NBs (?)
  • Update docs

Closes #1331


📚 Documentation preview 📚: https://pymc-marketing--1357.org.readthedocs.build/en/1357/

@github-actions github-actions bot added docs Improvements or additions to documentation CLV MMM tests mlflow labels Jan 8, 2025
@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch from 0196b3a to 6c946f2 Compare January 8, 2025 18:33
@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch 2 times, most recently from 3a348d2 to 10b0783 Compare January 8, 2025 18:35

codecov bot commented Jan 8, 2025

Codecov Report

Attention: Patch coverage is 86.17886% with 17 lines in your changes missing coverage. Please review.

Project coverage is 93.80%. Comparing base (32f44c7) to head (dc6a63c).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pymc_marketing/mmm/budget_optimizer.py | 90.47% | 8 Missing ⚠️ |
| pymc_marketing/mmm/mmm.py | 66.66% | 8 Missing ⚠️ |
| pymc_marketing/mmm/utility.py | 93.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1357      +/-   ##
==========================================
- Coverage   93.94%   93.80%   -0.15%     
==========================================
  Files          48       48              
  Lines        5137     5198      +61     
==========================================
+ Hits         4826     4876      +50     
- Misses        311      322      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@juanitorduz
Collaborator

juanitorduz commented Jan 8, 2025

yay! 🚀 Let me know if you need any support!

@cetagostini cetagostini mentioned this pull request Jan 8, 2025
15 tasks
@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch from 10b0783 to 36985c4 Compare January 9, 2025 19:28
@ColtAllen
Collaborator

Why does this have a CLV label?

@wd60622
Contributor

wd60622 commented Jan 12, 2025

Why does this have a CLV label?

It seems to have come from the file-change bot. The force push then left the CLV files with no changes, so I removed the CLV label.

@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch from 36985c4 to 7bed11a Compare January 15, 2025 14:18
@ricardoV94
Contributor Author

Fixed the tests, NBs next

@ricardoV94
Contributor Author

@cetagostini do I need to worry about other functions in https://github.com/pymc-labs/pymc-marketing/blob/7bed11a61b2310835863a7d092b9b63a2ac29081/pymc_marketing/mmm/utility.py?

Asking because we are no longer raveling samples or automatically summing budget dims inside the optimization code. Not sure if that could have any overlap with this change.

@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch from 7bed11a to 0e42f11 Compare January 15, 2025 15:19
@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch 2 times, most recently from 742494e to 23e7ccb Compare January 15, 2025 15:43
@ricardoV94
Contributor Author

ricardoV94 commented Jan 15, 2025

The notebook job stops at the first failure. Is there a way to have it try all the notebooks like pytest does with tests?

I thought I was done when I fixed the last failure but now it went and failed in the next one.
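
For reference, the "try everything, report all failures at the end" behaviour can be sketched in a few lines. This is not the project's notebook runner: `run_all` and `fake_run` are hypothetical names, and `fake_run` merely stands in for whatever actually executes a notebook.

```python
from typing import Callable, Iterable


def run_all(names: Iterable[str], run_one: Callable[[str], None]) -> dict[str, str]:
    """Run every item and return {name: error message} for the ones that failed."""
    failures: dict[str, str] = {}
    for name in names:
        try:
            run_one(name)
        except Exception as exc:  # record the failure and keep going
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_run(name: str) -> None:
    """Stand-in for executing a notebook; fails for names containing 'bad'."""
    if "bad" in name:
        raise RuntimeError(f"{name} raised an error")


failures = run_all(["a.ipynb", "bad_b.ipynb", "c.ipynb", "bad_d.ipynb"], fake_run)
```

A job built this way would exit with a non-zero status only after the loop, once `failures` is known to be non-empty.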

@juanitorduz
Collaborator

We will work on it 🙇!

In the meantime, I will review this one either tonight or tomorrow 🙏

Collaborator

@juanitorduz juanitorduz left a comment

Thanks @ricardoV94 ! This looks great! I left some small comments.

I also see the budget notebook is failing maybe due to an error not being caught by the tests?

There is another relevant notebook which uses the budget optimization: https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_case_study.html I can work on it in a different PR :)

@juanitorduz
Collaborator

Thanks for the changes @ricardoV94 !

I think the only things missing are the notebooks. Do you want to work on that, or do you want support so that you can focus on doing your magic ;) ?

@ricardoV94
Contributor Author

I'm on it! I will push soon (and revert the merge @cetagostini just did). I appreciate another deep review as I changed a couple things to make it more clean

@juanitorduz
Collaborator

I'm on it! I will push soon (and revert the merge @cetagostini just did). I appreciate another deep review as I changed a couple things to make it more clean

Absolutely 💪 !

@ricardoV94
Contributor Author

The logic used here to extract the response_distribution could be easily repurposed to simplify/speedup get_channel_contributions_forward_pass_grid.

I noticed this was rather slow while checking if the mmm case study notebook is now passing :D

@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch from e57d20f to b4c5848 Compare January 16, 2025 13:56
@cetagostini
Contributor

Hey, that push was me trying to rebase onto my local branch; I hope it didn't take away anything important. I was working on solving those issues:

I came up with a quick helper function to create allocations and test them. A few functions, such as _create_synth_dataset, now receive an array where they previously used a dict, so a function like this should be helpful.

import numpy as np
import xarray as xr


def _create_allocation(value, **kwargs):
    """
    Create an xarray.DataArray with flexible dimensions and coordinates.

    Parameters:
    - value (array-like): The data values for the DataArray. Shape must match the dimensions implied by the kwargs.
    - **kwargs: Key-value pairs representing dimension names and their corresponding coordinates.

    Returns:
    - xarray.DataArray: The resulting DataArray with the specified dimensions and values.

    Raises:
    - ValueError: If the shape of `value` doesn't match the lengths of the specified coordinates.
    """
    # Dimension names are the keyword names; coordinates are their values
    dims = list(kwargs)
    coords = dict(kwargs)

    # Validate the shape of `value` against the coordinate lengths
    expected_shape = tuple(len(coords[dim]) for dim in dims)
    if np.shape(value) != expected_shape:
        raise ValueError(
            f"The shape of 'value' {np.shape(value)} does not match the expected "
            f"shape {expected_shape} based on the provided dimensions."
        )

    # Create the DataArray
    return xr.DataArray(value, coords=coords, dims=dims)

They could do the following and use it, even with models with more dims.

# Example 1: Single dimension (channel)
da1 = _create_allocation(value=[1.56753944, 1.43246056], channel=["x1", "x2"])

# Example 2: Two dimensions (channel and hierarchy)
da2 = _create_allocation(
    value=[[1.5, 2.0], [2.5, 3.0]],
    channel=["x1", "x2"],
    hierarchy=["a", "b"],
)

How does that sound to you?

@ricardoV94 ricardoV94 force-pushed the budget_optimizer_refactor branch from b4c5848 to 3a12939 Compare January 16, 2025 14:01
@ricardoV94
Contributor Author

ricardoV94 commented Jan 16, 2025

@cetagostini I did something simple for the dict->DataArray conversion in sample_response_distribution, which is user-facing, so it still accepts a dict:

        if isinstance(allocation_strategy, dict):
            # For backward compatibility
            allocation_strategy = DataArray(
                pd.Series(allocation_strategy), dims=("channel",)
            )

It seems like you are reinventing a way to define a DataArray. Perhaps it's better to just push users to get comfortable with them, since those are the objects we work with outside of PyMC/PyTensor models?
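
For readers less familiar with xarray, a minimal standalone sketch of both routes follows: the dict conversion shown above, and direct DataArray construction. The channel names, budget values, and the `hierarchy` dimension are made up for illustration, and the snippet runs outside any pymc-marketing model.

```python
import pandas as pd
import xarray as xr

# Backward-compatible path: a {channel: budget} dict becomes a 1-D DataArray.
# The Series index supplies the "channel" coordinate.
allocation_dict = {"x1": 1.5, "x2": 2.5}
from_dict = xr.DataArray(pd.Series(allocation_dict), dims=("channel",))

# Direct construction with one dimension.
direct = xr.DataArray(
    [1.5, 2.5],
    dims=("channel",),
    coords={"channel": ["x1", "x2"]},
)

# Direct construction with two dimensions ("hierarchy" is hypothetical).
two_dims = xr.DataArray(
    [[1.5, 2.0], [2.5, 3.0]],
    dims=("channel", "hierarchy"),
    coords={"channel": ["x1", "x2"], "hierarchy": ["a", "b"]},
)
```

Label-based selection such as `two_dims.sel(channel="x2", hierarchy="a")` then works the same way regardless of how many dimensions the allocation has.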

@ricardoV94
Contributor Author

@cetagostini I'm not against it though. That can possibly be done as a separate PR at this point, I guess?

@cetagostini
Contributor

I think the approach you added works and pushes users to get familiar with DataArrays. I'm not against it. I usually try to create helpers anyway, and it's up to users whether they use them or not. But I'm okay with addressing this in another PR, or maybe in #1538

@ricardoV94
Contributor Author

Tests are passing!

@juanitorduz
Collaborator

Wow! That was fast!!! Thanks!

@juanitorduz juanitorduz merged commit 7a41a72 into pymc-labs:main Jan 16, 2025
19 of 20 checks passed
@juanitorduz juanitorduz added the major API breaking changes label Jan 16, 2025
Labels
docs Improvements or additions to documentation enhancement New feature or request major API breaking changes MMM optimizer priority: high tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Making optimizer pymc model agnostic
5 participants