Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Beta geo expected probability no purchase #1094

Merged

Conversation

DylanZammit
Copy link
Contributor

@DylanZammit DylanZammit commented Oct 19, 2024

Description

The linked issue below describes the motivation and derivation of the newly introduced method. Only 2 files have been updated:

  • pymc_marketing/clv/models/beta_geo.py: Introduction of BetaGeoModel.expected_probability_no_purchase, which implements the derivation in aforementioned document.
  • docs/source/notebooks/bg_nbd.ipynb: Includes an example of how to use it with some example customers.

I might require guidance on how to implement unit tests for this new method. Other unit tests for the BetaGeo model seem to compare the output results of similar methods with the ones given by the lifetimes library. However, there is no corresponding functionality for the new proposed method.

Related Issue

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--1094.org.readthedocs.build/en/1094/

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link

review-notebook-app bot commented Oct 19, 2024

View / edit / reply to this conversation on ReviewNB

juanitorduz commented on 2024-10-19T09:20:24Z
----------------------------------------------------------------

Can you please remove this from the notebook?


DylanZammit commented on 2024-10-19T09:25:43Z
----------------------------------------------------------------

Removed

@juanitorduz
Copy link
Collaborator

I might require guidance on how to implement unit tests for this new method.

Absolutely! Some ideas for this new method are:

  • Test that the method does not fail by simply calling it on a fitted model as in

    def test_expected_num_purchases(self):

  • Predict this on a user (or set of users) where this probability should be very high (close to one) and very low (close to zero)

Copy link
Contributor Author

Removed


View entire conversation on ReviewNB

@DylanZammit
Copy link
Contributor Author

I might require guidance on how to implement unit tests for this new method.

Absolutely! Some ideas for this new method are:

  • Test that the method does not fail by simply calling it on a fitted model as in
    def test_expected_num_purchases(self):
  • Predict this on a user (or set of users) where this probability should be very high (close to one) and very low (close to zero)

Thank you! I will try to work on this between today and tomorrow.

I have removed the redundant cell-block in the notebook and made reference to Equation 34 which I based the method off in the docstring of the new method. Appreciate the help and feedback, please let me know if there are any other suggested changes and I will review accordingly.

@DylanZammit
Copy link
Contributor Author

@juanitorduz @ColtAllen

Following unit tests added:

  • assertion on expected output layout
  • frequent customers with large t should have probability of no purchases close to 0
  • infrequent customers with small t should have probability of no purchases close to 1
  • If t = 0, then the probability of no purchases should be 1

All tests passed.

Copy link
Collaborator

@ColtAllen ColtAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating this PR @DylanZammit. I noticed the expression in the research note includes the model likelihood function, but it seems here you've simplified things and removed terms that would otherwise cancel out. If results are valid this will be a more performant solution, well done.

Aside from renaming the function and a few minor details in the notebook and docstrings, I don't have any major remarks.

Copy link

review-notebook-app bot commented Oct 21, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-10-21T12:55:05Z
----------------------------------------------------------------

Suppress this warning so that it does not appear in the documentation


ColtAllen commented on 2024-10-24T10:45:40Z
----------------------------------------------------------------

Re-running the cell should clear it.

Copy link

review-notebook-app bot commented Oct 21, 2024

View / edit / reply to this conversation on ReviewNB

ColtAllen commented on 2024-10-21T12:55:05Z
----------------------------------------------------------------

See PR review notes about renaming this method. If you're curious how this method is demonstrated with ParetoNBDModel:

https://www.pymc-marketing.io/en/stable/notebooks/clv/pareto_nbd.html#probability-of-n-purchases-over-time-t


@wd60622 wd60622 added the enhancement New feature or request label Oct 21, 2024
@juanitorduz juanitorduz requested a review from ColtAllen October 22, 2024 07:55
@ColtAllen ColtAllen linked an issue Oct 22, 2024 that may be closed by this pull request
@juanitorduz juanitorduz added this to the 0.11.0 milestone Oct 22, 2024
@juanitorduz
Copy link
Collaborator

@DylanZammit I think this one is very close to the finish line. I think there are a couple of open comments from Colt, and then we are good to go. Please let us know if you need support :)

Copy link
Collaborator

Re-running the cell should clear it.


View entire conversation on ReviewNB

@ColtAllen
Copy link
Collaborator

@DylanZammit I think this one is very close to the finish line. I think there are a couple of open comments from Colt, and then we are good to go. Please let us know if you need support :)

I saw two typos in the notebook ("ot keep" and "deposits" instead of "spending"), but ReviewNB isn't allowing me to post any more comments for some reason. I'm also curious about that DIV/0 warning with the MAP fit, and noticed an older version of pytensor in the watermark.

@github-actions github-actions bot added docs Improvements or additions to documentation tests duplicate This issue or pull request already exists labels Oct 24, 2024
@juanitorduz juanitorduz requested a review from ColtAllen October 24, 2024 20:20
@DylanZammit
Copy link
Contributor Author

Made all suggested changes:

  • typos
  • pytensor version (reinstalled the conda environment using the same command conda env create -f environment.yml)
  • remove docstring reference of special case when y=0
  • re-ran the cell that gave a warning (which removes the warning).

I have still not investigated the DIV/0 error, however, I suspect it is due to numerical issues. Please let me know if you have any further suggestions.

@ColtAllen
Copy link
Collaborator

I have still not investigated the DIV/0 error, however, I suspect it is due to numerical issues. Please let me know if you have any further suggestions.

I'm not able to re-create the error on my 2023 M2 Macbook with identical pymc and pytensor versions; maybe it's a C++ compiler difference between your OS and mine. Anywho, if you add this cell to the top of the notebook and run it, then delete it before pushing, it'll hide all the warnings:

import warnings 
  
# Set warnings to be ignored 
warnings.filterwarnings('ignore')

@wd60622 wd60622 removed the duplicate This issue or pull request already exists label Oct 26, 2024
@github-actions github-actions bot added the duplicate This issue or pull request already exists label Oct 27, 2024
Copy link

codecov bot commented Oct 27, 2024

Codecov Report

Attention: Patch coverage is 96.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.61%. Comparing base (e8a4e3b) to head (e3751a7).
Report is 121 commits behind head on main.

Files with missing lines Patch % Lines
pymc_marketing/clv/models/beta_geo.py 96.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1094   +/-   ##
=======================================
  Coverage   95.61%   95.61%           
=======================================
  Files          39       39           
  Lines        4039     4063   +24     
=======================================
+ Hits         3862     3885   +23     
- Misses        177      178    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ColtAllen ColtAllen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work on the notebook and augmenting existing research with a more performant function derivation! If you ever want to link your PDF to the docstring references feel free to open another PR.

@ColtAllen ColtAllen merged commit 31def91 into pymc-labs:main Oct 27, 2024
14 checks passed
@wd60622 wd60622 removed the duplicate This issue or pull request already exists label Oct 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLV docs Improvements or additions to documentation enhancement New feature or request tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include probability of 0 purchases in time range (T, T+t]
4 participants