Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CustomDist and Simulator no longer require class_name when creating a dist #6668

Merged
merged 2 commits into from
Apr 15, 2023

Conversation

ricardoV94
Copy link
Member

@ricardoV94 ricardoV94 commented Apr 13, 2023

Apparently cloudpickle is happy with classes with the same name, so we don't need to request them from users...

import cloudpickle

class A:
    def __init__(self, a):
        self.a = a
        
a = A(5)

class A:
    def __init__(self, b):
        self.b = b
        
b = A(3)

a_, b_ = cloudpickle.loads(cloudpickle.dumps((a, b)))

print(a_.a, b_.b)  # 5, 3
print(type(a_) is type(b_))  # False
print(type(a))  # <class '__main__.A'>
print(type(b))  # <class '__main__.A'>
print(type(a_) is A)  # False
print(type(b_) is A)  # True

📚 Documentation preview 📚: https://pymc--6668.org.readthedocs.build/en/6668/

* Also remove the experimental warning when using CustomSymbolicDists
@ricardoV94 ricardoV94 requested a review from lucianopaz April 13, 2023 08:08
@ricardoV94 ricardoV94 changed the title Remove class name requirement CustomDist and Simulator no longer require class_name when creating a dist Apr 13, 2023
@ricardoV94 ricardoV94 changed the title CustomDist and Simulator no longer require class_name when creating a dist CustomDist and Simulator no longer require class_name when creating a dist Apr 13, 2023
@codecov
Copy link

codecov bot commented Apr 13, 2023

Codecov Report

Merging #6668 (5de5e57) into main (2a324bc) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #6668   +/-   ##
=======================================
  Coverage   91.97%   91.97%           
=======================================
  Files          94       94           
  Lines       15942    15944    +2     
=======================================
+ Hits        14662    14664    +2     
  Misses       1280     1280           
Impacted Files Coverage Δ
pymc/distributions/distribution.py 96.33% <100.00%> (+0.02%) ⬆️
pymc/distributions/simulator.py 87.23% <100.00%> (ø)

@lucianopaz
Copy link
Contributor

@ricardoV94, is this linked to an open issue? Or is the change just to simplify the API? Anyway, I will need to read up on what is happening under the hood to understand what the problematic API was, how this is an improvement, and why cloudpickle makes this work.

@ricardoV94
Copy link
Member Author

No related issue. The goal is to simplify the API.

I introduced the classnames because I thought they were necessary as we are creating new types dynamically.

But I realized it's not needed so this PR removes the restriction that a user has to pass class_name.

This reminds me I have to update the docstrings.

@ricardoV94
Copy link
Member Author

For more context,

I think I borrowed this constraint from Simulator design. We started implementing it prior to the switch to cloudpickle and I think we were having troubles implementing the dynamic class in a way that was serializable, and one of the things I tried (without remembering now if it even mattered in the end) was to request a unique class_name

@lucianopaz
Copy link
Contributor

So I've dug a bit into cloudpickle and how it manages to handle this case. They basically opt to pickle the objects by value, instead of by reference. That means that a.__class__ and b.__class__ values are pickled too. They do this in a very cool way that I haven't spent enough time to understand fully. They seem to leverage the information in the __code__ objects to serialize and rebuild everything:

>>> import dis
>>> dis.disco(cloudpickle.loads(cloudpickle.dumps(a)).__class__.__init__.__code__)
3           0 LOAD_FAST                1 (a)
            2 LOAD_FAST                0 (self)
            4 STORE_ATTR               0 (a)
            6 LOAD_CONST               0 (None)
            8 RETURN_VALUE
>>> dis.disco(cloudpickle.loads(cloudpickle.dumps(b)).__class__.__init__.__code__)
9           0 LOAD_FAST                1 (b)
            2 LOAD_FAST                0 (self)
            4 STORE_ATTR               0 (b)
            6 LOAD_CONST               0 (None)
            8 RETURN_VALUE

The class name is indeed the same for A and B, but the underlying class object isn't garbage collected until a is out of scope. That makes pickle raise an informative error when you try to pickle a:

>>> import pickle
>>> pickle.loads(pickle.dumps(a))
PicklingError                             Traceback (most recent call last)
Cell In[26], line 1
----> 1 pickle.loads(pickle.dumps(a))

PicklingError: Can't pickle <class '__main__.A'>: it's not the same object as __main__.A

Copy link
Contributor

@lucianopaz lucianopaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me

return _CustomSymbolicDist(
name,
*dist_params,
class_name=name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, was old behavior to use the RV name as the class name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were doing "CustomDist_{class_name}", so yes, if a user created one in a model context it was CustomDIst_mu` for example. Pretty meh

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have a concern in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No concern, just checking my understanding

@ricardoV94 ricardoV94 merged commit 1ed4475 into pymc-devs:main Apr 15, 2023
@ricardoV94
Copy link
Member Author

Thanks for the review 🙏 @lucianopaz

@ricardoV94 ricardoV94 deleted the remove_class_name_requirement branch April 18, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants