Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kmeans_inducing_points is non-reproducibly random #4712

Closed
JohnGoertz opened this issue May 23, 2021 · 6 comments
Closed

kmeans_inducing_points is non-reproducibly random #4712

JohnGoertz opened this issue May 23, 2021 · 6 comments
Labels

Comments

@JohnGoertz
Copy link

Calling kmeans_inducing_points from gp.utils results in different values every time, with no way to specify a random seed.

It appears that function wraps Scipy's kmeans, which takes its random seed from np.random.seed(), so it should be a quick fix to add a random_seed kwarg that just gets passed to np.random.seed() before calling scipy. I'm not sure if that would have broader unintended consequences, such as if the user had subsequent calls to np.random. However, if the user is calling kmeans with a specified random seed, they likely want their whole workflow to be reproducibly random, so maybe it wouldn't be an issue? Additionally, new code should use the more recent numpy random Generator, which np.random.seed() shouldn't affect.

  • PyMC3 Version: 3.11.2
  • Theano Version: 1.0.5
  • Python Version: 3.8.5
@twiecki
Copy link
Member

twiecki commented May 23, 2021

CC @bwengals

@twiecki
Copy link
Member

twiecki commented May 23, 2021

Thanks @JohnGoertz, want to do a PR that fixes this?

@OriolAbril
Copy link
Member

so it should be a quick fix to add a random_seed kwarg that just gets passed to np.random.seed() before calling scipy. I'm not sure if that would have broader unintended consequences, such as if the user had subsequent calls to np.random

I think we should not go with this proposal. numpy.random.seed has a global effect so it will definitely affect any posterior calls to np.random.<distribution> and it should not be used anymore (even if using the old RandomState instead of the new Generator see its api docs and general recommendations for the random module). This may mean that the fix needs to happen upstream in scipy though.

@JohnGoertz
Copy link
Author

Oh, wouldn't you know it, a PR fixing this exact problem was merged into scipy just a few weeks ago: scipy/scipy#13972

Guess we just gotta wait for 1.7.0 to be released.

@OriolAbril
Copy link
Member

Amazing, then we only have to expose the seed argument and pass it to scipy 🎉

@bwengals
Copy link
Contributor

Fixed by #5055. One can now pass in arbitrary kwargs to kmeans_inducing_points that get passed on to the scipy routine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants