-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/monthly soiling metrics #193
Conversation
(adds pep8 compliance in a few spots also)
fyi for the test failures @mdeceglie pvlib/pvlib-python#1018 |
required to use the atol kwarg in pd.testing.assert_frame_equal()
@kanderso-nrel thanks! I saw your comment after I defaulted to "bump the pandas version". But let's consider whether to use a function like the one you suggest in pvlib/pvlib-python#1018 as part of the review here |
Co-authored-by: Kevin Anderson <[email protected]>
Co-authored-by: Kevin Anderson <[email protected]>
rates = [x for sublist in rates for x in sublist] | ||
|
||
if rates: | ||
monthly_rate_data.append(np.quantile(rates, [0.5, ci_quantiles[0], ci_quantiles[1]])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed silly to me to be doing monte carlo for something involving only the humble uniform distribution, but I spent a while trying to come up with a clever way of analytically finding quantiles of the union of N non-identical uniform distributions and couldn't come up with anything that can compete with this for speed. Not sure if that's a testament to numpy being fast or python being slow...
On another note, even aside from analytical solutions, I thought about getting away with smaller "sample sizes" by using np.linspace
to generate perfectly uniform "samples", but np.random.uniform
is already so fast that it doesn't really matter:
%time _ = np.random.uniform(0.5, 1.5, 100000)
Wall time: 3.42 ms
All this to say I guess this code is fine 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A sum of N uniform distributions is called the Irwin-Hall distribution F_N. It's quantile function (inverse of the CDF) is not expressible algebraically (it's the inverse of a summation that looks like the binomial formula). I'd say sampling is the practical choice for evaluating quantiles in the simple case where each component distribution is U[0, 1], let alone the complicated case with scaled, shifted and unequally weighted components.
Thanks for the lunchtime math distraction :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! I was looking at Irwin-Hall last night. I eventually decided that the situation here is actually the union of uniform distributions, not the sum. Maybe "mixture" is the correct term instead of "union". By the time I realized this, it was bedtime and I didn't look into it any further. In case you are interested in additional distraction, here is the formula for the PDF of the sum of non-identical uniform distributions for N=2 (surely not novel, I just did it as an exercise): https://gist.github.com/kanderso-nrel/acd2f742bba81fb75139d9176d0364d8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I interpreted the code as calculating the quantiles of a union of samples from the underlying distributions. In my mind, the CDF underlying the union of samples is the mixture distribution which would look like the binomial formula. In this approach the random variable X is not the sum of random variables from each component distribution. I could have misunderstood the code.
Co-authored-by: Kevin Anderson <[email protected]>
@@ -80,6 +80,7 @@ | |||
# relative to this directory. They are copied after the builtin static files, | |||
# so a file named "default.css" will overwrite the builtin "default.css". | |||
html_static_path = ['_static', '_images'] | |||
smartquotes = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, this is because parameter description list entries starting with a quoted string don't render the open-quote character correctly when smartquotes is enabled, see sphinx-doc/sphinx#4028
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🎉
__init__.py