Factor out code for interacting with NREL API and add retry logic #114
Conversation
prereise/gather/request_util.py (Outdated)

```python
        return wrapper

    return decorator
```
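For context, the two `return` statements in the hunk above are the tail of the usual decorator-with-arguments pattern from the Real Python primer. A minimal sketch of that shape (hypothetical names and a broad `except` for illustration - not the actual `request_util` implementation):

```python
import functools

def retry(max_attempts=3):
    """Hypothetical parameterized retry decorator (sketch only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exc = None
            for _ in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:  # real code would catch narrower types
                    last_exc = exc
            raise last_exc
        return wrapper  # decorator returns the wrapped function
    return decorator    # retry(...) returns the decorator itself

@retry(max_attempts=2)
def flaky():
    # uses a function attribute, as discussed below
    flaky.calls += 1
    if flaky.calls < 2:
        raise ValueError("transient")
    return "ok"

flaky.calls = 0
```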
I have been reading https://realpython.com/primer-on-python-decorators/, it was useful!
Agreed! This is new to me too.
Nice, just picked up some new things from there. TIL functions can have attributes -

```python
In [13]: def foo():
    ...:     pass
    ...: def bar(f):
    ...:     f.x += 1
    ...: foo.x = 0

In [14]: bar(foo)

In [15]: foo.x
Out[15]: 1
```
Everything is an object!
```python
def sleepless(monkeypatch):
    counter = SleepCounter()
    monkeypatch.setattr(time, "sleep", counter.sleep)
    monkeypatch.setattr(time, "time", counter.time)
```
I did not know about the `monkeypatch` fixture. Super useful.
That is very nice
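The same substitution the fixture does can be sketched outside pytest with the standard library's `unittest.mock`. The `SleepCounter` implementation below is a hypothetical stand-in (the real one isn't shown in this hunk), but it illustrates the idea: swap out `time.sleep` and `time.time` so tests never actually block.

```python
import time
from unittest import mock

class SleepCounter:
    """Hypothetical stand-in for the SleepCounter used in the tests:
    records requested sleeps on a fake clock instead of blocking."""
    def __init__(self):
        self.elapsed = 0.0

    def sleep(self, seconds):
        self.elapsed += seconds  # advance the fake clock, never block

    def time(self):
        return self.elapsed

counter = SleepCounter()
with mock.patch.object(time, "sleep", counter.sleep), \
     mock.patch.object(time, "time", counter.time):
    time.sleep(1.5)    # returns instantly
    now = time.time()  # reads the fake clock
```

pytest's `monkeypatch.setattr` does the equivalent attribute swap and undoes it automatically when the test finishes.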
```python
@retry(interval=self.interval)
def _get_info(url):
    return pd.read_csv(url, nrows=1)

@retry(interval=self.interval)
def _get_data(url):
    return pd.read_csv(url, dtype=float, skiprows=2)

info = _get_info(url)
tz, elevation = info["Local Time Zone"], info["Elevation"]

data_resource = _get_data(url)
```
If rate limiting is an issue, does it make sense to combine `_get_data` and `_get_info` into one HTTP call and then parse the result into the data and info? I'm assuming each `pd.read_csv` is an HTTP call without any caching.
Good question - I tried doing this initially, but the data we get back is basically two different CSV files stacked, so it can't be parsed directly. We'd probably have to use an HTTP library to get the raw content in one call and then handle separating it; I wasn't sure it was worth it at the moment. Something else I just noticed: we create a different `RateLimit` instance in each decorated function. That seems to work, but since it's the same URL it would be nice to share the instance. I'll look into this.
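For what it's worth, splitting the stacked response could be done in memory with `io.StringIO` once the raw text is fetched. A sketch under the layout implied by the `nrows=1` / `skiprows=2` reads (one metadata header line plus one value line, then the data table); the string here is synthetic stand-in data, not a real NREL response:

```python
import io
import pandas as pd

# Synthetic response mimicking the stacked layout: a one-row metadata CSV
# (2 lines) sitting on top of the data CSV. Column names are illustrative.
raw = (
    "Local Time Zone,Elevation\n"
    "-7,1609\n"
    "GHI,DNI\n"
    "100.0,200.0\n"
    "110.0,210.0\n"
)

lines = raw.splitlines(keepends=True)
# First stacked CSV: the metadata (what _get_info reads with nrows=1)
info = pd.read_csv(io.StringIO("".join(lines[:2])))
# Second stacked CSV: the data (what _get_data reads with skiprows=2)
data = pd.read_csv(io.StringIO("".join(lines[2:])), dtype=float)
```

One HTTP fetch, two parses - at the cost of managing the raw request ourselves.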
Let me know if I can help! I've spent a lot of time making my own HTTP calls to various APIs (though I usually used `requests` instead of the built-in `urllib` in Python 3). Going that route will probably make it a little easier to catch error code 429 as well, so we can have `retry` only allow a custom exception like `HTTPError429` and not retry when we see something like a 403 from an incorrect API key.
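That exception-whitelist idea could look something like this - a sketch only, using the hypothetical `HTTPError429` name from the comment, not the actual `request_util.retry` signature:

```python
class HTTPError429(Exception):
    """Hypothetical exception for a 429 Too Many Requests response."""

def retry(max_attempts=3, retry_on=(HTTPError429,)):
    """Sketch: retry only the exception types listed in retry_on;
    anything else (e.g. a 403 from a bad API key) propagates immediately."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts, re-raise the rate-limit error
        return wrapper
    return decorator

calls = {"n": 0}

@retry(max_attempts=3)
def fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise HTTPError429()
    return "data"
```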
I guess you guys can work on this in a follow-up PR.
Sounds good - let's meet up at some point and figure out the design. I think there are some cool options we could play around with.
Purpose
Create a semi-reusable module for downloading the solar data - nrel_api.py - which is used as part of calculating power output in sam.py and naive.py. As part of doing this, we add retry and rate limiting to make downloads more reliable.

What it does
- Adds Psm3Data, which acts as a container for the responses
- Adds request_util.RateLimit
- Adds request_util.retry
Initially I had the NREL API client using rate limiting directly, but that didn't account for handling failures: when we call it in a loop and a call fails, we have to start over. One way around this is combining the rate limit and retry - we retry up to a fixed number of failures at each iteration, but space the attempts out using a reasonable rate limit (determined via experiment). This makes the loop very likely to finish, and it lets us tune the rate, max retry count, etc. for a specific use case (API calls, or anything that can fail intermittently).
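The retry-inside-a-rate-limited-loop idea described above can be sketched as follows. All names here are hypothetical, not the actual request_util API; the real module composes a RateLimit class with the retry decorator rather than one function:

```python
import time

def rate_limited_retry(func, items, interval=0.5, max_attempts=3):
    """Sketch: call func on each item, retrying each item up to
    max_attempts times, with all attempts spaced at least `interval`
    seconds apart so failures don't restart the whole loop."""
    results = []
    for item in items:
        for attempt in range(max_attempts):
            start = time.time()
            try:
                results.append(func(item))
                break  # this item succeeded, move to the next
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts for this item
            finally:
                # enforce the rate limit between calls, including retries
                remaining = interval - (time.time() - start)
                if remaining > 0:
                    time.sleep(remaining)
    return results
```

Because each failed attempt costs only one rate-limit interval (instead of restarting the whole download), the loop is much more likely to complete.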
Testing
There are unit tests for the retry and rate limit logic. For the remaining changes, I mostly used the notebook to make sure things still look right.
Time to review
20-30 min