
Factor out code for interacting with nrel api and add retry logic #114

Merged: 11 commits, Oct 8, 2020


@jenhagg (Collaborator) commented Oct 8, 2020

Purpose

Create a semi-reusable module, nrel_api.py, for downloading the solar data used to calculate power output in sam.py and naive.py. Along the way, we add retry and rate limiting to make downloads more reliable.

What it does

  • Move the code to construct the request and download data to the nrel_api module
  • Define a dataclass Psm3Data which acts as a container for the responses
  • Add the ability to rate limit arbitrary functions - see request_util.RateLimit
  • Add the ability to retry failed requests - see request_util.retry
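For readers new to the idea, here is a minimal sketch of a rate limiter that can wrap arbitrary functions. It is not the actual request_util.RateLimit from this PR: the class name RateLimiter, its interval parameter (minimum seconds between calls), and the decorator usage are assumptions for illustration.

```python
import functools
import time


class RateLimiter:
    """Hypothetical sketch: enforce a minimum gap (in seconds) between calls."""

    def __init__(self, interval):
        self.interval = interval
        self.last_call = None  # time.time() of the most recent call, if any

    def wait(self):
        """Sleep just long enough that calls are at least `interval` apart."""
        if self.last_call is not None:
            elapsed = time.time() - self.last_call
            if elapsed < self.interval:
                time.sleep(self.interval - elapsed)
        self.last_call = time.time()

    def __call__(self, func):
        """Allow the limiter instance to be used as a decorator."""

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.wait()
            return func(*args, **kwargs)

        return wrapper


# One shared instance throttles every function it decorates, so two calls
# against the same endpoint are spaced out together rather than separately.
limiter = RateLimiter(interval=0.05)


@limiter
def get_info(url):
    return f"info from {url}"


@limiter
def get_data(url):
    return f"data from {url}"
```

Because both functions share `limiter`, calling get_info and then get_data waits about 0.05 s between them. Giving each function its own instance would throttle them independently, which is the same-URL concern raised later in the review.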

Initially I had the NREL API client use rate limiting directly, but that didn't account for failures: when we call it in a loop and a request fails, we have to start over. One way around this is to combine rate limiting and retry. At each iteration we retry up to a fixed number of failures, but space the attempts out using a reasonable rate limit (determined by experiment). This makes the loop very likely to finish, and lets us tune the rate, max retry count, etc. for a specific use case (API calls, or anything else that can fail intermittently).
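The combination described above can be sketched as a decorator factory, which is also where the nested `return wrapper` / `return decorator` structure comes from. Each failed attempt sleeps for a fixed interval before trying again, and the last error is re-raised once the attempts run out. The `retry` name and `interval` argument match the PR's usage; `max_attempts` and the exact behavior are assumptions, not the real request_util.retry.

```python
import functools
import time


def retry(interval=1.0, max_attempts=5):
    """Hypothetical sketch: retry a function, pausing `interval` seconds between attempts."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(interval)  # space retries out (the rate limit)

        return wrapper

    return decorator


calls = {"n": 0}


@retry(interval=0.01, max_attempts=3)
def flaky():
    """Fails twice with a transient error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(flaky())  # ok
```

Tuning `interval` and `max_attempts` per call site is what makes a long download loop likely to finish without hammering the API.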

Testing

There are unit tests for the retry and rate limit. For the remaining changes, I mostly used the notebook to make sure things still look right.

Time to review

20-30 min

@jenhagg jenhagg self-assigned this Oct 8, 2020
@jenhagg jenhagg linked an issue Oct 8, 2020 that may be closed by this pull request
@jenhagg jenhagg added this to the Welcome Drizzle milestone Oct 8, 2020
@jenhagg jenhagg requested review from rouille and ahurli October 8, 2020 01:42

return wrapper

return decorator
Collaborator

I have been reading https://realpython.com/primer-on-python-decorators/; it was useful!


Agreed! This is new to me too.

Collaborator Author

Nice, I just picked up some new things from there. TIL functions can have attributes:

In [13]: def foo():
    ...:     pass
    ...: def bar(f):
    ...:     f.x += 1
    ...: foo.x = 0

In [14]: bar(foo)

In [15]: foo.x
Out[15]: 1

Contributor

Everything is an object!
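Function attributes are handy inside decorators too. As a small illustration (not code from this PR), the hypothetical `count_calls` decorator stores its state on the wrapper function itself, so callers and tests can inspect it.

```python
import functools


def count_calls(func):
    """Hypothetical example: record how many times `func` has been called."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1  # state lives on the function object
        return func(*args, **kwargs)

    wrapper.calls = 0  # set after wraps() so it isn't overwritten
    return wrapper


@count_calls
def greet(name):
    return f"hello {name}"


greet("a")
greet("b")
print(greet.calls)  # 2
```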

def sleepless(monkeypatch):
    counter = SleepCounter()
    monkeypatch.setattr(time, "sleep", counter.sleep)
    monkeypatch.setattr(time, "time", counter.time)
Collaborator

I did not know the monkeypatch fixture. Super useful.
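SleepCounter itself isn't shown in this excerpt. A minimal sketch of such a fake clock, assuming it only needs `sleep()` and `time()` methods, is below; the `now` and `slept` attributes are hypothetical. To keep the example runnable without pytest, it patches the time module with `unittest.mock.patch.object`, which does the same job as the `monkeypatch.setattr` calls in the sleepless fixture.

```python
import time
from unittest import mock


class SleepCounter:
    """Hypothetical fake clock: sleep() advances a counter instead of blocking."""

    def __init__(self):
        self.now = 0.0
        self.slept = []  # every duration passed to sleep(), in order

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds

    def time(self):
        return self.now


def wait_twice():
    """Stand-in for code under test that sleeps, e.g. a retry loop."""
    time.sleep(2)
    time.sleep(3)
    return time.time()


counter = SleepCounter()
# Same effect as the monkeypatch.setattr calls, except the patches are
# undone when the with-block exits rather than at test teardown.
with mock.patch.object(time, "sleep", counter.sleep), mock.patch.object(
    time, "time", counter.time
):
    result = wait_twice()

print(result, counter.slept)  # 5.0 [2, 3]
```

The test runs instantly, yet it can still assert exactly how long the code under test tried to sleep.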

@rouille (Collaborator) left a comment

That is very nice

Comment on lines +101 to +112
@retry(interval=self.interval)
def _get_info(url):
    return pd.read_csv(url, nrows=1)

@retry(interval=self.interval)
def _get_data(url):
    return pd.read_csv(url, dtype=float, skiprows=2)

info = _get_info(url)
tz, elevation = info["Local Time Zone"], info["Elevation"]

data_resource = _get_data(url)

If rate limiting is an issue, does it make sense to combine _get_data and _get_info into one HTTP call and then parse the result into the data and info? I'm assuming each pd.read_csv is an HTTP call without any caching.

Collaborator Author

Good question. I tried doing this initially, but realized the response is basically two different CSV files stacked on top of each other, so it can't be parsed directly. We'd probably have to use an HTTP library to get the raw content in one call and then split it ourselves; I wasn't sure it was worth it at the moment. Something else I just noticed: we have a different RateLimit instance in each decorated function. That seems to work, but since both hit the same URL it would be nice to share one instance. I'll look into this.
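A possible shape for the single-request approach: download the raw text once (for example `requests.get(url).text`, left out here so the example runs offline), then hand the same string to pandas twice through `io.StringIO`, reusing the `nrows=1` and `skiprows=2` arguments from the two read_csv calls above. The helper name `parse_psm3_response` and the sample text are made up; only the stacked two-CSV layout comes from the thread.

```python
import io

import pandas as pd


def parse_psm3_response(text):
    """Hypothetical helper: split one downloaded response into metadata and data.

    Assumes the layout implied by the read_csv calls above: line 1 holds the
    metadata column names, line 2 their values, and the time series (with its
    own header row) starts on line 3.
    """
    info = pd.read_csv(io.StringIO(text), nrows=1)
    data = pd.read_csv(io.StringIO(text), dtype=float, skiprows=2)
    return info, data


# Made-up sample in the assumed layout (not real NREL output).
sample = (
    "Source,Latitude,Longitude,Local Time Zone,Elevation,Version\n"
    "NSRDB,40.0,-105.0,-7,1600,x\n"
    "Year,Month,Day,Hour,GHI\n"
    "2019,1,1,0,0\n"
    "2019,1,1,1,12.5\n"
)

info, data = parse_psm3_response(sample)
tz, elevation = info["Local Time Zone"], info["Elevation"]
print(tz.iloc[0], elevation.iloc[0], data.shape)  # -7 1600 (2, 5)
```

This halves the number of requests per location, which matters more than parsing cost when the API is the bottleneck.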


Let me know if I can help! I've spent a lot of time making my own HTTP calls to various APIs (though I usually used requests rather than Python 3's built-in urllib).

Going that route would probably also make it easier to catch HTTP 429 (Too Many Requests) errors, so retry could be limited to a custom exception like HTTPError429 and would not retry on something like a 403 caused by an incorrect API key.

Collaborator

I guess you guys can work on this in a follow-up PR.

Collaborator Author

Sounds good, let's meet up at some point and figure out the design, I think there are some cool options we could play around with.


return wrapper

return decorator
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! This is new to me too.

@jenhagg merged commit ffa32d4 into develop Oct 8, 2020
@jenhagg deleted the jon/nrel-api branch October 8, 2020 21:03
@ahurli mentioned this pull request Mar 16, 2021
Successfully merging this pull request may close these issues.

Factor out NREL API logic
4 participants