Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce throughput to nrel api by combining duplicate requests #118

Merged
merged 3 commits into from
Oct 16, 2020

Conversation

jenhagg
Copy link
Collaborator

@jenhagg jenhagg commented Oct 16, 2020

Purpose

Reduce requests to NREL by only making one call, and parsing the 2 csv files in the response from memory.

What it does

See above, plus updated the retry to handle non successful status codes instead of the error pandas.read_csv would throw when given a url. Regenerated the notebook and found a 2x speedup, which makes sense assuming the download time is the bottleneck. Interestingly, this doesn't improve when rate limiting is removed - my guess is that the loop this runs in takes long enough that we don't trigger http 429, but only see this now that we don't have 2 requests grouped together between each iteration.

Time to review

10 min

@jenhagg jenhagg requested review from ahurli and rouille October 16, 2020 01:20
@jenhagg jenhagg self-assigned this Oct 16, 2020
@jenhagg jenhagg added this to the spiders milestone Oct 16, 2020
info = _get_info(url)
tz, elevation = info["Local Time Zone"], info["Elevation"]
info = pd.read_csv(BytesIO(resp.content), nrows=1)
data_resource = pd.read_csv(BytesIO(resp.content), dtype=float, skiprows=2)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I haven't used BytesIO before, but this seems like a great solution to this problem.

@retry(interval=self.interval, allowed_exceptions=(TransientError))
def download(url):
resp = requests.get(url)
if resp.status_code != 200:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! I was thinking we might want to make the TransientError specific to error code 429 (i.e. if resp.status_code == 429:) so that we don't necessarily retry if the user copied the wrong API key into the notebook and was getting 403s for example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that makes more sense :)

@jenhagg jenhagg merged commit ce5535d into develop Oct 16, 2020
@jenhagg jenhagg deleted the jon/requests2 branch October 16, 2020 21:06
@ahurli ahurli mentioned this pull request Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants