Reduce throughput to nrel api by combining duplicate requests #118

jenhagg · 2020-10-16T01:20:20Z

Purpose

Reduce requests to NREL by making only one call and parsing the 2 CSV files in the response from memory.

What it does

See above. This also updates the retry to handle unsuccessful status codes, instead of relying on the error that pandas.read_csv would throw when given a URL. I regenerated the notebook and found a 2x speedup, which makes sense assuming the download time is the bottleneck. Interestingly, this doesn't improve further when rate limiting is removed. My guess is that the loop this runs in takes long enough that we don't trigger HTTP 429, but we only see this now that we no longer have 2 requests grouped together in each iteration.

Time to review

10 min

ahurli · 2020-10-16T17:47:52Z

prereise/gather/solardata/nsrdb/nrel_api.py

-        info = _get_info(url)
-        tz, elevation = info["Local Time Zone"], info["Elevation"]
+        info = pd.read_csv(BytesIO(resp.content), nrows=1)
+        data_resource = pd.read_csv(BytesIO(resp.content), dtype=float, skiprows=2)


Nice! I haven't used BytesIO before, but this seems like a great solution to this problem.
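
For readers who, like the reviewer, have not used BytesIO: a minimal sketch of the pattern in the diff above, parsing one in-memory payload twice. The payload, its column names, and the parse_payload helper are hypothetical stand-ins for resp.content and are not taken from the NREL API; only the two pd.read_csv calls mirror the diff.

```python
from io import BytesIO

import pandas as pd

# Hypothetical stand-in for resp.content: a metadata header line, a metadata
# value line, then the data table, all in a single CSV body.
content = (
    b"Source,Location ID,Local Time Zone,Elevation,Latitude,Longitude\n"
    b"NSRDB,1,-8,120,47.6,-122.3\n"
    b"Year,Month,Day,Hour,GHI\n"
    b"2016,1,1,0,0\n"
    b"2016,1,1,1,15\n"
)


def parse_payload(content):
    """Split one downloaded payload into metadata and data frames."""
    # First pass: line 1 is the header, nrows=1 keeps only the metadata row
    info = pd.read_csv(BytesIO(content), nrows=1)
    # Second pass: skip both metadata lines so line 3 becomes the header
    data = pd.read_csv(BytesIO(content), dtype=float, skiprows=2)
    return info, data


info, data = parse_payload(content)
tz = info["Local Time Zone"].iloc[0]
elevation = info["Elevation"].iloc[0]
```

Each BytesIO wraps the same bytes with a fresh read position, so the second parse costs no extra request.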

ahurli · 2020-10-16T17:52:05Z

prereise/gather/solardata/nsrdb/nrel_api.py

+        @retry(interval=self.interval, allowed_exceptions=(TransientError))
+        def download(url):
+            resp = requests.get(url)
+            if resp.status_code != 200:


Awesome! I was thinking we might want to raise the TransientError only for status code 429 (i.e. if resp.status_code == 429:), so that we don't retry when, for example, the user copied the wrong API key into the notebook and is getting 403s.

Yeah I think that makes more sense :)
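
The suggestion above could be sketched roughly as follows. This is not the project's retry decorator, whose full signature is not shown here; it is a plain loop written for illustration. TransientError is redefined locally, and check_response, the get parameter (requests.get in practice), max_attempts, and interval are assumed names.

```python
import time


class TransientError(Exception):
    """Raised for failures worth retrying (in this sketch: HTTP 429 only)."""


def check_response(resp):
    """Return resp on success, raising a retryable error only when rate limited."""
    if resp.status_code == 429:
        raise TransientError("rate limited (HTTP 429)")
    if resp.status_code != 200:
        # e.g. 403 from a wrong API key: retrying would not help, so fail fast
        raise RuntimeError(f"request failed with status {resp.status_code}")
    return resp


def download(url, get, max_attempts=3, interval=0.0):
    """Call get(url), retrying only on TransientError, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return check_response(get(url))
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(interval)  # wait before the next attempt
```

With this split, a 403 surfaces immediately as an error the user can act on, while a 429 is retried after the configured interval.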

Jon Hagg added 2 commits October 15, 2020 15:43

refactor: combine requests and read csv from memory

14c61ae

fix: handle based on status code, rerun notebook

a92f890

jenhagg requested review from ahurli and rouille October 16, 2020 01:20

jenhagg self-assigned this Oct 16, 2020

jenhagg added this to the spiders milestone Oct 16, 2020

ahurli approved these changes Oct 16, 2020

chore: more precise retry logic, fix docstrings

4d059be

jenhagg merged commit ce5535d into develop Oct 16, 2020

jenhagg deleted the jon/requests2 branch October 16, 2020 21:06

ahurli mentioned this pull request Mar 16, 2021

Develop into Master #155

Merged
