add test for NowcastingDataModule adjust for both 5 mins, and 30 mins data
add pv_datetime_index to example data set
Issue/106 refactor dataset
Very good work!
I've only got as far as gsp_data_source.py (I haven't started on gsp_data_source.py) but I've gotta down-tools for the day. I'll pick this back up on Monday! Looking great, though!
# Conflicts: # tests/test_utils.py
Overall, I think it looks good! Just a few very minor comments
OK! I think I'm done!
This is looking awesome - I especially love all the tests, and the great comments! The code is very easy to read - thank you!
To give a bit of background (which I think you already know!)...
Sorry, I should've talked more about this earlier!
Annoyingly, the ESO metadata is a little ambiguous, and fails to tease apart two separate (but related) concepts:
- The grid supply point itself,
- and the region served by that grid supply point.
(To give even more background: The grid supply point is a huge substation with one or more "supergrid transformers". The supergrid transformers define the boundary between the transmission grid and the distribution grid.)
Each grid supply point serves a region of the country. Usually, a single GSP will serve a single contiguous geographical region. But that isn't always true! Sometimes a single GSP will serve a handful of non-overlapping but near-by regions. And - rarely - the location of the GSP will be outside the region it serves!
In the context of PV nowcasting, we don't really care where the actual grid supply point is. Instead we care about the region of the country served by each GSP (because that's where the solar PV is installed.)
Sheffield Solar's PV Live service attempts to estimate the total solar PV generation for each GSP region, using real-time data from a few thousand PV systems (obtained from PassivSystems).
When we select a square of satellite imagery centred over a GSP region, I'd recommend that the centre of the square of satellite imagery should be aligned with the centroid of the main GSP region of interest. By "centroid" I mean the "centre of mass" or the "geometric centre" :) Unfortunately, I don't think the ESO metadata includes the centroid of each GSP region: gsp_lat and gsp_lon are the locations of the grid supply point that serves the "GSP region". So we'll need to calculate the centroid for each GSP region shape. I think geopandas has a function to calculate the centroid of a region.
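For anyone who wants to see what "centroid" means numerically, here is a minimal sketch in plain Python using the shoelace formula. The function name `polygon_centroid` and the example rectangle are made up for illustration and are not part of this PR. In practice geopandas exposes this as the `centroid` attribute of a `GeoSeries`, which is probably the better tool. The sketch assumes a simple (non-self-intersecting) polygon in projected coordinates such as metres, not raw lat/lon.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(vertices: List[Point]) -> Point:
    """Return the area-weighted centroid of a simple polygon.

    Hypothetical helper for illustration only. `vertices` lists the
    polygon's corners in order (clockwise or anticlockwise), without
    repeating the first corner at the end.
    """
    twice_signed_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_signed_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_signed_area == 0:
        raise ValueError("Polygon has zero area, so the centroid is undefined")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_signed_area.
    return cx / (3.0 * twice_signed_area), cy / (3.0 * twice_signed_area)


# A made-up 4 x 2 rectangular "GSP region".
region = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centre_x, centre_y = polygon_centroid(region)
```

For a GSP that serves several separate regions, this would be applied to the main region of interest only, as suggested above.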
BTW, I haven't come across the term "GSP system" before :) To me, in the context of PV nowcasting, a "system" means a single PV system. I'd suggest replacing the text "GSP system" with just "GSP" :) I've suggested this change on most (but not all) of the places where "GSP system" appears in the code :)
Again, thank you for all your work on this! Sorry this review is quite long! The TL;DR is that it's looking great!
Wooo! It's so awesome to see this code merged into |
#88
add data source: gsp.
Note that the GSP data is 30-minutely. Other data is 5-minutely. I wanted to make sure that the data we load into the ML models includes both 30-minutely and 5-minutely data.
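As a rough illustration of the mismatch between the two cadences (not the code in this PR), the sketch below builds a 5-minutely timeline and picks out the timestamps that also fall on half-hour boundaries, i.e. the moments where both kinds of data have a reading. The helper name `half_hourly_subset` and the example dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def half_hourly_subset(timestamps: List[datetime]) -> List[datetime]:
    """Keep only timestamps that sit exactly on :00 or :30 (hypothetical helper)."""
    return [
        t
        for t in timestamps
        if t.minute % 30 == 0 and t.second == 0 and t.microsecond == 0
    ]


# One hour of 5-minutely timestamps: 12:00, 12:05, ..., 13:00 (13 values).
start = datetime(2021, 1, 1, 12, 0)
five_minutely = [start + timedelta(minutes=5 * i) for i in range(13)]

# Timestamps shared with the 30-minutely GSP data: 12:00, 12:30, 13:00.
shared = half_hourly_subset(five_minutely)
```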
Sorry this is quite a big one
Rough breakdown of file changes:
notebooks - 14
nowcasting_dataset - 15
test/data - 60
tests - 6