feat: add top-level HIFLD grid orchestration function #236

Merged: danielolsen merged 10 commits into hifld from daniel/hifld_top_level on Nov 30, 2021

danielolsen

@danielolsen danielolsen commented Nov 1, 2021

Pull Request doc

Purpose

Add a single function that kicks off all processing of raw HIFLD data to outputs compatible with PowerSimData. Closes #226.

This is a draft PR for now at least until #235 is merged and #233 is addressed, intended to start conversation on feature design and identify lower-level issues to be addressed in separate PRs. Outstanding issues that are already on my radar to address as part of this PR (design/implementation suggestions welcome!):

  • Columns irrelevant to PowerSimData need to be dropped. EDIT: done
  • The bus.csv file needs an entry added in const.py for its columns with default values. EDIT: done.
  • We need to add missing column information to some data tables: EDIT: done.
    • branch.csv: r, b, ratio, and from_bus_id, to_bus_id for non-transformer branches
    • bus.csv: bus_id (I think we just need to rename the existing index), type, zone_id
    • bus2sub.csv: interconnect
    • dcline.csv: from_bus_id, to_bus_id
    • sub.csv: name (probably just a rename), interconnect_sub_id, lat, lon (renames)
  • generators need types, fuel prices, and heat-rate information. EDIT: done.

Ideally, the PowerSimData check_grid function (powersimdata.input.check_grid) will help us identify other similar issues, once Breakthrough-Energy/PowerSimData#551 is complete.

What the code is doing

There's a new user-facing function, create_csvs, within the __init__.py file. It:

  • activates the highest-level functions in each of the relevant data_process sub-modules,
  • builds a dictionary of output tables (splitting tables as necessary),
  • fills tables with columns of default values as appropriate,
  • filters output tables to only the columns expected by PowerSimData, and
  • saves all files to the specified output directory.
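
The fill, filter, and save steps listed above could look roughly like the sketch below. It is a plain-Python stand-in, not the code from this PR: the real function works on the tables produced by the data_process sub-modules, and `COLUMN_DEFAULTS`, `finalize_table`, and `write_tables` are hypothetical names chosen for illustration (the actual defaults live in const.py).

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the per-table column defaults kept in const.py.
# Each default is used only when a row lacks that column entirely.
COLUMN_DEFAULTS = {
    "bus": {"bus_id": None, "type": 1, "zone_id": None, "baseKV": None},
}


def finalize_table(rows, defaults):
    """Fill missing columns with defaults and drop columns that are not expected."""
    return [
        {col: row.get(col, default) for col, default in defaults.items()}
        for row in rows
    ]


def write_tables(tables, output_folder):
    """Write each table (a list of dict rows) to <output_folder>/<name>.csv."""
    os.makedirs(output_folder, exist_ok=True)
    for name, rows in tables.items():
        defaults = COLUMN_DEFAULTS[name]
        path = os.path.join(output_folder, f"{name}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(defaults))
            writer.writeheader()
            writer.writerows(finalize_table(rows, defaults))


# Usage: "extra" is dropped, and the missing "type" column gets its default.
raw_bus = [{"bus_id": 1, "zone_id": 5, "baseKV": 230, "extra": "dropped"}]
with tempfile.TemporaryDirectory() as folder:
    write_tables({"bus": raw_bus}, folder)
    with open(os.path.join(folder, "bus.csv"), newline="") as f:
        written = list(csv.reader(f))
```

Keeping the expected columns and their defaults in one mapping means the same structure drives both the fill step and the filter step.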

const.py gets the column default values.

Within transmission.py:

Within generators.py: changing the expected column names for latitude and longitude, to match the renaming in transmission.py.

Testing

Tested manually. Files are created in the right location, although there are a few issues with the outputs, which I think should be addressed in separate PRs to fix the lower-level functions:

  • Within branch.csv
    • Currently, lines (either present in the original HIFLD data or added to connect the minimum spanning tree) don't get from_bus_id or to_bus_id assigned in the branch table, only transformers do. Lines do have SUB_1_ID, SUB_2_ID, and VOLTAGE, which should be enough to uniquely identify a bus, since buses are created via the set of unique voltages of lines connected to a given substation. EDIT: done.
    • Dataframe indices aren't unique after the individual frames for the original lines, the minimum-spanning-tree lines, and the transformers are appended together. EDIT: fixed via fix: various issues with HIFLD transmission loading/processing #239.
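
As a rough illustration of the first bullet, the lookup from (substation, voltage) to bus could be sketched as below. Only SUB_1_ID, SUB_2_ID, VOLTAGE, from_bus_id, and to_bus_id come from this thread; the bus column names (`sub_id`, `baseKV`), the sample values, and `assign_bus_ids` are assumptions made for the example.

```python
# Hypothetical miniature tables: substation 100 has two buses (230 kV and 115 kV),
# substation 200 has one bus (230 kV).
buses = [
    {"bus_id": 0, "sub_id": 100, "baseKV": 230.0},
    {"bus_id": 1, "sub_id": 100, "baseKV": 115.0},
    {"bus_id": 2, "sub_id": 200, "baseKV": 230.0},
]
lines = [{"SUB_1_ID": 100, "SUB_2_ID": 200, "VOLTAGE": 230.0}]

# Because buses are created from the unique line voltages at each substation,
# the pair (substation, voltage) maps to exactly one bus.
bus_lookup = {(b["sub_id"], b["baseKV"]): b["bus_id"] for b in buses}


def assign_bus_ids(lines, bus_lookup):
    """Return copies of the line rows with from_bus_id and to_bus_id filled in."""
    return [
        {
            **line,
            "from_bus_id": bus_lookup[(line["SUB_1_ID"], line["VOLTAGE"])],
            "to_bus_id": bus_lookup[(line["SUB_2_ID"], line["VOLTAGE"])],
        }
        for line in lines
    ]


branches = assign_bus_ids(lines, bus_lookup)
```

A missing (substation, voltage) pair raises a KeyError here, which surfaces data problems early instead of silently leaving a bus ID empty.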

Tested via the test branch daniel/hifld_top_level_rebased, which combines the code from #236 and #240. Validation uses draft code from the branch for Breakthrough-Energy/PowerSimData#566.

# Building the hifld grid
# This step currently takes 20-30 minutes to download/process all data
from prereise.gather.griddata.hifld import create_csvs
create_csvs("path/to/powersimdata/network/hifld/data")

# Loading and checking the hifld grid
from powersimdata import Grid
from powersimdata.input.check import check_grid
check_grid(Grid("USA", "hifld"))
check_grid(Grid("Eastern", "hifld"))
check_grid(Grid("Western", "hifld"))
check_grid(Grid("ERCOT", "hifld"))
check_grid(Grid(["Eastern", "Western"], "hifld"))
check_grid(Grid(["Eastern", "ERCOT"], "hifld"))
check_grid(Grid(["ERCOT", "Western"], "hifld"))

The whole script takes a while to run (about 20 minutes on my laptop), even with a cached minimum spanning tree, since we don't cache the EPA AMPD data that's downloaded or the heat rate curves that are fitted to these data, among other things. We could improve at least part of this with some KDTree refactors.
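
One possible shape for the caching mentioned above is a small compute-or-load helper. This is only a sketch under assumptions: `cached`, `slow_fit`, the file name, and the JSON format are all hypothetical, and nothing like it is part of this PR.

```python
import json
import os
import tempfile


def cached(path, compute):
    """Return the JSON stored at path, or call compute() once and store its result."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    with open(path, "w") as f:
        json.dump(result, f)
    return result


calls = []


def slow_fit():
    """Hypothetical stand-in for an expensive step such as fitting heat rate curves."""
    calls.append(1)
    return {"h1": 9.5, "h2": 0.01}


with tempfile.TemporaryDirectory() as folder:
    cache_file = os.path.join(folder, "heat_rate_curves.json")
    first = cached(cache_file, slow_fit)   # computes and writes the file
    second = cached(cache_file, slow_fit)  # reads the file, no recomputation
```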

Time estimate

30-60 minutes for what's in here currently. Longer if we want to add some caching to speed up performance and/or design pass-throughs to the keyword arguments of build_transmission and build_plant.

@danielolsen danielolsen added the hifld Related to ingestion of the HIFLD data label Nov 1, 2021
@danielolsen danielolsen self-assigned this Nov 1, 2021
@danielolsen danielolsen force-pushed the daniel/hifld_top_level branch 3 times, most recently from 691cf38 to 8350004 Compare November 2, 2021 21:42
@danielolsen danielolsen force-pushed the daniel/hifld_top_level branch from 8350004 to b99ef4c Compare November 9, 2021 22:14
@danielolsen danielolsen force-pushed the daniel/hifld_top_level branch 3 times, most recently from 442a44e to 8ceef91 Compare November 22, 2021 17:23
@danielolsen

danielolsen commented Nov 22, 2021

This code has been tested to successfully make the CSVs necessary to create a grid, once it's rebased onto the code from #240, and has been validated using code from Breakthrough-Energy/PowerSimData#566. The original post has also been updated with what this branch currently does. Therefore, I'm converting this PR out of draft stage.

@danielolsen danielolsen marked this pull request as ready for review November 22, 2021 18:10
@danielolsen

After tweaking the output plant CSV a bit more (adding type, adding heat rate columns, translating to PowerSimData naming expectations), we can also successfully call powersimdata.input.export_data.export_case_mat on the resulting grids.

@danielolsen danielolsen force-pushed the daniel/hifld_top_level branch from 3e0724b to 1060315 Compare November 23, 2021 22:50
@danielolsen danielolsen force-pushed the daniel/hifld_top_level branch from 1060315 to 79c4e35 Compare November 23, 2021 22:52
@kasparm

kasparm commented Nov 24, 2021

Encountered this warning from scipy:

/Users/kmueller/.local/share/virtualenvs/PreREISE-X1G9x7gr/lib/python3.8/site-packages/scipy/stats/_stats_mstats_common.py:170: RuntimeWarning: invalid value encountered in double_scalars
  slope = ssxym / ssxm
/Users/kmueller/.local/share/virtualenvs/PreREISE-X1G9x7gr/lib/python3.8/site-packages/scipy/stats/_stats_mstats_common.py:184: RuntimeWarning: invalid value encountered in sqrt
  t = r * np.sqrt(df / ((1.0 - r + TINY)*(1.0 + r + TINY)))
/Users/kmueller/.local/share/virtualenvs/PreREISE-X1G9x7gr/lib/python3.8/site-packages/scipy/stats/_stats_mstats_common.py:187: RuntimeWarning: invalid value encountered in double_scalars
  slope_stderr = np.sqrt((1 - r**2) * ssym / ssxm / df)

I assume it is not critical.
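
The `slope = ssxym / ssxm` warning above looks like a 0/0 division, which would occur if all x values passed to the regression are identical. That reading is a guess from the traceback, not something confirmed in this thread. Under that assumption, a guard could be sketched as follows; `safe_linear_fit` is a hypothetical helper written in plain Python, not the scipy call used in the codebase.

```python
import math


def safe_linear_fit(x, y):
    """Least-squares (slope, intercept), or None when x has no spread."""
    n = len(x)
    if n < 2:
        return None
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x; zero means every x value is the same.
    ssxm = sum((xi - x_mean) ** 2 for xi in x)
    if math.isclose(ssxm, 0.0, abs_tol=1e-12):
        return None
    ssxym = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = ssxym / ssxm
    return slope, y_mean - slope * x_mean


good_fit = safe_linear_fit([1, 2, 3], [2, 4, 6])
degenerate_fit = safe_linear_fit([5, 5, 5], [1, 2, 3])
```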

@kasparm kasparm self-requested a review November 24, 2021 03:07

@kasparm kasparm left a comment

First check check_grid(Grid("USA", "hifld")) passes. Second check fails:

>>> check_grid(Grid("Eastern", "hifld"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kmueller/.local/share/virtualenvs/PreREISE-X1G9x7gr/lib/python3.8/site-packages/powersimdata/input/check.py", line 50, in check_grid
    raise ValueError(f"Problem(s) found with grid:\n{collected}")
ValueError: Problem(s) found with grid:
This grid contains 920 connected components, but is specified as having 1 interconnects: ['Eastern'].

Debugging:

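>>> import networkx as nx
>>> g_e = Grid("Eastern", "hifld")  # assumed definition; not shown in the original comment
>>> gg = nx.from_pandas_edgelist(g_e.branch, "from_bus_id", "to_bus_id")
>>> num_connected_components = len([c for c in nx.connected_components(gg)])
>>> num_connected_components
920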

@danielolsen

@kasparm the first issue comes from the generator heat rate curve-fitting; it doesn't seem to be a problem. The second issue came from a commit that I missed when cherry-picking from my testing branch; it's fixed with the latest push.

@kasparm

kasparm commented Nov 29, 2021

For some reason I'm still getting the same error.

@danielolsen

> For some reason I'm still getting the same error.

Even after running the latest PreREISE code and putting the resulting CSVs into the right folder in PowerSimData? The error comes from the interconnect column not being filled: when PowerSimData filters to less than the full USA, it drops every row with an NA interconnect, and removing those interconnect-less transformers leaves many separate islands. The most recent commit should solve this problem.
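
The failure mode described here can be reproduced on a toy network. The sketch below is illustrative only: the three-branch table, `filter_interconnect`, and `count_components` are hypothetical, and the component count uses a small union-find instead of the networkx call from the debugging snippet.

```python
def count_components(edges):
    """Count connected components of an undirected graph using union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    return len({find(node) for node in list(parent)})


def filter_interconnect(branches, name):
    """Keep only branches tagged with the given interconnect, as (from, to) pairs."""
    return [
        (b["from_bus_id"], b["to_bus_id"])
        for b in branches
        if b["interconnect"] == name
    ]


# Two lines joined by a transformer whose interconnect was never filled in.
branches = [
    {"from_bus_id": 1, "to_bus_id": 2, "interconnect": "Eastern"},
    {"from_bus_id": 2, "to_bus_id": 3, "interconnect": None},
    {"from_bus_id": 3, "to_bus_id": 4, "interconnect": "Eastern"},
]

# Filtering drops the untagged transformer and splits the network in two.
before = count_components(filter_interconnect(branches, "Eastern"))

# Once the interconnect is filled, the same filter keeps one connected network.
filled = [{**b, "interconnect": b["interconnect"] or "Eastern"} for b in branches]
after = count_components(filter_interconnect(filled, "Eastern"))
```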

@kasparm

kasparm commented Nov 29, 2021

Looks good now. My installation was off.


@kasparm kasparm left a comment

Thank you @danielolsen, looks good. The only change I would suggest is to place the create_csvs function into its own file in hifld and then import it in the __init__.py.

@danielolsen danielolsen force-pushed the daniel/hifld_top_level branch from ccd9069 to 6678d50 Compare November 30, 2021 00:24
@kasparm

kasparm commented Nov 30, 2021

Looks good. Thanks for making the change.

@danielolsen danielolsen force-pushed the daniel/hifld_top_level branch from 6678d50 to f231e87 Compare November 30, 2021 17:39
@danielolsen danielolsen merged commit edcc47a into hifld Nov 30, 2021
@danielolsen danielolsen deleted the daniel/hifld_top_level branch November 30, 2021 17:45
danielolsen added a commit that referenced this pull request Dec 8, 2021
feat: add top-level HIFLD grid orchestration function
danielolsen added a commit that referenced this pull request Jan 5, 2022
feat: add top-level HIFLD grid orchestration function
danielolsen added a commit that referenced this pull request Jan 8, 2022
feat: add top-level HIFLD grid orchestration function
danielolsen added a commit that referenced this pull request Jan 31, 2022
feat: add top-level HIFLD grid orchestration function
danielolsen added a commit that referenced this pull request Feb 25, 2022
feat: add top-level HIFLD grid orchestration function
danielolsen added a commit that referenced this pull request Mar 15, 2022
feat: add top-level HIFLD grid orchestration function
danielolsen added a commit that referenced this pull request Apr 1, 2022
feat: add top-level HIFLD grid orchestration function
danielolsen added a commit that referenced this pull request Apr 5, 2022
feat: add top-level HIFLD grid orchestration function