feat: separate interconnections via network topology #240

Merged
merged 8 commits into from
Nov 23, 2021

danielolsen
Contributor

@danielolsen danielolsen commented Nov 16, 2021

Pull Request doc

Purpose

  • Load substations whose status is either IN SERVICE or NOT AVAILABLE. Although NOT AVAILABLE substations are only about 5% of the total, they're about 20% of the substations within Texas and New Mexico, so excluding them causes lines to get mapped to faraway substations, which are sometimes in other interconnections.
  • Add a function which can separate and label the interconnect attribute of lines and substations, given input on where to split the substations and how to re-join them.
  • Add data that, when passed to the new function, splits the HIFLD network into three interconnections. This data is imperfect, especially regarding the Eastern/ERCOT seam around Lubbock and the Texas panhandle more broadly, but it demonstrates how the new function works and should be revisable later without any code changes.

Closes #233.

What the code is doing

  • const.py gets the (imperfect) data that splits the network into separate interconnections.
  • topology.py gets the new function add_interconnects_by_connected_components that:
    • Builds a NetworkX graph of the network once a subset of substations (and all lines connected to them) are removed.
    • Labels the lines of the connected components based on assumptions of size (Eastern > Western > ERCOT).
    • Labels lines based on user-supplied information (assumptions for lines & substations).
    • Labels lines based on the interconnects of their neighbors at non-dropped substations.
    • Uses the interconnect labels for lines to identify which interconnects are present within each substation that gets split, creates a new pseudo-substation for each interconnect within that substation, and reconnects lines as necessary. The new pseudo-substations are added, and the substations that were split are removed.
    • Labels the substations based on the new topology.
  • transmission.py uses this new function to assign interconnects to lines & substations, rather than the previous county-based assumptions.
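
The first two steps of the list above (drop a subset of substations, then label the remaining connected components by size) can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in topology.py: it uses a plain-Python breadth-first search instead of NetworkX so that it runs without dependencies, and the function name, the `(sub_1_id, sub_2_id)` line format, and the toy network are all hypothetical.

```python
from collections import defaultdict, deque


def label_components_by_size(lines, dropped_subs, names=("Eastern", "Western", "ERCOT")):
    """Label each non-dropped substation by the size rank of its component.

    lines: iterable of (sub_1_id, sub_2_id) pairs (hypothetical toy format).
    dropped_subs: set of substation ids removed before finding components.
    """
    # Build adjacency, skipping every line that touches a dropped substation.
    adjacency = defaultdict(set)
    for a, b in lines:
        if a in dropped_subs or b in dropped_subs:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Breadth-first search to collect the connected components.
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)

    # Size assumption from the PR: Eastern > Western > ERCOT.
    components.sort(key=len, reverse=True)
    return {sub: name for name, comp in zip(names, components) for sub in comp}


# Toy network: substation 99 sits on the seam and joins all three islands.
toy_lines = [(1, 2), (2, 3), (3, 4), (4, 99), (99, 5), (5, 6), (6, 7), (99, 8), (8, 9)]
labels = label_components_by_size(toy_lines, dropped_subs={99})
```

In this toy case, substations 1 to 4 are labeled Eastern, 5 to 7 Western, and 8 and 9 ERCOT, while the dropped substation 99 stays unlabeled until it is split and re-joined. Substations whose only lines go to dropped substations do not appear in the result at all, which is one reason the later labeling steps are needed.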

Testing

Tested manually:

from prereise.gather.griddata.hifld.data_process.transmission import build_transmission
branch, bus, substations, dclines = build_transmission()

Validation:

>>> import networkx as nx
>>> g = nx.convert_matrix.from_pandas_edgelist(
...     branch.query("not SUB_1_ID.isnull()"), "SUB_1_ID", "SUB_2_ID",
... )
>>> interconnects = list(nx.connected_components(g))
>>> sorted_interconnects = sorted(interconnects, key=len)[::-1]
>>> for s in sorted_interconnects:
...     print(f"connected component with {len(s)} substations")
...     print("substations:", substations.loc[s].value_counts("interconnect"))
...     print("lines:", branch.query("SUB_1_ID in @s or SUB_2_ID in @s").value_counts("interconnect"))
...
connected component with 45141 substations
substations: interconnect
Eastern    45141
dtype: int64
lines: interconnect
Eastern    62046
dtype: int64
connected component with 12983 substations
substations: interconnect
Western    12983
dtype: int64
lines: interconnect
Western    17910
dtype: int64
connected component with 3352 substations
substations: interconnect
ERCOT    3352
dtype: int64
lines: interconnect
ERCOT    4980
dtype: int64

Usage Example/Visuals

Here are the transmission lines for the states required to define the interconnections, colored by interconnect, with tiny yellow stars at the location of the dropped substations:
[Image: map_all_substations_reconnected_reduced]

Time estimate

30-60 minutes for the current code/logic. I'm also going to add some tests and try to factor out some of the for loops that were added for convenience but could probably be equivalently done in a cleaner way, although the elapsed time for these steps is quite short so it doesn't need too much optimization.

@danielolsen danielolsen added the hifld label (Related to ingestion of the HIFLD data) Nov 16, 2021
@danielolsen danielolsen self-assigned this Nov 16, 2021
@danielolsen danielolsen force-pushed the daniel/hifld_interconnects branch 2 times, most recently from 1a865f3 to 65a3947 Compare November 16, 2021 22:52
non_dropped_lines = lines.loc[labels != "dropped"]
for id, line in dropped_lines.iterrows():
    non_dropped_sub = (  # noqa: F841
        line.SUB_1_ID if line.SUB_2_ID in seams_substations else line.SUB_2_ID
Collaborator

So it is guaranteed there are no lines with both ends in seams substations, right?

Contributor Author

It is not guaranteed; in fact, it does happen in our data set! The path of connection from the Eastern interconnect to the Oklaunion B2B station appears to go from a full-Eastern substation through another intermediate mixed-interconnect substation first before getting to the B2B substation, so both the B2B substation and the 'intermediate' are considered to be 'part of the seam', since both have buses in both interconnects, and the line that connects the two has both ends 'on the seam'. With the input data we have right now, the interconnects still do end up getting separated properly, but there's definitely potential for this sort of edge case to bite us with different input data. This would also preclude your other refactor idea of always looking to the substation at the 'other end' to determine which interconnect a line should connect to within a dropped substation.
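
The edge case described above can be made concrete with a small check. This is a minimal sketch, assuming lines are held in a dict of `line_id -> (sub_1_id, sub_2_id)`; the function name, the ids, and the toy data are hypothetical and only mimic the eastern substation -> intermediate -> B2B -> ERCOT substation chain.

```python
def lines_with_both_ends_on_seam(lines, seam_substations):
    """Return the ids of lines whose two endpoints are both seam substations."""
    return [
        line_id
        for line_id, (sub_1, sub_2) in lines.items()
        if sub_1 in seam_substations and sub_2 in seam_substations
    ]


# Hypothetical toy chain: east_1 - intermediate - b2b - ercot_1.
toy_lines = {
    10: ("east_1", "intermediate"),
    11: ("intermediate", "b2b"),
    12: ("b2b", "ercot_1"),
}
seam = {"intermediate", "b2b"}
both_ends = lines_with_both_ends_on_seam(toy_lines, seam)

# The 'other end' heuristic from the reviewed snippet, applied to line 11:
sub_1, sub_2 = toy_lines[11]
other_end = sub_1 if sub_2 in seam else sub_2
```

Here `both_ends` contains only line 11, and for that line the 'other end' lookup returns `"intermediate"`, which is itself a seam substation, so it cannot tell us which interconnect the line belongs to.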

Comment on lines +297 to +299
new_substations["NAME"] = [
    substations.loc[sub_id, "NAME"] + f"_{i}" for i in new_sub_interconnects
]
Collaborator

Keeping track of this using the NAME field is nice, but I'm not sure whether it would be more straightforward to infer this from the id directly, i.e. xxxx_1 and xxxx_2, where xxxx is the old sub id and the numbers are interconnect rankings. It is not necessary to have consecutively numbered sub ids, right? We might need them to be integers though.

Contributor Author

We might need them to be integers though: I think you might be right, I'm not sure how much downstream code is implicitly expecting integers vs. just some sort of unique index.
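
If downstream code does turn out to need integer ids, one option is to allocate fresh integers above the current maximum and keep the (old id, interconnect) pairing in a separate lookup rather than in the id itself. This is a hypothetical sketch of that idea, not what the PR does; the function name and inputs are assumptions.

```python
from itertools import count


def allocate_pseudo_substation_ids(existing_ids, sub_id, interconnects):
    """Map (old substation id, interconnect) to a new, unused integer id."""
    # Start numbering just past the largest id already in use.
    next_id = count(max(existing_ids) + 1)
    return {(sub_id, name): next(next_id) for name in interconnects}


# Hypothetical example: substation 2 gets split into two pseudo-substations.
new_ids = allocate_pseudo_substation_ids({1, 2, 7}, 2, ["Eastern", "ERCOT"])
```

With these inputs the two pseudo-substations receive ids 8 and 9, and the lookup keys record which original substation and interconnect each one came from.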

Comment on lines +304 to +310
for line_id, line in sub_dropped_lines.iterrows():
    for new_sub_id, sub in new_substations.iterrows():
        if line["interconnect"] == sub["NAME"].split("_")[1]:
            if line["SUB_1_ID"] == sub_id:
                lines.loc[line_id, "SUB_1_ID"] = new_sub_id
            if line["SUB_2_ID"] == sub_id:
                lines.loc[line_id, "SUB_2_ID"] = new_sub_id
Collaborator

Not sure whether this could be simplified if we give an interconnect tag to substations first.
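
One possible shape for that simplification, sketched with plain dicts rather than the DataFrames used in the PR: if each new pseudo-substation is tagged with its interconnect up front, the inner loop and the parsing of NAME can be replaced by a single lookup. The function name and data layout are hypothetical.

```python
def reconnect_lines(lines, sub_id, new_sub_by_interconnect):
    """Point lines at the pseudo-substation that matches their interconnect.

    lines: dict of line_id -> dict with SUB_1_ID, SUB_2_ID, interconnect.
    new_sub_by_interconnect: dict of interconnect name -> new substation id.
    """
    for line in lines.values():
        new_sub_id = new_sub_by_interconnect.get(line["interconnect"])
        if new_sub_id is None:
            continue  # no pseudo-substation for this interconnect
        for end in ("SUB_1_ID", "SUB_2_ID"):
            if line[end] == sub_id:
                line[end] = new_sub_id
    return lines


# Hypothetical example: substation 5 is split into 101 (Eastern) and 102 (ERCOT).
toy_lines = {
    1: {"SUB_1_ID": 4, "SUB_2_ID": 5, "interconnect": "Eastern"},
    2: {"SUB_1_ID": 5, "SUB_2_ID": 6, "interconnect": "ERCOT"},
    3: {"SUB_1_ID": 7, "SUB_2_ID": 8, "interconnect": "Eastern"},
}
reconnected = reconnect_lines(toy_lines, 5, {"Eastern": 101, "ERCOT": 102})
```

Line 1 now ends at 101, line 2 starts at 102, and line 3 is untouched because it never connected to substation 5. Like the reviewed code, this still relies on lines already carrying an interconnect label.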

@BainanXia
Collaborator

BainanXia commented Nov 18, 2021

Thanks for making all the wild ideas happen. Although I'm aware of the full storyline via offline discussions, I still found the logic brain-burning when reading through the code. Here are my two cents from the review; I'm not sure whether this would be clearer:

  • one function for splitting the network into a certain number of islands, given a list of chosen nodes to drop
  • one function for adding interconnect tag (column) to the nodes in each island based on the given interconnect size ranking
  • one function for making copies of dropped nodes based on the other ends of the lines connecting to it (assuming we don't have lines with both ends dropped), then reconnect the corresponding lines to the new substations, adding those nodes and edges back.

In this way, we don't need to label lines with interconnects, only substations. In the end, we could have a sanity check to see whether every line has both ends located in the same interconnection.

It is very likely I missed something. If that breaks any of the critical logic flow, just forget it and never mind:)
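
The sanity check mentioned above could look something like the following. This is a minimal sketch assuming lines are `line_id -> (sub_1_id, sub_2_id)` and substations are tagged in a dict; the names and toy data are hypothetical.

```python
def find_cross_interconnect_lines(lines, sub_interconnect):
    """Return ids of lines whose two ends are tagged with different interconnects."""
    return [
        line_id
        for line_id, (sub_1, sub_2) in lines.items()
        if sub_interconnect[sub_1] != sub_interconnect[sub_2]
    ]


# Hypothetical example: line 3 wrongly bridges Eastern and ERCOT.
toy_tags = {1: "Eastern", 2: "Eastern", 3: "ERCOT", 4: "ERCOT"}
toy_lines = {1: (1, 2), 2: (3, 4), 3: (2, 3)}
offenders = find_cross_interconnect_lines(toy_lines, toy_tags)
```

An empty result would mean every line stays within a single interconnection; here line 3 is reported.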

@danielolsen
Contributor Author

@BainanXia I like your plan of splitting add_interconnects_by_connected_components into smaller sub-functions. That would make it easier to follow the logic and easier to test as well. For your suggestion of refactoring the third part to not assign interconnects to lines, would that impact the results at all, or would it just be a (potentially simpler) way to accomplish the same thing in a different way?

@BainanXia
Collaborator

It's just an alternative way to achieve the same thing. I was thinking all we need is to tag every node with interconnections rather than edges via assumptions + neighbors. Specifically, if one node has neighbors with multiple interconnection tags (dropped nodes), we make copies and tag each copy with one interconnect among the set.
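
A rough sketch of this node-copying idea, under the assumption stated earlier in the thread that no line has both ends dropped (which the current data set violates, so the sketch raises in that case instead of guessing). The function name, the string ids for the copies, and the toy data are hypothetical.

```python
def split_dropped_nodes(lines, sub_interconnect, dropped_subs):
    """Copy each dropped node once per neighboring interconnect and reconnect lines.

    lines: dict of line_id -> (sub_1_id, sub_2_id).
    sub_interconnect: dict of non-dropped substation id -> interconnect tag.
    Returns (new_lines, new_tags) without modifying the inputs.
    """
    new_lines = {}
    new_tags = dict(sub_interconnect)
    for line_id, (sub_1, sub_2) in lines.items():
        ends = []
        for this_end, other_end in ((sub_1, sub_2), (sub_2, sub_1)):
            if this_end not in dropped_subs:
                ends.append(this_end)
                continue
            if other_end in dropped_subs:
                # Assumption of this sketch: the other end tells us the tag.
                raise ValueError(f"line {line_id} has both ends dropped")
            tag = sub_interconnect[other_end]
            copy_id = f"{this_end}_{tag}"  # hypothetical id scheme for copies
            new_tags[copy_id] = tag
            ends.append(copy_id)
        new_lines[line_id] = tuple(ends)
    return new_lines, new_tags


# Hypothetical example: 'seam' has one Eastern neighbor and one ERCOT neighbor.
toy_lines = {1: ("e1", "seam"), 2: ("seam", "x1")}
toy_tags = {"e1": "Eastern", "x1": "ERCOT"}
new_lines, new_tags = split_dropped_nodes(toy_lines, toy_tags, {"seam"})
```

The dropped node is replaced by two copies, `seam_Eastern` and `seam_ERCOT`, each tagged with a single interconnect, and no line needs its own interconnect label.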

@danielolsen
Contributor Author

B2B facility information has been added, as well as a function to add these B2B lines to the DC line table by looking at the now-separated substations. Therefore, this should now fulfill all remaining requirements of #233.

Contributor

@kasparm kasparm left a comment

I can reproduce your test and see that the B2B lines were added to the dclines.

@danielolsen danielolsen force-pushed the daniel/hifld_interconnects branch from 5896d9c to 37d1681 Compare November 23, 2021 21:58
@danielolsen danielolsen force-pushed the daniel/hifld_interconnects branch from 37d1681 to 14f8980 Compare November 23, 2021 22:03
@danielolsen danielolsen merged commit 45d51d7 into hifld Nov 23, 2021
@danielolsen danielolsen deleted the daniel/hifld_interconnects branch November 23, 2021 22:50
danielolsen added a commit that referenced this pull request Dec 8, 2021
…nnects

feat: separate interconnections via network topology
danielolsen added a commit that referenced this pull request Jan 5, 2022
…nnects

feat: separate interconnections via network topology
danielolsen added a commit that referenced this pull request Jan 8, 2022
…nnects

feat: separate interconnections via network topology
danielolsen added a commit that referenced this pull request Jan 31, 2022
…nnects

feat: separate interconnections via network topology
danielolsen added a commit that referenced this pull request Feb 25, 2022
…nnects

feat: separate interconnections via network topology
danielolsen added a commit that referenced this pull request Mar 15, 2022
…nnects

feat: separate interconnections via network topology
danielolsen added a commit that referenced this pull request Apr 1, 2022
…nnects

feat: separate interconnections via network topology
danielolsen added a commit that referenced this pull request Apr 5, 2022
…nnects

feat: separate interconnections via network topology
Labels
hifld Related to ingestion of the HIFLD data

3 participants