feat: add function to assign demand to buses proportional to population #235

danielolsen · 2021-10-29T19:07:57Z

Purpose

Estimates population per substation, using data from simplemaps.com, and uses this information to add demand to buses. This is a re-implementation of the approach taken by the collaborators. Closes #229.

What the code is doing

We add a new module prereise.gather.griddata.hifld.data_process.demand with a single function assign_demand_to_buses. This function:

Loads population by ZIP and by county from CSVs in the repo.
Sorts substations by connected transmission capacity.
Distributes each ZIP code's population to the N highest-transmission-capacity non-generation substations within that ZIP. N is the number of candidate substations in the ZIP multiplied by the substation_load_share fraction in const.py, rounded to an integer and never less than one (unless the ZIP has no substations at all). These are considered 'load substations'.
Determines which counties have no substations with ZIP-assigned demand ('load substations'), and for each of these, picks the substation in the county with greatest transmission capacity to add to the set of load substations.
Determines how much of each county's population is not yet assigned, and distributes this to the load substations within the county. Note: if a county has no substations at all, its population does not go anywhere else, and is effectively ignored. Less than 5% of population is ignored this way, so it should not have a large effect on the overall demand distribution.
Translates total population (from ZIP-assignment and county assignment) to demand using an assumption of demand-per-person.
Selects the lowest-voltage bus within each substation to assign demand to.
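The ZIP-level steps above can be sketched on toy data as follows. This is only an illustration under stated assumptions, not the code in demand.py: the column names, the substation_load_share value of 0.5, and all numbers are hypothetical, and the generator filtering and county-level steps are left out. The 2.01 kW-per-person figure is the one quoted later in this PR, expressed here as MW on the assumption that Pd is in MW.

```python
import pandas as pd

# Hypothetical toy inputs; names and values are illustrative only
substation_load_share = 0.5  # assumed value; the real fraction lives in const.py
demand_per_person = 2.01e-3  # assumed MW per person (2.01 kW)

subs = pd.DataFrame(
    {
        "ZIP": ["98101", "98101", "98101", "98101", "98102"],
        "capacity": [400.0, 300.0, 100.0, 50.0, 75.0],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="sub_id"),
)
zip_population = pd.Series({"98101": 10000, "98102": 2000})

# N per ZIP: a share of the substation count, rounded, but never below one
n_load_subs = (
    (subs.groupby("ZIP").size() * substation_load_share)
    .round()
    .clip(lower=1)
    .astype(int)
)

# Rank substations within each ZIP by capacity (1 = highest), keep the top N
rank = subs.groupby("ZIP")["capacity"].rank(ascending=False, method="first")
load_subs = subs[rank <= subs["ZIP"].map(n_load_subs)]

# Spread each ZIP's population uniformly over its load substations
pop_per_sub = load_subs["ZIP"].map(zip_population / n_load_subs)
demand = pop_per_sub * demand_per_person

# Attach each substation's demand to its lowest-voltage bus
bus = pd.DataFrame(
    {"sub_id": [1, 1, 2, 5], "baseKV": [230.0, 115.0, 115.0, 69.0]},
    index=pd.Index([10, 11, 12, 13], name="bus_id"),
)
lowest_bus = bus.groupby("sub_id")["baseKV"].idxmin()  # sub_id -> bus_id
bus["Pd"] = 0.0
bus.loc[lowest_bus.loc[demand.index].to_numpy(), "Pd"] = demand.to_numpy()
```

In this toy case substations 1, 2 and 5 become load substations, and dividing the summed Pd by the per-person figure recovers the 12,000 people assigned, which is the same sanity check used in the usage example below.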

Besides demand.py, all other changes are data/documentation.

Testing

Tested manually.

Usage Example/Visuals

import pandas as pd
from prereise.gather.griddata.hifld.data_process.demand import assign_demand_to_buses
from prereise.gather.griddata.hifld.data_process.generators import build_plant
from prereise.gather.griddata.hifld.data_process.transmission import (
    build_transmission,
    calculate_branch_mileage,
    create_buses,
    create_transformers,
    estimate_branch_impedance,
    estimate_branch_rating,
)

# Invoking highest-level `data_process` functions
lines, substations = build_transmission(method="line2sub")
bus = create_buses(lines)
generators = build_plant(bus, substations)

# This code has been demonstrated via other PRs, but hasn't been baked into the top-level data process functions yet
lines["type"] = "Line"
lines["length"] = lines.apply(calculate_branch_mileage, axis=1)
transformers = create_transformers(bus)
transformers["type"] = "Transformer"
branch = pd.concat([lines, transformers])
branch["x"] = branch.apply(lambda x: estimate_branch_impedance(x, bus["baseKV"]), axis=1)
branch["rateA"] = branch.apply(lambda x: estimate_branch_rating(x, bus["baseKV"]), axis=1)

# New code
assign_demand_to_buses(substations, branch, generators, bus)

A "Pd" column is added in place to the bus dataframe:

>>> # Estimated 2.01 kW per person within code, 315.8 million people assigned to buses
>>> bus["Pd"].sum() / 2.01e-3
315800868.4704517
>>> bus["Pd"].isna().sum()
0

We need the generators to be able to preferentially assign demand to non-generator buses, and we need branch capacities to preferentially assign demand to higher-capacity substations when multiple substations are available within an area (ZIP or county). Generating the transformers and estimating branch impedances and capacities should probably be added to build_transmission as part of fulfillment of #226.

I demonstrate using the "line2sub" method since I have more faith in line coordinates than line substation names, based on some exploration with DC lines: #233 (comment). I suggest that we also switch to this as default as part of #226.

Time estimate

1 hour. The code itself isn't too long, but it's some fairly dense pandas.

ATTRIBUTION.md

prereise/gather/griddata/hifld/data_process/demand.py

BainanXia · 2021-10-29T21:57:50Z

prereise/gather/griddata/hifld/data_process/demand.py

+    filtered_branch = branch.query("SUB_1_ID != SUB_2_ID")
+    from_cap = filtered_branch.groupby("SUB_1_ID").sum()["rateA"]
+    to_cap = filtered_branch.groupby("SUB_2_ID").sum()["rateA"]
+    sub_cap = from_cap.combine(to_cap, lambda x, y: x + y, fill_value=0)


I was thinking about refactoring the calculate_substation_capacity function in powersimdata/design/transmission/substations.py, which currently takes a grid object as input, but couldn't come up with a compatible approach, given that the column names are different as well. Let's keep these lines here then.
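For readers less familiar with Series.combine, here is what the four diff lines above compute on a toy branch table. The table itself is made up for illustration; only the column names come from the diff.

```python
import pandas as pd

# Hypothetical branch table; the last row connects a substation to itself
branch = pd.DataFrame(
    {
        "SUB_1_ID": [1, 1, 2, 3],
        "SUB_2_ID": [2, 3, 3, 3],
        "rateA": [100.0, 200.0, 50.0, 999.0],
    }
)

# Drop branches that start and end in the same substation (e.g. transformers)
filtered_branch = branch.query("SUB_1_ID != SUB_2_ID")

# Capacity seen from each end of the remaining branches
from_cap = filtered_branch.groupby("SUB_1_ID")["rateA"].sum()
to_cap = filtered_branch.groupby("SUB_2_ID")["rateA"].sum()

# Add both ends; a substation missing from one side contributes 0 there
sub_cap = from_cap.combine(to_cap, lambda x, y: x + y, fill_value=0)

# Equivalent result using Series.add
sub_cap_alt = from_cap.add(to_cap, fill_value=0)
```

Substation 1 only appears as a "from" end and substation 3 only as a "to" end, so fill_value=0 is what keeps them from turning into NaN. The self-connected 999 branch does not count toward substation 3.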

BainanXia · 2021-10-29T22:18:59Z

prereise/gather/griddata/hifld/data_process/demand.py

+    subs_per_zip = filtered_subs.value_counts("ZIP")
+    zip_load_substations = subs_per_zip * const.substation_load_share
+    zip_load_substations = zip_load_substations.round().clip(lower=1)
+    zip_assigned_population = (zip_data["population"] / zip_load_substations).dropna()


When I was reading through the reference code from the collaborators, I was wondering whether we should distribute load (population) proportionally to the substation capacities instead of uniformly. Do you think that would make a difference?

It will definitely make some kind of difference, but I'm not sure in which general direction the difference will be. Thinking about a low-to-medium-density area, I would imagine that the population is fairly evenly spread out, so distributing uniformly probably makes sense: a high-capacity substation may just be a collector, rather than a response to a pocket of density. However, in an area that truly has different densities, higher-capacity substations may actually be a response to higher density. Without further information, I'm not sure what the best conclusion is.

If we run a simulation with this approach and find issues, then I think we may need to revisit this question. I think it's also potentially related to #234; we may want to patch the algorithm, or we may alternatively want to patch the data that feeds the algorithm.
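To make the two options in this thread concrete, here is a small comparison on made-up numbers. The substation names, capacities, and population are hypothetical; the PR as submitted uses the uniform split, and the proportional split is only the alternative being discussed.

```python
import pandas as pd

# Hypothetical ZIP with three load substations and 9,000 residents
population = 9000
capacity = pd.Series({"sub_a": 600.0, "sub_b": 300.0, "sub_c": 100.0})

# Uniform split (the approach in this PR): every load substation gets the same share
uniform = pd.Series(population / len(capacity), index=capacity.index)

# Capacity-proportional split (the alternative raised in review)
proportional = population * capacity / capacity.sum()

comparison = pd.DataFrame({"uniform": uniform, "proportional": proportional})
```

Both columns sum to the same population; the difference is only in how concentrated the load becomes at the highest-capacity substation (3,000 versus 5,400 people for sub_a in this example).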

prereise/gather/griddata/hifld/data_process/demand.py

BainanXia

Very nice. Thanks!

danielolsen · 2021-10-29T23:23:48Z

I chatted with @rouille today, and he raised a suggestion: should these CSVs live in the blob storage, rather than the repo?

BainanXia · 2021-10-29T23:30:10Z

I chatted with @rouille today, and he raised a suggestion: should these CSVs live in the blob storage, rather than the repo?

Good call. Looking at the size of the two files, they are <10 MB in total. I would say it won't hurt to leave them in the repo for now so that they are easier to move around. But it would also be cleaner to put them in the blob storage if we have a folder structure in mind (we still have a decent number of files in the repo now). Again, your choice.

danielolsen · 2021-11-02T19:03:44Z

This feature has been refactored to get the county and ZIP data files from blob storage, rather than from the repo itself. The new call signature (including changes to build_transmission via #237):

from prereise.gather.griddata.hifld.data_process.demand import assign_demand_to_buses
from prereise.gather.griddata.hifld.data_process.generators import build_plant
from prereise.gather.griddata.hifld.data_process.transmission import build_transmission
branch, bus, sub, dcline = build_transmission()
generators = build_plant(bus, sub)
assign_demand_to_buses(sub, branch, generators, bus)

…t.py

feat: add function to assign demand to buses proportional to population

danielolsen added the hifld Related to ingestion of the HIFLD data label Oct 29, 2021

danielolsen requested review from danlivengood, BainanXia, jenhagg, ahurli and YifanLi86 October 29, 2021 19:07

danielolsen self-assigned this Oct 29, 2021