Add a feature to apply timestamp-to-timepoint mappings to profiles to create input temporal data #55

Closed
danielolsen opened this issue May 18, 2021 · 12 comments


@danielolsen
Contributor


  • Is your feature request essential for your project?

Describe the workflow you want to enable

I wish I could take profile dataframes/CSVs (demand, hydro, solar, wind; indexed by timestamp, with columns for zones/generators as appropriate) and a mapping of timestamps to timepoints, and create the temporal input files required by Switch. This is one part of Issues #11, #12, and #14; the other part is generating the mapping. For now, let us assume we have a mapping (perhaps a hardcoded one to start) and want to apply it.

Describe your proposed implementation

We should create four new internal functions to perform this mapping, plus one user-facing function which launches all of them. If the user provides:

  • A mapping of hourly timestamps (corresponding to input profiles) to timepoints (probably as a pandas Series, or as an equivalent dict)
  • Names for each timepoint
  • Data frames for each profile (or paths to them)

Then we can generate each of the required input files:

  • loads.csv is created from the demand dataframe/file and the mapping of timestamps to timepoints. There is one row for each combination of bus_id (column named LOAD_ZONE) and timepoint (column named TIMEPOINT), and the value is the demand at that bus during that timepoint (column named zone_demand_mw). The logic for collapsing many timestamp values to a single value is currently undecided: we could start with a simple mean and iterate as necessary.
  • timepoints.csv is created from the timepoints and their names. Looking at the example input file, we have an index column named timepoint_id, a column named timestamp with a unique name for each timepoint (although this does not appear to be used elsewhere), and a column named timeseries which is not unique.
  • timeseries.csv is created using: the names in the timeseries column of timepoints.csv (column named TIMESERIES); a year (column named ts_period; in the example file all entries are 2030); a duration (column named ts_duration_of_tp; in the example file all entries are 6); the number of entries of each name within the timeseries column of timepoints.csv (column named ts_num_tps; in the example file all entries are 4); and the number of times each timepoint was mapped to a timestamp in the original data (column named ts_scale_to_period).
  • variable_capacity_factors.txt is created using the hydro, solar, and wind dataframes/files and the mapping of timestamps to timepoints. There is one row for each combination of generator (current or hypothetical; column named GENERATION_PROJECT) and timepoint (column named timepoint), and the value is the generation from that variable generator at that timepoint, normalized by the generator capacity (i.e. all values are in the range [0, 1]); this column is named gen_max_capacity_factor.
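The collapse step above could be sketched as follows, assuming a pandas-based implementation (the function name profile_to_timepoints and its signature are illustrative, not a proposed API; aggregation is the simple mean discussed above):

```python
import pandas as pd

def profile_to_timepoints(profile, timestamp_to_timepoint, id_name, value_name):
    """Collapse a profile to one value per timepoint via a simple mean.

    profile: DataFrame indexed by timestamp, one column per zone/generator.
    timestamp_to_timepoint: Series mapping each timestamp to a timepoint id.
    Returns a long-format frame: one row per (column, timepoint) pair.
    """
    grouped = profile.groupby(timestamp_to_timepoint).mean()
    long_form = grouped.stack().reset_index()
    long_form.columns = ["TIMEPOINT", id_name, value_name]
    return long_form[[id_name, "TIMEPOINT", value_name]]

# Toy data: four timestamps collapsed into two timepoints.
timestamps = [0, 1, 2, 3]
demand = pd.DataFrame({"zone_1": [10.0, 20.0, 30.0, 40.0]}, index=timestamps)
mapping = pd.Series([1, 1, 2, 2], index=timestamps)
loads = profile_to_timepoints(demand, mapping, "LOAD_ZONE", "zone_demand_mw")
```

The same helper would serve loads.csv and variable_capacity_factors (modulo the capacity normalization for the latter).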
@YifanLi86
Contributor

See if this is useful for you:

import csv

import numpy as np
import pandas as pd

from PowerSimData2Switch import inv_period, grid, scenario, scenario_sy, grid_sy

demand = scenario.state.get_demand()
solar = scenario.state.get_solar()
wind = scenario.state.get_wind()
hydro = scenario.state.get_hydro()

demand_sy = scenario_sy.state.get_demand()

# Pre-populate slicing_recovery_file. NOTE: this is a placeholder; slicing must
# be populated with a DataFrame containing a 'timepoint' column before the
# joins below will work.
slicing = []

# Process demand data slicing.
demand_sy_tps = demand_sy.join(slicing['timepoint']).pivot_table(index='timepoint')

# Demand bus distribution factor calculation for each bus.
zone_pd = grid.bus.pivot_table(index='zone_id', aggfunc=np.sum)['Pd'].to_frame()
grid_bus_zone_pd = grid.bus.join(zone_pd, on='zone_id', lsuffix='_zone_id')
distribution = grid_bus_zone_pd['Pd_zone_id'] / grid_bus_zone_pd['Pd']

# Write loads.csv file.
with open('loads.csv', 'w', newline='') as loads_file:
    writer = csv.writer(loads_file)
    writer.writerow(["LOAD_ZONE", "TIMEPOINT", "zone_demand_mw"])
    for i in distribution.index:
        for j in range(24):
            writer.writerow(
                [i, j + 1, demand_sy_tps[grid.bus['zone_id'][i]][j + 1] * distribution[i]]
            )

# Write variable_capacity_factors.csv file.
solar_tps = solar.join(slicing['timepoint']).pivot_table(index='timepoint')
wind_tps = wind.join(slicing['timepoint']).pivot_table(index='timepoint')
hydro_tps = hydro.join(slicing['timepoint']).pivot_table(index='timepoint')
variable_gen_tps = pd.concat([solar_tps, wind_tps, hydro_tps], axis=1)

with open('variable_capacity_factors.csv', 'w', newline='') as variable_capacity_factor_file:
    writer = csv.writer(variable_capacity_factor_file)
    writer.writerow(["GENERATION_PROJECT", "timepoint", "gen_max_capacity_factor"])
    for i in grid.plant.index:
        for j in range(24):
            if grid.plant['type'][i] in ("hydro", "wind", "solar", "wind_offshore"):
                if grid.plant['Pmax'][i] == 0:
                    writer.writerow(["".join(["g", str(i), "i"]), j + 1, 0])
                    writer.writerow(["".join(["g", str(i)]), j + 1, 0])
                else:
                    capacity_factor = variable_gen_tps[i][j + 1] / grid.plant['Pmax'][i]
                    writer.writerow(["".join(["g", str(i), "i"]), j + 1, capacity_factor])
                    writer.writerow(["".join(["g", str(i)]), j + 1, capacity_factor])

@danielolsen
Contributor Author

@YifanLi86 your code snippet relies on an unknown script/module PowerSimData2Switch. Trying to import the version from your PR in our switch fork results in an overflow error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\DanielOlsen\Dropbox (Gates Ventures)\BreakthroughEnergySciences\Explorations\DanielO\Capacity Expansion\PowerSimData2Switch.py", line 307, in <module>
    if AveFuelCost(df_ave_fuel_cost.index[j], fuel[k], inv_period[i]) == 0:
  File "C:\Users\DanielOlsen\Dropbox (Gates Ventures)\BreakthroughEnergySciences\Explorations\DanielO\Capacity Expansion\PowerSimData2Switch.py", line 176, in AveFuelCost
    ave_fuel_cost = ave_fuel_cost * (interest_rate**(int(year) - int(base_year)))
OverflowError: (34, 'Result too large')

Maybe you have a different working version in your local machine.

Looking at the rest of the code, we get another error with this line:

>>> demand.join(slicing['timepoint']).pivot_table(index="timepoint")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str

since slicing is still an empty list.

Looking at the structure of the temporal CSVs, it seems like they will be pretty straightforward to construct. The biggest open question I had when I wrote the Issue is what MW value we pick for a timepoint that represents multiple timestamps, but it looks like you use pandas.DataFrame.pivot_table method with no aggfunc parameter, which results in a simple mean by default, so I will proceed with that for now.
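As a sanity check on that default (toy data, not the real profiles): pivot_table with no aggfunc argument takes the mean of each group, matching an explicit groupby mean.

```python
import pandas as pd

# Two timepoints, each covering two timestamps' worth of demand.
df = pd.DataFrame({"timepoint": [1, 1, 2, 2], "mw": [10.0, 20.0, 30.0, 40.0]})

# With no aggfunc argument, pivot_table defaults to the mean of each group,
# equivalent to an explicit groupby(...).mean().
pivoted = df.pivot_table(index="timepoint")
by_group_mean = df.groupby("timepoint").mean()
```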

@YifanLi86
Contributor

@danielolsen The PR code is exactly what I am running with... the overflow error "Result too large" is something I have not seen. A little weird, but I will look into it. Alternatively, you can check which variables it needs: inv_period, grid, scenario, scenario_sy, and grid_sy all come from the grid, so if you pre-populate them you should be able to bypass the error.

The slicing variable needs to be pre-populated based on the CSV format. The way I am doing things now is to manually develop a slicing method, read it into memory, and then run the rest of the code line by line.

This is the best I have so far. It is broken, but I thought it had some logic for generating the large files, so I shared it per your request. I hope it can be of some help; if not, just ignore it. In the PR code I mention "# Refer to TemporalSlicing.py for time series related input data development.", but this is just a placeholder that I need to work on next. I was hoping to finish the prototype last week but did not get a chance.

@danielolsen
Contributor Author

The code was helpful in answering my question of how to combine many timestamps into a single value, since it showed me that you're using the default mean aggregation of the pandas pivot_table method. I should be able to implement the features to satisfy this issue using the sample mapping you've provided, and I've created a new issue to consolidate discussion on how to generate the mapping: see #60.

@danielolsen
Contributor Author

@YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row? Would they potentially have different mappings, or would mappings for all years be identical?

@YifanLi86
Contributor

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?

Yes

> Would they potentially have different mappings, or would mappings for all years be identical?

Identical

@danielolsen
Contributor Author

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical

For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?

@YifanLi86
Contributor

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical
>
> For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?

They have to go to 2032_x or 2034_x.

@danielolsen
Contributor Author

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical
>
> For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?
>
> They have to go to 2032_x or 2034_x.

In that case, it seems like the ts_period column of timeseries.csv must be part of the input from the user, unless we want to automatically parse 2032_x to 2032 etc.
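If automatic parsing were chosen, it could be as simple as splitting the timeseries name on the first underscore (assuming names always follow the <year>_<label> pattern; the helper name here is hypothetical):

```python
def timeseries_to_period(timeseries_name):
    # Assumes names follow the "<year>_<label>" pattern, e.g. "2030_1".
    return int(timeseries_name.split("_", 1)[0])
```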

@YifanLi86
Contributor

YifanLi86 commented May 20, 2021

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical
>
> For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?
>
> They have to go to 2032_x or 2034_x.
>
> In that case, it seems like the ts_period column of timeseries.csv must be part of the input from the user, unless we want to automatically parse 2032_x to 2032 etc.

Yes, so far we have been focusing on a single-year operation, single-year investment model. But it will definitely need user specification for multiple years, or for a single year other than 2030.

@danielolsen
Contributor Author

It sounds like we will need a validation step between the timeseries inputs from the user and the information we prompt the user for in switchwrapper.grid_to_switch.get_inv_periods (or wherever that logic ends up), to make sure that we have a temporal mapping for every year the user wants to consider.
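A sketch of that validation step (function and parameter names here are hypothetical; the real check would live wherever the investment-period prompt ends up):

```python
def validate_temporal_coverage(mapped_periods, investment_periods):
    """Raise if any requested investment period lacks a temporal mapping.

    mapped_periods: years present in the user's timeseries inputs.
    investment_periods: years the user asked to model.
    """
    missing = sorted(set(investment_periods) - set(mapped_periods))
    if missing:
        raise ValueError(f"No temporal mapping for investment period(s): {missing}")
```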

@danielolsen
Contributor Author

Closed via #61, #63, and #64.
