Add a feature to apply timestamp-to-timepoint mappings to profiles to create input temporal data #55

Closed
danielolsen opened this issue May 18, 2021 · 12 comments


@danielolsen
Contributor


  • Is your feature request essential for your project?

Describe the workflow you want to enable

I wish I could take profile dataframes/CSVs (demand, hydro, solar, wind; indexed by timestamp, with columns for zones/generators as appropriate) and a mapping of timestamps to timepoints, and create the temporal input files required by Switch. This is one part of Issues #11, #12, and #14; the other part is generating the mapping. For now, let us assume we have a mapping (perhaps a hardcoded one to start) and want to apply it.

Describe your proposed implementation

We should create four new internal functions to perform this mapping, plus one user-facing function which launches all of them. If the user provides:

  • A mapping of hourly timestamps (corresponding to input profiles) to timepoints (probably as a pandas Series, or as an equivalent dict)
  • Names for each timepoint
  • Data frames for each profile (or paths to them)

Then we can generate each of the required input files:

  • loads.csv is created from the demand dataframe/file and the mapping of timestamps to timepoints. There is one row for each combination of bus_id (column named LOAD_ZONE) and timepoint (column named TIMEPOINT), and the value is the demand at that bus during that timepoint (column named zone_demand_mw). The logic for collapsing many timestamp values to a single value is currently undecided: we could start with a simple mean and iterate as necessary.
  • timepoints.csv is created from the timepoints and their names. Looking at the example input file, we have an index column named timepoint_id, a column named timestamp with a unique name for each timepoint (although this does not appear to be used elsewhere), and a column named timeseries which is not unique.
  • timeseries.csv is created using: the names in the timeseries column of timepoints.csv (column named TIMESERIES); a year (column named ts_period; in the example file all entries are 2030); a duration (column named ts_duration_of_tp; in the example file all entries are 6); the number of entries of each name within the timeseries column of timepoints.csv (column named ts_num_tps; in the example file all entries are 4); and the number of times each timepoint was mapped to a timestamp in the original data (column named ts_scale_to_period).
  • variable_capacity_factors.txt is created using the hydro, solar, and wind dataframes/files and the mapping of timestamps to timepoints. There is one row for each combination of generator (current or hypothetical; column named GENERATION_PROJECT) and timepoint (column named timepoint), and the value is the generation from that variable generator at that timepoint, normalized by the generator capacity (i.e. all values are in the range [0, 1]); this column is named gen_max_capacity_factor.
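The collapse step above could be sketched as follows, assuming a pandas-based implementation (the function name profile_to_timepoints and its signature are illustrative, not a proposed API; aggregation is the simple mean discussed above):

```python
import pandas as pd

def profile_to_timepoints(profile, timestamp_to_timepoint, id_name, value_name):
    """Collapse a profile to one value per timepoint via a simple mean.

    profile: DataFrame indexed by timestamp, one column per zone/generator.
    timestamp_to_timepoint: Series mapping each timestamp to a timepoint id.
    Returns a long-format frame: one row per (column, timepoint) pair.
    """
    grouped = profile.groupby(timestamp_to_timepoint).mean()
    long_form = grouped.stack().reset_index()
    long_form.columns = ["TIMEPOINT", id_name, value_name]
    return long_form[[id_name, "TIMEPOINT", value_name]]

# Toy data: four timestamps collapsed into two timepoints.
timestamps = [0, 1, 2, 3]
demand = pd.DataFrame({"zone_1": [10.0, 20.0, 30.0, 40.0]}, index=timestamps)
mapping = pd.Series([1, 1, 2, 2], index=timestamps)
loads = profile_to_timepoints(demand, mapping, "LOAD_ZONE", "zone_demand_mw")
```

The same helper would serve loads.csv and variable_capacity_factors (modulo the capacity normalization for the latter).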
@YifanLi86
Contributor

See if this is useful for you:

import csv

import numpy as np
import pandas as pd

from PowerSimData2Switch import inv_period, grid, scenario, scenario_sy, grid_sy

demand = scenario.state.get_demand()
solar = scenario.state.get_solar()
wind = scenario.state.get_wind()
hydro = scenario.state.get_hydro()

demand_sy = scenario_sy.state.get_demand()

# Pre-populate slicing_recovery_file. NOTE: this is a placeholder; slicing must
# be populated with a DataFrame containing a 'timepoint' column before the
# joins below will work.
slicing = []

# Process demand data slicing.
demand_sy_tps = demand_sy.join(slicing['timepoint']).pivot_table(index='timepoint')

# Demand bus distribution factor calculation for each bus.
zone_pd = grid.bus.pivot_table(index='zone_id', aggfunc=np.sum)['Pd'].to_frame()
grid_bus_zone_pd = grid.bus.join(zone_pd, on='zone_id', lsuffix='_zone_id')
distribution = grid_bus_zone_pd['Pd_zone_id'] / grid_bus_zone_pd['Pd']

# Write loads.csv file.
with open('loads.csv', 'w', newline='') as loads_file:
    writer = csv.writer(loads_file)
    writer.writerow(["LOAD_ZONE", "TIMEPOINT", "zone_demand_mw"])
    for i in distribution.index:
        for j in range(24):
            writer.writerow(
                [i, j + 1, demand_sy_tps[grid.bus['zone_id'][i]][j + 1] * distribution[i]]
            )

# Write variable_capacity_factors.csv file.
solar_tps = solar.join(slicing['timepoint']).pivot_table(index='timepoint')
wind_tps = wind.join(slicing['timepoint']).pivot_table(index='timepoint')
hydro_tps = hydro.join(slicing['timepoint']).pivot_table(index='timepoint')
variable_gen_tps = pd.concat([solar_tps, wind_tps, hydro_tps], axis=1)

with open('variable_capacity_factors.csv', 'w', newline='') as variable_capacity_factor_file:
    writer = csv.writer(variable_capacity_factor_file)
    writer.writerow(["GENERATION_PROJECT", "timepoint", "gen_max_capacity_factor"])
    for i in grid.plant.index:
        for j in range(24):
            if grid.plant['type'][i] in ("hydro", "wind", "solar", "wind_offshore"):
                if grid.plant['Pmax'][i] == 0:
                    writer.writerow(["".join(["g", str(i), "i"]), j + 1, 0])
                    writer.writerow(["".join(["g", str(i)]), j + 1, 0])
                else:
                    capacity_factor = variable_gen_tps[i][j + 1] / grid.plant['Pmax'][i]
                    writer.writerow(["".join(["g", str(i), "i"]), j + 1, capacity_factor])
                    writer.writerow(["".join(["g", str(i)]), j + 1, capacity_factor])

@danielolsen
Contributor Author

@YifanLi86 your code snippet relies on an unknown script/module PowerSimData2Switch. Trying to import the version from your PR in our switch fork results in an overflow error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\DanielOlsen\Dropbox (Gates Ventures)\BreakthroughEnergySciences\Explorations\DanielO\Capacity Expansion\PowerSimData2Switch.py", line 307, in <module>
    if AveFuelCost(df_ave_fuel_cost.index[j], fuel[k], inv_period[i]) == 0:
  File "C:\Users\DanielOlsen\Dropbox (Gates Ventures)\BreakthroughEnergySciences\Explorations\DanielO\Capacity Expansion\PowerSimData2Switch.py", line 176, in AveFuelCost
    ave_fuel_cost = ave_fuel_cost * (interest_rate**(int(year) - int(base_year)))
OverflowError: (34, 'Result too large')

Maybe you have a different working version in your local machine.

Looking at the rest of the code, we get another error with this line:

>>> demand.join(slicing['timepoint']).pivot_table(index="timepoint")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers or slices, not str

since slicing is still an empty list.

Looking at the structure of the temporal CSVs, it seems like they will be pretty straightforward to construct. The biggest open question I had when I wrote the Issue is what MW value we pick for a timepoint that represents multiple timestamps, but it looks like you use pandas.DataFrame.pivot_table method with no aggfunc parameter, which results in a simple mean by default, so I will proceed with that for now.
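As a sanity check on that default (toy data, not the real profiles): pivot_table with no aggfunc argument takes the mean of each group, matching an explicit groupby mean.

```python
import pandas as pd

# Two timepoints, each covering two timestamps' worth of demand.
df = pd.DataFrame({"timepoint": [1, 1, 2, 2], "mw": [10.0, 20.0, 30.0, 40.0]})

# With no aggfunc argument, pivot_table defaults to the mean of each group,
# equivalent to an explicit groupby(...).mean().
pivoted = df.pivot_table(index="timepoint")
by_group_mean = df.groupby("timepoint").mean()
```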

@YifanLi86
Contributor

@danielolsen The PR code is exactly what I am running with... the overflow error "Result too large" is something I have not seen. A little weird, but I will look into it. Alternatively, you can check which variables it needs: inv_period, grid, scenario, scenario_sy, and grid_sy all come from the grid, so if you pre-populate them you should be able to bypass the error.

The slicing variable needs to be pre-populated based on the CSV format. The way I am doing things now is to manually develop a slicing method, read it into memory, and then run the rest of the code line by line.

This is the best I have so far. It is broken, but I thought it had some logic for generating the large files, so I shared it per your request. I hope it can be of some help; if not, just ignore it. In the PR code I mention "# Refer to TemporalSlicing.py for time series related input data development.", but this is just a placeholder that I need to work on next. I was hoping to finish the prototype last week but did not get a chance.

@danielolsen
Contributor Author

The code was helpful in answering my question of how to combine many timestamps into a single value, since it showed me that you're using the default mean aggregation of the pandas pivot_table method. I should be able to implement the features to satisfy this issue using the sample mapping you've provided, and I've created a new issue to consolidate discussion on how to generate the mapping: see #60.

@danielolsen
Contributor Author

@YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row? Would they potentially have different mappings, or would mappings for all years be identical?

@YifanLi86
Contributor

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?

Yes

> Would they potentially have different mappings, or would mappings for all years be identical?

Identical

@danielolsen
Contributor Author

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical

For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?

@YifanLi86
Contributor

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical
>
> For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?

They have to go to 2032_x or 2034_x.

@danielolsen
Contributor Author

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical
>
> For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?
>
> They have to go to 2032_x or 2034_x.

In that case, it seems like the ts_period column of timeseries.csv must be part of the input from the user, unless we want to automatically parse 2032_x to 2032 etc.
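If automatic parsing were chosen, it could be as simple as splitting the timeseries name on the first underscore (assuming names always follow the <year>_<label> pattern; the helper name here is hypothetical):

```python
def timeseries_to_period(timeseries_name):
    # Assumes names follow the "<year>_<label>" pattern, e.g. "2030_1".
    return int(timeseries_name.split("_", 1)[0])
```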

@YifanLi86
Contributor

YifanLi86 commented May 20, 2021

> @YifanLi86 the sample timeseries.csv file has a column ts_period, for which all entries are 2030. If we had multiple periods, would each combination of (timeseries, ts_period) get its own row?
>
> Yes
>
> Would they potentially have different mappings, or would mappings for all years be identical?
>
> Identical
>
> For other years, do we have more timepoints? It seems that in timepoints.csv, each timepoint is mapped to a timeseries, and the timeseries are named following the pattern 2030_x. Would timepoints in another year still map to these same timeseries, or would they map to new timeseries labelled e.g. 2032_x, 2034_x?
>
> They have to go to 2032_x or 2034_x.
>
> In that case, it seems like the ts_period column of timeseries.csv must be part of the input from the user, unless we want to automatically parse 2032_x to 2032 etc.

Yes, so far we have been focusing on a single-year operation, single-year investment model. But it will definitely need user specification for multiple years, or for a single year other than 2030.

@danielolsen
Contributor Author

It sounds like we will need a validation step between the timeseries inputs from the user and the information we prompt the user for in switchwrapper.grid_to_switch.get_inv_periods (or wherever that logic ends up), to make sure that we have a temporal mapping for every year the user wants to consider.
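A sketch of that validation step (function and parameter names here are hypothetical; the real check would live wherever the investment-period prompt ends up):

```python
def validate_temporal_coverage(mapped_periods, investment_periods):
    """Raise if any requested investment period lacks a temporal mapping.

    mapped_periods: years present in the user's timeseries inputs.
    investment_periods: years the user asked to model.
    """
    missing = sorted(set(investment_periods) - set(mapped_periods))
    if missing:
        raise ValueError(f"No temporal mapping for investment period(s): {missing}")
```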

@danielolsen
Contributor Author

Closed via #61, #63, and #64.
