Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat, refactor: added new entry point to run call.py without PowerSimData #74

Merged
merged 3 commits into from
Sep 24, 2020

Conversation

ahurli
Copy link
Contributor

@ahurli ahurli commented Sep 14, 2020

Purpose

Remove dependency of call.py on PowerSimData, adding another entry point to call.py that is not a scenario ID.

What is the code doing?

This code now runs independently of PowerSim data by passing in flags to define the required and options parameters to run a simulation. The flags are documented below, which is the result of running python call.py -h or python call.py --help.

To maintain backwards compatibility (for now), I have been able to successfully run the following two commands in my local environment and get the Julia engine to run locally (though not complete because I'm impatient) after copying scenario 1220 data to my local and updating const.py accordingly:
python -u ./pyreisejl/utility/call.py 1220
python -u ./pyreisejl/utility/call.py 1220 5

As an example of how the code can now be run:
python ./pyreisejl/utility/call.py -s '2016-01-01 00:00:00' -e '2016-01-07 23:00:00' -int 24 -i /Users/annahurlimann/ScenarioData/test

I've also included some additional checks/input validation:

  • Enforcing that the interval passed in (-i, --interval) needs to be an integer
  • A printed warning if the interval given doesn't evenly divide the start and end time given, which will presumably result in the exclusion of data at the end of given date range (WARNING: The interval you chose does not evenly divide your date range.)
  • Requiring start_date, end_date, interval, input_dir if not using the PowerSimData notation (i.e. python -u ./pyreisejl/utility/call.py -s '2016-01-01 00:00:00' returns an WrongNumberOfArguments error).
usage: call.py [-h] [-s START_DATE] [-e END_DATE] [-int INTERVAL] [-i INPUT_DIR] [-o OUTPUT_DIR] [-t THREADS]
               [scenario_id] [powersim_threads]

Run REISE.jl simulation.

positional arguments:
  scenario_id           Scenario ID only if using PowerSimData.
  powersim_threads      Number of threads only if using PowerSimData.

optional arguments:
  -h, --help            show this help message and exit
  -s START_DATE, --start-date START_DATE
                        The start date for the simulation in format 'YYYY-MM-DD HH:MM:SS'
  -e END_DATE, --end-date END_DATE
                        The end date for the simulation in format 'YYYY-MM-DD HH:MM:SS'
  -int INTERVAL, --interval INTERVAL
                        The length of each interval in hours.
  -i INPUT_DIR, --input-dir INPUT_DIR
                        The directory containing the input data files. Required files are 'case.mat', 'demand.csv',
                        'hydro.csv', 'solar.csv', and 'wind.csv'.
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        The directory to store the results. This is optional and defaults to an output folder that will be
                        created in the input directory if it does not exist.
  -t THREADS, --threads THREADS
                        The number of threads to run the simulation with. This is optional and defaults to Auto.

Where to Look

The code changed is all in pyreise.utility.call.

Time to Review

15 min.

@ahurli ahurli requested review from rouille and jenhagg September 14, 2020 19:59
)


def _insert_in_file(filename, scenario_id, column_number, column_value):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could update the docstring here, since it now also works on the scenario list. I suspect we'd move this to a helper module when refactoring extract_data since there is a duplicate of this in there, but it can wait, just an fyi if you didn't see that yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I didn't really make any changes here, so I kept the docstring as is, but I see no reason why not to update it.

As far as the refactoring of insert_in_file out, I still think ultimately we should remove the PowerSim dependency from here. Since a multi-repo PR seemed a little daunting while I'm still learning the workflow of the team, I wrote this so decoupling it would ultimately just mean deleting _get_scenario, _record_scenario, _insert_in_file, the arg definitions, and the if args.scenario_id statements (i.e. just deleting code but not necessarily writing anything new). My hope is to refactor extract_data, update PowerSimData to use this new call, then just delete these functions embedded here and in extract_data.

)

parser.add_argument(
"powersim_threads",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to combine the thread arguments and still have compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it causes some weird interactions which I'm not sure are desirable. Specifically, you can pass in 2 arguments defining the number of threads, and the second one will be applied.

E.g.
python -u ./pyreisejl/utility/call.py 88 12 -t 123 will set threads to 123 while python -u ./pyreisejl/utility/call.py -t 123 88 12 sets the threads arg to 12.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's too ugly to support both, adding a complementary PR in PowerSimData to specify threads with a -t flag should be very easy, and as long as we merge both PRs at the same time it won't disrupt our workflow at all.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking it would be nice to only have one flag, but wasn't sure if we'd need to change powersimdata to achieve that. I'm fine with merging this now and doing the complementary PRs as a follow up, or adding onto this, whatever is easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, yeah I was trying to avoid a multi-repo PR first thing. (See above comment re: 'multi-repo PR'). Basically, I wrote this such that my plan was to maintain backwards compatibility with code that was easily deletable (i.e. anything referencing PowerSimData is in it's own function that isn't intertwined with the actual parameter parsing for the simulation) so that I could get this through, work on PowerSimData to use the new flags, then just delete any relevant code in REISE.jl without having to worry about coordination.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to sequence the changes however you're most comfortable.

As of now, the only people who regularly run scenarios are @BainanXia and myself, and we're pretty familiar with needing to merge/pull multiple repos at the same time to keep them playing nice together. It happens somewhat frequently with PowerSimData/PostREISE, and occasionally with PowerSimData/REISE.jl.

parser.add_argument(
"-s",
"--start-date",
help="The start date for the simulation in format 'YYYY-MM-DD HH:MM:SS'",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we are limited to only running simulations at hourly resolution, and only for the year 2016 (see min_ts and max_ts in launch_scenario()). Until we fix that issue, we should document it here.

:param str end_date: end date of simulation as 'YYYY-MM-DD HH:MM:SS'
:param int interval: length of each interval in hours
:param str input_dir: directory with input data
:param None/str output_dir: directory for output data. None defaults to input directory
Copy link
Contributor

@danielolsen danielolsen Sep 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accurate? I think None defaults to an output subfolder within the input directory (via the Julia part of REISE.jl).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! I think you're right, and I missed it when trying to make sure my lines weren't too long. This should match the argument help message in line 164. Thank you for catching that!

@rouille
Copy link
Collaborator

rouille commented Sep 16, 2020

I believe we are not far from getting rid of the pyreisejl.utility.const module. The pyreisejl.utility.call module needs:

  • const.SCENARIO_LIST
  • const.EXECUTE_LIST
  • const.EXECUTE_DIR

All three are defined in powersimdata.utility.server_setup. In the powersimdata.scenario.execute, we can pass EXECUTE_DIR through the -i argument. Do you think we can create arguments for SCENARIO_LIST and EXECUTE_LIST?

pd.Timestamp("2016-01-01 00:00:00"),
pd.Timestamp("2016-01-01 02:00:00"),
"H",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we delete the dummy csv file at the end of the test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an in memory file, no? If so, nothing to clean up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was looking for a way to do it in memory to avoid having to clean it up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good then

"""
# extract time limits from 'demand.csv'
profile = open(os.path.join(input_dir, "demand.csv"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's generally good to call open with a context manager:

with open(os.path.join(input_dir, "demand.csv")) as profile:
    min_ts, max_ts, freq = extract_date_limits(profile)

dates = pd.date_range(start=min_ts, end=max_ts, freq=freq)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

:return: (*pandas.Timestamp*) -- the valid date as a pandas timestamp
"""
regex = r"^\d{4}-\d{1,2}-\d{1,2}( (?P<hour>\d{1,2})(:\d{1,2})?(:\d{1,2})?)?$"
match = re.match(regex, date)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always impressed by a good regex. That said, did we consider using either datetime.strptime or pandas.to_datetime (which supports the same format as strptime)? Nothing wrong with using regex, just throwing that out there in case it's useful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex is to parse out whether or not an hour was specified, hence the use of the named capture group for that field specifically. This gives us the ability to specify an end date with no hour (e.g. 2016-01-01 to 2016-12-31), and then round it up to the end of the last day (2016-01-01 00:00:00 to 2016-12-31 23:00:00) which hopefully makes it a little more user friendly.

Comment on lines 112 to 113
print(start_ts)
print(end_ts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these prints here on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! Those are leftover from my personal testing.

Comment on lines 239 to 244
print(args.start_date)
print(args.end_date)
print(args.interval)
print(args.input_dir)
print(args.output_dir)
print(args.threads)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these prints still here on purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! Those are leftover from my personal testing.

valid_date += pd.Timedelta(hours=23)

else:
err_str = f"'{date}' is an invalid date. It needs to be in the form YYYY-MM-DD."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also say that HH:MM:SS can be passed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure what's the best way to say that those are valid but optional. I didn't want an overly-wordy error string. Any suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f"'{date}' is an invalid date. YYYY-MM-DD is required, HH:MM:SS are optional."?

Copy link
Contributor

@danielolsen danielolsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring this! I will follow your lead when I add a command line argument for the number of segments to linearize to.

@ahurli ahurli merged commit dc652ca into develop Sep 24, 2020
@ahurli ahurli deleted the anna/refactor branch October 2, 2020 16:36
danielolsen added a commit that referenced this pull request Oct 29, 2020
danielolsen added a commit that referenced this pull request Oct 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants