feat, refactor: added new entry point to run call.py without PowerSimData #74

ahurli · 2020-09-14T19:59:34Z

Purpose

Remove dependency of call.py on PowerSimData, adding another entry point to call.py that is not a scenario ID.

What is the code doing?

This code now runs independently of PowerSim data by passing in flags to define the required and options parameters to run a simulation. The flags are documented below, which is the result of running python call.py -h or python call.py --help.

To maintain backwards compatibility (for now), I have been able to successfully run the following two commands in my local environment and get the Julia engine to run locally (though not complete because I'm impatient) after copying scenario 1220 data to my local and updating const.py accordingly:
python -u ./pyreisejl/utility/call.py 1220
python -u ./pyreisejl/utility/call.py 1220 5

As an example of how the code can now be run:
python ./pyreisejl/utility/call.py -s '2016-01-01 00:00:00' -e '2016-01-07 23:00:00' -int 24 -i /Users/annahurlimann/ScenarioData/test

I've also included some additional checks/input validation:

Enforcing that the interval passed in (-i, --interval) needs to be an integer
A printed warning if the interval given doesn't evenly divide the start and end time given, which will presumably result in the exclusion of data at the end of given date range (WARNING: The interval you chose does not evenly divide your date range.)
Requiring start_date, end_date, interval, input_dir if not using the PowerSimData notation (i.e. python -u ./pyreisejl/utility/call.py -s '2016-01-01 00:00:00' returns an WrongNumberOfArguments error).

usage: call.py [-h] [-s START_DATE] [-e END_DATE] [-int INTERVAL] [-i INPUT_DIR] [-o OUTPUT_DIR] [-t THREADS]
               [scenario_id] [powersim_threads]

Run REISE.jl simulation.

positional arguments:
  scenario_id           Scenario ID only if using PowerSimData.
  powersim_threads      Number of threads only if using PowerSimData.

optional arguments:
  -h, --help            show this help message and exit
  -s START_DATE, --start-date START_DATE
                        The start date for the simulation in format 'YYYY-MM-DD HH:MM:SS'
  -e END_DATE, --end-date END_DATE
                        The end date for the simulation in format 'YYYY-MM-DD HH:MM:SS'
  -int INTERVAL, --interval INTERVAL
                        The length of each interval in hours.
  -i INPUT_DIR, --input-dir INPUT_DIR
                        The directory containing the input data files. Required files are 'case.mat', 'demand.csv',
                        'hydro.csv', 'solar.csv', and 'wind.csv'.
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        The directory to store the results. This is optional and defaults to an output folder that will be
                        created in the input directory if it does not exist.
  -t THREADS, --threads THREADS
                        The number of threads to run the simulation with. This is optional and defaults to Auto.

Where to Look

The code changed is all in pyreise.utility.call.

Time to Review

15 min.

pyreisejl/utility/call.py

jenhagg · 2020-09-14T21:50:52Z

pyreisejl/utility/call.py

+    )
+
+
+def _insert_in_file(filename, scenario_id, column_number, column_value):


We could update the docstring here, since it now also works on the scenario list. I suspect we'd move this to a helper module when refactoring extract_data since there is a duplicate of this in there, but it can wait, just an fyi if you didn't see that yet.

Sounds good! I didn't really make any changes here, so I kept the docstring as is, but I see no reason why not to update it.

As far as the refactoring of insert_in_file out, I still think ultimately we should remove the PowerSim dependency from here. Since a multi-repo PR seemed a little daunting while I'm still learning the workflow of the team, I wrote this so decoupling it would ultimately just mean deleting _get_scenario, _record_scenario, _insert_in_file, the arg definitions, and the if args.scenario_id statements (i.e. just deleting code but not necessarily writing anything new). My hope is to refactor extract_data, update PowerSimData to use this new call, then just delete these functions embedded here and in extract_data.

pyreisejl/utility/call.py

jenhagg · 2020-09-14T22:21:43Z

pyreisejl/utility/call.py

+    )
+
+    parser.add_argument(
+        "powersim_threads",


Is it possible to combine the thread arguments and still have compatibility?

Yes, but it causes some weird interactions which I'm not sure are desirable. Specifically, you can pass in 2 arguments defining the number of threads, and the second one will be applied.

E.g.
python -u ./pyreisejl/utility/call.py 88 12 -t 123 will set threads to 123 while python -u ./pyreisejl/utility/call.py -t 123 88 12 sets the threads arg to 12.

If it's too ugly to support both, adding a complementary PR in PowerSimData to specify threads with a -t flag should be very easy, and as long as we merge both PRs at the same time it won't disrupt our workflow at all.

Yeah I was thinking it would be nice to only have one flag, but wasn't sure if we'd need to change powersimdata to achieve that. I'm fine with merging this now and doing the complementary PRs as a follow up, or adding onto this, whatever is easier.

Ha, yeah I was trying to avoid a multi-repo PR first thing. (See above comment re: 'multi-repo PR'). Basically, I wrote this such that my plan was to maintain backwards compatibility with code that was easily deletable (i.e. anything referencing PowerSimData is in it's own function that isn't intertwined with the actual parameter parsing for the simulation) so that I could get this through, work on PowerSimData to use the new flags, then just delete any relevant code in REISE.jl without having to worry about coordination.

Feel free to sequence the changes however you're most comfortable.

As of now, the only people who regularly run scenarios are @BainanXia and myself, and we're pretty familiar with needing to merge/pull multiple repos at the same time to keep them playing nice together. It happens somewhat frequently with PowerSimData/PostREISE, and occasionally with PowerSimData/REISE.jl.

pyreisejl/utility/call.py

danielolsen · 2020-09-14T23:17:38Z

pyreisejl/utility/call.py

+    parser.add_argument(
+        "-s",
+        "--start-date",
+        help="The start date for the simulation in format 'YYYY-MM-DD HH:MM:SS'",


Right now we are limited to only running simulations at hourly resolution, and only for the year 2016 (see min_ts and max_ts in launch_scenario()). Until we fix that issue, we should document it here.

danielolsen · 2020-09-14T23:28:51Z

pyreisejl/utility/call.py

+    :param str end_date: end date of simulation as 'YYYY-MM-DD HH:MM:SS'
+    :param int interval: length of each interval in hours
+    :param str input_dir: directory with input data
+    :param None/str output_dir: directory for output data. None defaults to input directory


Is this accurate? I think None defaults to an output subfolder within the input directory (via the Julia part of REISE.jl).

Nope! I think you're right, and I missed it when trying to make sure my lines weren't too long. This should match the argument help message in line 164. Thank you for catching that!

pyreisejl/utility/call.py

rouille · 2020-09-16T03:46:26Z

I believe we are not far from getting rid of the pyreisejl.utility.const module. The pyreisejl.utility.call module needs:

const.SCENARIO_LIST
const.EXECUTE_LIST
const.EXECUTE_DIR

All three are defined in powersimdata.utility.server_setup. In the powersimdata.scenario.execute, we can pass EXECUTE_DIR through the -i argument. Do you think we can create arguments for SCENARIO_LIST and EXECUTE_LIST?

pyreisejl/utility/call.py

pyreisejl/utility/helpers.py

rouille · 2020-09-23T20:58:46Z

pyreisejl/utility/tests/test_helpers.py

+        pd.Timestamp("2016-01-01 00:00:00"),
+        pd.Timestamp("2016-01-01 02:00:00"),
+        "H",
+    )


Should we delete the dummy csv file at the end of the test?

This is an in memory file, no? If so, nothing to clean up.

Yeah, I was looking for a way to do it in memory to avoid having to clean it up.

All good then

jenhagg · 2020-09-23T21:50:28Z

pyreisejl/utility/call.py

    """
+    # extract time limits from 'demand.csv'
+    profile = open(os.path.join(input_dir, "demand.csv"))


It's generally good to call open with a context manager:

with open(os.path.join(input_dir, "demand.csv")) as profile: min_ts, max_ts, freq = extract_date_limits(profile) dates = pd.date_range(start=min_ts, end=max_ts, freq=freq)

jenhagg · 2020-09-23T22:16:24Z

pyreisejl/utility/helpers.py

+    :return: (*pandas.Timestamp*) -- the valid date as a pandas timestamp
+    """
+    regex = r"^\d{4}-\d{1,2}-\d{1,2}( (?P<hour>\d{1,2})(:\d{1,2})?(:\d{1,2})?)?$"
+    match = re.match(regex, date)


Always impressed by a good regex. That said, did we consider using either datetime.strptime or pandas.to_datetime (which supports the same format as strptime)? Nothing wrong with using regex, just throwing that out there in case it's useful.

The regex is to parse out whether or not an hour was specified, hence the use of the named capture group for that field specifically. This gives us the ability to specify an end date with no hour (e.g. 2016-01-01 to 2016-12-31), and then round it up to the end of the last day (2016-01-01 00:00:00 to 2016-12-31 23:00:00) which hopefully makes it a little more user friendly.

danielolsen · 2020-09-23T23:28:55Z

pyreisejl/utility/call.py

+    print(start_ts)
+    print(end_ts)


Are these prints here on purpose?

Nope! Those are leftover from my personal testing.

danielolsen · 2020-09-23T23:30:24Z

pyreisejl/utility/call.py

+    print(args.start_date)
+    print(args.end_date)
+    print(args.interval)
+    print(args.input_dir)
+    print(args.output_dir)
+    print(args.threads)


Are these prints still here on purpose?

Nope! Those are leftover from my personal testing.

danielolsen · 2020-09-23T23:32:01Z

pyreisejl/utility/helpers.py

+            valid_date += pd.Timedelta(hours=23)
+
+    else:
+        err_str = f"'{date}' is an invalid date. It needs to be in the form YYYY-MM-DD."


Should we also say that HH:MM:SS can be passed?

Yeah, I'm not sure what's the best way to say that those are valid but optional. I didn't want an overly-wordy error string. Any suggestions?

f"'{date}' is an invalid date. YYYY-MM-DD is required, HH:MM:SS are optional."?

danielolsen

Thanks for refactoring this! I will follow your lead when I add a command line argument for the number of segments to linearize to.

…Data

ahurli requested review from rouille and jenhagg September 14, 2020 19:59

jenhagg reviewed Sep 14, 2020

View reviewed changes

pyreisejl/utility/call.py Show resolved Hide resolved

jenhagg reviewed Sep 14, 2020

View reviewed changes