Manage scenario/execute list using pandas #416

Merged
merged 16 commits into from
Mar 18, 2021

@jenhagg jenhagg commented Mar 13, 2021

Purpose

Supports local usage on Windows by limiting the assumption of a Unix environment to the SSHDataAccess class. This applies to the scenario/execute list only; full Windows support needs more work.

What the code is doing

Implements the following pattern for scenario/execute list operations. In local environments, the tables are modified and queried directly with pandas. In the client/server environment:

  • calculate the sha1sum of the file before downloading it and save it in a local variable
  • download the file and modify it as needed
  • upload the modified file
  • using flock to synchronize access, check whether the file's current sha1sum matches the saved one; if so, write the changes and create a backup (the -b option of the mv command)
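The integrity check in the steps above can be sketched as follows. This is a minimal illustration on a local file, not the code in this PR: the names `sha1_of` and `guarded_write` are hypothetical, and the flock-based locking that the server side uses is left out.

```python
import hashlib
from pathlib import Path


def sha1_of(path):
    """Return the hex SHA-1 digest of the file's bytes."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()


def guarded_write(path, new_text, expected_sha1):
    """Write new_text to path only if the file still matches expected_sha1.

    Hypothetical helper. A real client/server setup would hold a lock
    (e.g. flock) around the check and the write; that is omitted here.
    """
    if sha1_of(path) != expected_sha1:
        raise OSError(f"{path} changed since it was read; resolve manually")
    Path(path).write_text(new_text)
```

The checksum is taken once, before the download, and compared again immediately before the write, so a concurrent edit made in between is detected rather than silently overwritten.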

The client/server operation basically extends the local variant by adding the integrity check, so we get some code reuse out of it. Other changes are mostly supporting functionality, with a few minor improvements thrown in.
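One way to picture the code reuse described here is a base class whose commit step is overridden to add the check. The sketch below is an assumption about the shape of that design, not the actual classes in this PR: `LocalStore` and `RemoteStore` are hypothetical names, and a plain string held in memory stands in for the csv file so the example stays dependency-free.

```python
import hashlib


class LocalStore:
    """Hypothetical store: csv text held in memory, no integrity check."""

    def __init__(self, text):
        self.text = text

    def checksum(self):
        """Return the SHA-1 hex digest of the current contents."""
        return hashlib.sha1(self.text.encode()).hexdigest()

    def commit(self, new_text, checksum):
        # Local variant: a single user, so the checksum is ignored.
        self.text = new_text

    def add_line(self, line):
        checksum = self.checksum()  # taken before modifying
        self.commit(self.text + line + "\n", checksum)


class RemoteStore(LocalStore):
    """Same operations; commit additionally verifies the checksum."""

    def commit(self, new_text, checksum):
        if self.checksum() != checksum:
            raise OSError("table changed on server; resolve manually")
        super().commit(new_text, checksum)
```

With this layout, operations such as `add_line` are written once and only the commit step differs between the two environments.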

Testing

  • Added unit tests (originally written in plug, but I realized they could live here). Each test runs against an empty csv with headers, copied from powersimdata/utility/templates.
  • Tested in the standalone container setup by running a simulation from scratch
  • Tested in the client/server container setup by creating and preparing a scenario (it's not set up to run one yet). I also set a breakpoint during the modification and edited the file in the server container to trigger the error state.
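A unit test in the style described in the first bullet might look like the sketch below. The column names (`id`, `plan`, `name`) and the helper `make_template` are assumptions for illustration; the real template under powersimdata/utility/templates may have different headers.

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd


def make_template(directory):
    """Write a header-only csv that stands in for the packaged template."""
    template = Path(directory) / "ScenarioList_template.csv"
    template.write_text("id,plan,name\n")  # hypothetical columns
    return template


def test_add_entry_to_empty_table():
    with tempfile.TemporaryDirectory() as tmp:
        template = make_template(tmp)
        working = Path(tmp) / "ScenarioList.csv"
        shutil.copy(template, working)  # fresh copy for each test

        table = pd.read_csv(working, dtype=str)
        assert table.empty
        assert list(table.columns) == ["id", "plan", "name"]

        # Append one row to the empty table and write it back.
        table.loc[len(table)] = ["1", "test", "pandas"]
        table.to_csv(working, index=False)

        reloaded = pd.read_csv(working, dtype=str)
        assert reloaded.to_dict("records") == [
            {"id": "1", "plan": "test", "name": "pandas"}
        ]
```

Copying the template for every test keeps the tests independent of each other, since each one starts from the same empty table.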

Usage Example/Visuals

I could add some usage output if it's useful

Time estimate

30 min

@jenhagg jenhagg self-assigned this Mar 13, 2021
@jenhagg jenhagg added this to the Put Your Records On milestone Mar 13, 2021
@jenhagg jenhagg linked an issue Mar 13, 2021 that may be closed by this pull request
@jenhagg jenhagg added the refactor Code that is being refactored label Mar 13, 2021
        _ = self._execute_and_check_err(command, err_message)

    def delete_entry(self, scenario_info):
        table = self.get_scenario_table()
Collaborator Author

This seemed like it should be easier to do in pandas, but it's the only way I got it to work. Let me know if you have suggestions.
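For reference, a common way to delete a row in pandas is `DataFrame.drop`. The sketch below assumes the table is indexed by scenario id, which may not match the real table layout; the sample data and the `scenario_id` parameter are hypothetical, and this is not necessarily what the PR does.

```python
import pandas as pd

# Hypothetical sample table, indexed by scenario id.
table = pd.DataFrame(
    {"plan": ["base", "test"], "name": ["v1", "pandas"]},
    index=pd.Index([1, 2], name="id"),
)


def delete_entry(table, scenario_id):
    """Return a copy of the table without the row labeled scenario_id."""
    # drop raises KeyError if the label is not in the index
    return table.drop(index=scenario_id)


remaining = delete_entry(table, 2)
```

If the id were a regular column instead of the index, a boolean mask such as `table[table["id"] != scenario_id]` would give the same result.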

@rouille
Collaborator

rouille commented Mar 13, 2021

We probably want to include a test for the Delete class (see README for an example) in the standalone container setup.

@danielolsen
Copy link
Contributor

I'll run a test on the server this afternoon.

@danielolsen
Contributor

I get the following traceback trying to call Create.create_scenario():

>>> scenario.state.create_scenario()
CREATING SCENARIO: test | pandas

Transferring ScenarioList.csv from server
100%|#######################################| 542k/542k [00:00<00:00, 3.65Mb/s]
Transferring ScenarioList.csv from server
100%|#######################################| 542k/542k [00:00<00:00, 4.24Mb/s]
--> Adding entry in ScenarioList.csv on server
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\scenario\create.py", line 205, in create_scenario
    self._scenario_list_manager.add_entry(self._scenario_info)
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\data_access\csv_store.py", line 20, in wrapper
    self.commit(table, checksum)
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\data_access\csv_store.py", line 76, in commit
    self.data_access.push(self._FILE_NAME, checksum)
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\data_access\data_access.py", line 349, in push
    self.move_to(file_name, change_name_to=backup, preserve=True)
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\data_access\data_access.py", line 297, in move_to
    self._check_file_exists(to_path, should_exist=False)
  File "C:\Users\DanielOlsen\repos\bes\PowerSimData\powersimdata\data_access\data_access.py", line 68, in _check_file_exists
    raise OSError(f"{filepath} {msg} on server")
OSError: /mnt/bes/pcm/ScenarioList.csv.bak already exists on server

@jenhagg
Collaborator Author

jenhagg commented Mar 17, 2021

I think it should work if we remove that file. The approach here is to upload the file as FILE.csv.bak and, if the checksum matches, rename it to FILE.csv and create FILE.csv~ as the backup. So the .bak file should only be temporary, unless there is a conflict, which has to be resolved manually.
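The staging flow described in this comment can be sketched on a local filesystem as below. The function name `promote_staged` is hypothetical, and on the server the equivalent steps run as shell commands under flock rather than in Python, so treat this as an illustration of the file naming only.

```python
import hashlib
import os
from pathlib import Path


def promote_staged(path, expected_sha1):
    """Promote path + '.bak' to path if path still has the expected checksum.

    On success the previous version is kept as path + '~'. On a mismatch
    the staged file is left in place for manual resolution.
    """
    path = Path(path)
    staged = Path(str(path) + ".bak")
    backup = Path(str(path) + "~")
    current = hashlib.sha1(path.read_bytes()).hexdigest()
    if current != expected_sha1:
        raise OSError(f"checksum mismatch; {staged} left for manual resolution")
    os.replace(path, backup)  # keep the old version as the backup
    os.replace(staged, path)  # the staged upload becomes the live file
```

Because a conflict leaves the .bak file behind, a later upload that refuses to overwrite an existing .bak would fail until the leftover file is removed, which is consistent with the "already exists" error in the traceback above.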

@danielolsen
Contributor

After @jon-hagg's tweaks, I've successfully created, launched, extracted, and deleted a scenario on the server.

Collaborator

@rouille rouille left a comment

Thanks!

@jenhagg jenhagg merged commit e0b308c into develop Mar 18, 2021
@jenhagg jenhagg deleted the jon/pandas branch March 18, 2021 18:07
Successfully merging this pull request may close these issues.

Use pandas to update ScenarioList and ExecuteList
3 participants