Enable loading any scenario from blob storage #666

Merged
merged 7 commits into from
Aug 17, 2022


jenhagg
Collaborator

@jenhagg jenhagg commented Aug 5, 2022

Purpose

Switch to the new esmi blob storage account, which is populated with existing scenario data. The account is exposed as one of several filesystems that are combined into a single MultiFS, which acts as the "remote" data store.
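To illustrate the idea of several stores combined behind one priority-ordered lookup, here is a minimal sketch using plain dictionaries. It is not the PowerSimData implementation and does not use the real MultiFS class: `PriorityStores`, `build_remote`, the `server` store name, and the file contents are all hypothetical. Only the names `scenario_fs` and `profile_fs` come from the log output in this PR.

```python
from typing import Dict, List, Optional, Tuple


class PriorityStores:
    """Toy stand-in for a multi-filesystem: named stores checked by priority."""

    def __init__(self) -> None:
        # Each entry is (priority, name, store); a store maps path -> contents.
        self._stores: List[Tuple[int, str, Dict[str, str]]] = []

    def add(self, name: str, store: Dict[str, str], priority: int = 0) -> None:
        self._stores.append((priority, name, store))
        # Highest priority first; the sort is stable, so ties keep insertion order.
        self._stores.sort(key=lambda entry: -entry[0])

    def which(self, path: str) -> Optional[str]:
        """Return the name of the first store that holds `path`, if any."""
        for _, name, store in self._stores:
            if path in store:
                return name
        return None


# Hypothetical contents, for illustration only.
SERVER = {"ScenarioList.csv": "from server"}
SCENARIO_FS = {"ScenarioList.csv": "from blob storage"}
PROFILE_FS = {"raw/demand.csv": "profile data"}


def build_remote(include_profiles: bool, server_reachable: bool) -> PriorityStores:
    """Compose only the stores that a given context needs."""
    remote = PriorityStores()
    if server_reachable:
        # The server stays the source of truth whenever it can be reached.
        remote.add("server", SERVER, priority=2)
    remote.add("scenario_fs", SCENARIO_FS, priority=1)
    if include_profiles:
        remote.add("profile_fs", PROFILE_FS, priority=1)
    return remote


off_vpn = build_remote(include_profiles=False, server_reachable=False)
on_vpn = build_remote(include_profiles=True, server_reachable=True)
print(off_vpn.which("ScenarioList.csv"))  # scenario_fs
print(on_vpn.which("ScenarioList.csv"))   # server
```

Leaving `profile_fs` out of a context that never reads profiles means one fewer location to check on every lookup.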

What the code is doing

  • Point to the new storage account, keeping the priority order the same. The server is still the source of truth, but if a user isn't connected to the VPN when the multi-filesystem is instantiated, we do not repeatedly attempt to connect. (This is no different than before; it is noted here for context.)
  • Create smaller multi-filesystems depending on context. For example, InputData never needs profiles, so there is no need to add the profiles container as a possible location to check. This is done by passing the make_fs parameter in context.py.
  • Add a read-only shared access signature (SAS) to server_setup.py. As the query parameters suggest, it allows read and list access on blobs and containers only (not queues, tables, etc.). This is needed because anonymous directory listing is not supported for gen2 HNS accounts, unlike the original blob storage accounts. See the issue I submitted with Azure for more context. The workaround suggested there does work, but a SAS is just as safe and is more likely to be stable over time, since it is an officially supported use case.
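As a rough illustration of how the scope of an account SAS can be read from its query parameters, the sketch below parses a token and checks that it is limited to read and list on blob containers and objects. The token is a made-up placeholder, not the one added in this PR, and `describe_sas` and `is_read_only_blob_sas` are hypothetical helper names. The field names (`ss`, `srt`, `sp`) are the standard account SAS parameters.

```python
from urllib.parse import parse_qs

# Placeholder token for illustration only: the values and signature are made up.
EXAMPLE_SAS = (
    "sv=2021-06-08&ss=b&srt=co&sp=rl"
    "&se=2030-01-01T00:00:00Z&spr=https&sig=PLACEHOLDER"
)


def describe_sas(token: str) -> dict:
    """Split an account SAS query string into its scope fields."""
    parsed = parse_qs(token.lstrip("?"))
    fields = {key: values[0] for key, values in parsed.items()}
    return {
        # b=blob, f=file, q=queue, t=table
        "services": set(fields.get("ss", "")),
        # s=service, c=container, o=object
        "resource_types": set(fields.get("srt", "")),
        # r=read, l=list, w=write, d=delete, ...
        "permissions": set(fields.get("sp", "")),
    }


def is_read_only_blob_sas(token: str) -> bool:
    """True if the token grants only read/list on blob containers and objects."""
    scope = describe_sas(token)
    return (
        scope["services"] == {"b"}
        and bool(scope["resource_types"])
        and scope["resource_types"] <= {"c", "o"}
        and bool(scope["permissions"])
        and scope["permissions"] <= {"r", "l"}
    )


print(is_read_only_blob_sas(EXAMPLE_SAS))
```

A check like this only inspects the declared scope; the service itself enforces the permissions by validating the signature.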

Testing

Ran a scenario in a Docker container after deleting the existing volumes and containers. Also loaded a scenario that did not yet exist locally and checked that the files were cached locally.

Usage Example/Visuals

Loading a scenario from blob storage.

In [1]: from powersimdata import Scenario

In [2]: scl = Scenario().get_scenario_table()
/Users/jenhagg/.local/share/virtualenvs/PowerSimData-9jRUr7-X/lib/python3.9/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
Could not connect to ssh server
Initialized remote filesystem with scenario_fs
Transferring ScenarioList.csv.2 from scenario_fs

In [3]: scl.head().index
Out[3]: Int64Index([398, 400, 403, 404, 405], dtype='int64', name='id')

In [4]: s404 = Scenario(404)
Could not connect to ssh server
Initialized remote filesystem with scenario_fs
Transferring ScenarioList.csv.2 from scenario_fs
Transferring ExecuteList.csv.2 from scenario_fs
SCENARIO: test | EasternBase_2020_two_day_test

--> State
analyze
Could not connect to ssh server
Initialized remote filesystem with scenario_fs
--> Loading grid
Could not connect to ssh server
Initialized remote filesystem with profile_fs,scenario_fs
Transferring ScenarioList.csv.2 from scenario_fs
Loading bus
Loading plant
Loading heat_rate_curve
Loading gencost_before
Loading gencost_after
Loading branch
Loading dcline
Loading sub
Loading bus2sub
--> Loading ct

At this point, running cd ScenarioData followed by ls -la data/output | grep 404 returns no results.

Now we can load some outputs:

In [5]: avg_cong = s404.get_averaged_cong()
Could not connect to ssh server
Initialized remote filesystem with scenario_fs
--> Loading AVERAGED_CONG
data/output/404_AVERAGED_CONG.pkl not found on local machine
Transferring 404_AVERAGED_CONG.pkl from scenario_fs

In [6]: dcline_pf = s404.get_dcline_pf()
Could not connect to ssh server
Initialized remote filesystem with scenario_fs
--> Loading PF_DCLINE
data/output/404_PF_DCLINE.pkl not found on local machine
Transferring 404_PF_DCLINE.pkl from scenario_fs

And confirm that the files have been downloaded:

❯ ls data/output | grep 404
404_AVERAGED_CONG.pkl
404_PF_DCLINE.pkl

Note that this example would have worked essentially the same way before, but only for a handful of scenarios rather than the full set of existing ones.
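The behavior in the session above (check the local copy, transfer on a miss, reuse the cached file afterwards) can be sketched as follows. This is a simplified stand-in, not PowerSimData's code: `fetch_with_cache` is a hypothetical helper, and two temporary directories play the roles of the blob container and the local ScenarioData folder.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_with_cache(relative_path: str, local_root: Path, remote_root: Path) -> Path:
    """Return the local copy of a file, copying it from the remote on a miss."""
    local_file = local_root / relative_path
    if local_file.exists():
        return local_file  # already cached, nothing to transfer
    print(f"{relative_path} not found on local machine")
    remote_file = remote_root / relative_path
    if not remote_file.exists():
        raise FileNotFoundError(relative_path)
    local_file.parent.mkdir(parents=True, exist_ok=True)
    print(f"Transferring {remote_file.name}")
    shutil.copyfile(remote_file, local_file)
    return local_file


# Temporary directories stand in for the remote store and the local data folder.
remote_root = Path(tempfile.mkdtemp())
local_root = Path(tempfile.mkdtemp())
name = "data/output/404_PF_DCLINE.pkl"
(remote_root / name).parent.mkdir(parents=True)
(remote_root / name).write_bytes(b"placeholder")

first = fetch_with_cache(name, local_root, remote_root)   # copies the file
second = fetch_with_cache(name, local_root, remote_root)  # served from the cache
```

Only the first call prints the transfer messages; the second finds the file locally and returns immediately.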

Time estimate

20 min - I don't think this breaks anything or is super risky

@jenhagg jenhagg self-assigned this Aug 5, 2022
Collaborator

@BainanXia BainanXia left a comment


This is nice. Thanks!

Collaborator

@rouille rouille left a comment


Tested it and it works well. Like you, I get this annoying CryptographyDeprecationWarning. I will investigate.

@jenhagg jenhagg merged commit 02dc955 into develop Aug 17, 2022
@jenhagg jenhagg deleted the jen/blob branch August 17, 2022 17:59
@jenhagg jenhagg mentioned this pull request Aug 17, 2022