Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use base profile directly #217

Merged
merged 6 commits into from
Jul 2, 2020
Merged

Use base profile directly #217

merged 6 commits into from
Jul 2, 2020

Conversation

rouille
Copy link
Collaborator

@rouille rouille commented Jun 26, 2020

Purpose

Manipulate base profiles and not symbolic links to these profiles. This refactoring does the following:

  • No symbolic links are created during the building phase of the scenario. This means that once the scenario is created, only the change table will be in /home/EGM/v2/data/input/
  • When preparing the scenario input data, the base profile will be downloaded on the local machine if not already present. Thien the change table is applied to the base profile and the profile is uploaded to the tmp folder of the scenario.

The benefits are:

  • Profiles will be downloaded only once. Unless the user manually deletes the base profiles in his local folder, they will be available to prepare the simulation inputs of a new scenario or to analyze a scenario.
  • there are no more duplicated data in the local folder.
  • profiles will be retrievable once moved to a backup disk because there are no more symbolic links involved.

Note that the existing profiles in your local folder named id_demand.csv, id_hydro.csv, id_solar.csv and id_wind.csv will no longer be used and can be deleted.

What is the code doing?

There a lot of files that has been refactored. The major change happens in the InputData class where we don't download the link but the raw profiles.

Where to look

Start with the powersimdata/input/profiles.py file. We remove the symlingk generation In powersimdata/scenario/create.py file. The other modified files accomodate the changes in powersimdata/input/profiles.py.

Time estimate

30 min

Test

I have succressfully created, prepared, ran and extracted scenario 876. You can see that there are no more links in /home/EGM/v2/data/input/ for this scenario.

@rouille rouille added new feature Feature that is currently in progress. refactor Code that is being refactored labels Jun 26, 2020
@rouille rouille added this to the Black Lives Matter milestone Jun 26, 2020
@danielolsen
Copy link
Contributor

danielolsen commented Jun 30, 2020

I am testing this with Scenarios 923 and 924 on Heracles and Scenarios 925 and 926 on Zeus.

@@ -7,8 +7,6 @@
from powersimdata.input.transform_grid import TransformGrid
from powersimdata.input.transform_profile import TransformProfile

scenario_id = "564"
Copy link
Contributor

@danielolsen danielolsen Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like this line, and def scenario(): return Scenario(scenario_id), just so that all hardcoded values are explicitly declared up top, not within functions. Is there a reason to remove this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 102 to 106
program = "'{for(i=1; i<=NF; i++){if($1==%s) $2=\"%s\"}};1'" % (
self._scenario_info["id"],
status,
)
program = "'{if($1==%s) $2=\"%s\"};1'" % (self._scenario_info["id"], status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not very familiar with awk, what is the impact of the change here? Can we add a comment that makes this section a bit more clear?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment. What I was doing before i.e looping over the columns (NF is the number of fields) was useless. I guess at that time I was a awk novice. You just want to locate the id through the the if and do something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

self.grid = grid
self.ct = ct

self._tmp_dir = "%s/scenario_%s" % (const.EXECUTE_DIR, scenario_id)
self.TMP_DIR = "%s/scenario_%s" % (const.EXECUTE_DIR, scenario_info["id"])
Copy link
Contributor

@danielolsen danielolsen Jun 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any significance to this name change? Is it to match the variable naming convention for the other directories in const?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I need to access other info of the scenario, the interconnect and the base profiles version. So I can no longer just pass the id.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the change from self._tmp_dir to self.TMP_DIR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sorry for the misunderstanding. I decided to follow the synthax for folder location that we have in the powersimdata.utility.const module

@danielolsen
Copy link
Contributor

The changes are indeed scattered throughout the code, but mostly are changes to use the scenario info from the ScenarioList rather than just the ID, which makes sense. I don't see anything that concerns me, and the Eastern scenarios I'm running on Heracles should finish by tomorrow afternoon so we will get some independent verification that this works in our most complicated scenario-running process so far.

@BainanXia
Copy link
Collaborator

I think changing from scenario id to scenario info is reasonable given we will need other information of the scenario we are dealing with as well from now on. I've gone through the code and it looks reasonable to me. I think it is good to go if the ongoing integration tests pass.

@danielolsen danielolsen self-requested a review July 2, 2020 02:32
Copy link
Contributor

@danielolsen danielolsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested successfully on scenarios 925 and 926, which were run start-to-finish with the new code. Other scenarios run with the old code and analyzed with the new code also worked successfully, and much faster. Thanks for the performance improvement!

Copy link
Collaborator

@BainanXia BainanXia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thanks @rouille

@rouille rouille merged commit 56398a6 into develop Jul 2, 2020
@rouille rouille deleted the remove_links branch July 2, 2020 05:02
This was referenced Jul 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature Feature that is currently in progress. refactor Code that is being refactored
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants