refactor: reduce memory requirements for mesh branch scaling #132
Purpose
Reduce memory requirements for using the design_transmission module and, as a side benefit, simplify the code.
What is the code doing
Previously, we needed enough memory to hold a congestion dataframe five times over: we loaded CONGU, we loaded CONGL, we created numpy array versions of both, and we created a new dataframe of the same size to hold the element-wise maximum of the two. We did a 'clean-up' of CONGU and CONGL afterwards, but that does nothing to reduce peak memory usage.
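For illustration, a minimal sketch of the old pattern with toy data; the names congu, congl, and cong_abs are stand-ins, not the module's actual identifiers:

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the congestion data; element-wise, at most one of
# CONGU and CONGL is non-zero.
congu = pd.DataFrame({"branch1": [0.5, 0.0, 0.0], "branch2": [0.0, 0.0, 1.2]})
congl = pd.DataFrame({"branch1": [0.0, 0.3, 0.0], "branch2": [0.0, 0.7, 0.0]})

# Old pattern: dense numpy copies of both frames, plus a third full-size
# frame for the element-wise maximum -- roughly five copies live at peak.
cong_abs = pd.DataFrame(
    np.maximum(congu.to_numpy(), congl.to_numpy()),
    index=congu.index,
    columns=congu.columns,
)
del congu, congl  # the clean-up only frees memory after the peak has passed
```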
Instead, we can exploit the fact that, element-wise, at most one of CONGU and CONGL is non-zero: we can simply add them rather than performing a type conversion, a numpy element-wise maximization, and a copy into a new dataframe. I think this new approach only requires enough memory to hold a congestion dataframe two or three times, depending on the internals of pandas DataFrame addition.
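The new pattern, on the same toy data (again a sketch, not the exact module code):

```python
import pandas as pd

# Toy data as above; element-wise, at most one value is non-zero.
congu = pd.DataFrame({"branch1": [0.5, 0.0, 0.0], "branch2": [0.0, 0.0, 1.2]})
congl = pd.DataFrame({"branch1": [0.0, 0.3, 0.0], "branch2": [0.0, 0.7, 0.0]})

# Because the non-zero entries never overlap, the sum equals the
# element-wise maximum: no numpy round-trip, no dense intermediates.
cong_abs = congu + congl
```

Note that addition is a valid substitute for the maximum only because the two sets of non-zero entries never overlap; if they could overlap, an explicit element-wise maximum would still be needed.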
Perhaps most importantly, the function now makes use of sparse dataframes as introduced in Breakthrough-Energy/PostREISE#96: we no longer expand them into dense numpy arrays. Even when starting with sparse dataframes, the previous code raised a MemoryError on our laptops, while the new code runs without a problem.
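To illustrate why this matters: arithmetic between two pandas frames with sparse columns generally keeps the sparse representation, whereas to_numpy() densifies everything. A sketch, assuming the congestion frames use pandas' sparse dtype with a fill value of zero:

```python
import pandas as pd

# Toy data as above, stored with pandas' sparse dtype (fill value zero).
sparse_zero = pd.SparseDtype("float64", fill_value=0.0)
congu = pd.DataFrame(
    {"branch1": [0.5, 0.0, 0.0], "branch2": [0.0, 0.0, 1.2]}
).astype(sparse_zero)
congl = pd.DataFrame(
    {"branch1": [0.0, 0.3, 0.0], "branch2": [0.0, 0.7, 0.0]}
).astype(sparse_zero)

# Adding two sparse frames keeps the sparse representation, so peak memory
# scales with the number of non-zero entries rather than the full shape.
cong_abs = congu + congl
print(cong_abs.dtypes)          # Sparse[float64, 0.0] columns
print(cong_abs.sparse.density)  # fraction of stored (non-fill) values
```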
Time Estimate
Half an hour or less.