MPI exceptions logging and more robust NetCDF closures #1084

Merged
merged 11 commits into from
Sep 10, 2018


andrrizzi
Contributor

I've made a few small modifications to fix the stack trace that is logged when an exception is raised in an MPI process and to make the NetCDF handling more robust. In particular:

  • I've implemented AlchemicalPhase.__del__ and made yank.experiments.Experiment force garbage collection after deleting the alchemical phase.
  • I've minimized the number of Dataset.open/close() in the multi-state samplers.
  • The reporter now attempts to open the NetCDF file 5 times before giving up and raising the error.
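
The retry behaviour in the last bullet can be sketched roughly as follows. This is only an illustration, not the code in this PR: `open_with_retries`, its parameters, and the choice to catch `OSError` are assumptions made for the example, and the opener is passed in as a callable so that the sketch does not depend on a particular NetCDF library.

```python
import time


def open_with_retries(open_func, *args, n_attempts=5, sleep_seconds=0.0, **kwargs):
    """Call open_func(*args, **kwargs), retrying when it raises OSError.

    Hypothetical helper: the name, the default of 5 attempts, and the
    exception type are assumptions for this sketch.
    """
    if n_attempts < 1:
        raise ValueError("n_attempts must be at least 1")
    last_error = None
    for attempt in range(n_attempts):
        try:
            return open_func(*args, **kwargs)
        except OSError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < n_attempts - 1:
                time.sleep(sleep_seconds)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

With a real dataset class, the call would look something like `open_with_retries(SomeDataset, path, mode)`, where `SomeDataset` is whatever opener the reporter uses.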

The switch_phase_interval is working much better for me after these modifications, which is encouraging.
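
For the first bullet, the general pattern of releasing a file handle in `__del__` and forcing a collection after deleting the owner looks roughly like this. It is a sketch under assumptions: `Phase`, `run_phase`, and the `_storage` attribute are hypothetical stand-ins, not the actual `AlchemicalPhase` or `Experiment` code.

```python
import gc


class Phase:
    """Hypothetical stand-in for an object that owns an open file handle."""

    def __init__(self, storage):
        self._storage = storage

    def close(self):
        # getattr guards against a partially constructed object.
        storage = getattr(self, "_storage", None)
        if storage is not None:
            storage.close()
            self._storage = None

    def __del__(self):
        # Release the file handle when the object is destroyed.
        self.close()


def run_phase(storage):
    """Create a phase, use it, then make sure its resources are released."""
    phase = Phase(storage)
    # ... run some iterations of the phase here ...
    del phase
    # Force a collection so objects kept alive by reference cycles are
    # finalized now rather than at some later, unpredictable time.
    gc.collect()
```

The explicit `gc.collect()` matters mainly when the object takes part in a reference cycle; otherwise CPython finalizes it as soon as the last reference is dropped.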

@andrrizzi
Contributor Author

This should be ready to be merged!

Member

@jchodera jchodera left a comment


I've just made two comments:

  • You can remove the masking operation at the #TODO
  • The changes in sams.py lead to clearer code---thanks!

# Open analysis file.
self._storage_analysis = self._open_dataset_robustly(self._storage_analysis_file_path,
mode, version=netcdf_format)
# TODO - AR: What's the purpose of set_auto_mask(False)? When we create the
Member


We no longer need it since we're not using masking!

@@ -373,13 +373,14 @@ def _mix_replicas(self):
# TODO: We may be able to refactor this to simply have different update schemes compute neighborhoods differently.
# TODO: Can we allow "plugin" addition of new update schemes that can be registered externally?
with mmtools.utils.time_it('Mixing of replicas'):
jump_and_mix = self._JumpAndMixPacket(self.n_replicas, self.n_states)
# Initialize statistics. This matrix is modified by the jump function and used when updating the logZ estimates.
replicas_log_P_k = np.zeros([self.n_replicas, self.n_states], np.float64)
Member


This is much clearer now---thanks!

for state_index in neighborhood:
u_k[state_index] = self._energy_thermodynamic_states[replica_index, state_index]
log_P_k[state_index] = self.log_weights[state_index] - u_k[state_index]
u_k = self._energy_thermodynamic_states[replica_index, state_index]
Member


To match the equations, it would be clearer to write:

u_k = self._energy_thermodynamic_states[replica_index, :]
log_P_k[state_index] = - u_k[state_index] + self.log_weights[state_index] 
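
As a self-contained illustration of that form (a sketch, not the code in sams.py), the same arithmetic can be written with NumPy indexing. The function name, its arguments, and the choice of `-inf` for states outside the neighborhood are assumptions made for the example.

```python
import numpy as np


def neighborhood_log_probabilities(energies, log_weights, replica_index, neighborhood):
    """Unnormalized log P_k = -u_k + log_weights over a neighborhood of states.

    Hypothetical helper: `energies` has shape (n_replicas, n_states) and
    `neighborhood` is a sequence of state indices.
    """
    # Reduced potentials of this replica in every state.
    u_k = energies[replica_index, :]
    # Assumption: states outside the neighborhood get zero probability.
    log_P_k = np.full(len(log_weights), -np.inf)
    log_P_k[neighborhood] = -u_k[neighborhood] + log_weights[neighborhood]
    return log_P_k
```

Slicing the whole row once keeps `u_k` a vector indexed by state, which is what makes the expression line up with the equations.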

Contributor Author


Good idea! Will change.

Member

@jchodera jchodera left a comment


Great!

@andrrizzi andrrizzi merged commit fea7b06 into master Sep 10, 2018
@andrrizzi andrrizzi deleted the mpi-netcdf branch September 10, 2018 16:17