MPI exceptions logging and more robust NetCDF closures #1084

andrrizzi · 2018-09-10T02:44:58Z

I've made a few small modifications to fix the stack trace that is logged when an exception is raised in an MPI process and to make the NetCDF handling more robust. In particular,

I've implemented AlchemicalPhase.__del__ and made yank.experiments.Experiment force garbage collection after deleting the alchemical phase.
I've minimized the number of Dataset open/close() calls in the multi-state samplers.
The reporter now attempts to open the NetCDF file 5 times before giving up and raising the error.
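
As an illustration of the retry behavior described in the last item, here is a minimal sketch of a bounded retry loop. It is not the code from this PR: `open_with_retries`, `flaky_opener`, and the `delay_seconds` parameter are hypothetical names introduced only for this example, and the choice of `OSError` as the retried exception is an assumption.

```python
import time


def open_with_retries(opener, n_attempts=5, delay_seconds=0.0, exceptions=(OSError,)):
    """Call opener() up to n_attempts times and return its result.

    The error from the final failed attempt is re-raised unchanged.
    (Hypothetical helper, not part of YANK.)
    """
    for attempt in range(1, n_attempts + 1):
        try:
            return opener()
        except exceptions:
            if attempt == n_attempts:
                raise  # Give up: propagate the last error.
            time.sleep(delay_seconds)


# Hypothetical opener that fails twice before succeeding.
calls = {"count": 0}


def flaky_opener():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("file temporarily unavailable")
    return "dataset-handle"


handle = open_with_retries(flaky_opener)
```

With the netCDF4 package, `opener` could be something like `lambda: netCDF4.Dataset(path, mode)`, although the signature of the reporter's actual helper may differ.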

The switch_phase_interval option is working much better for me after these modifications, which is encouraging.
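
The combination of a `__del__` method and forced garbage collection mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `Reporter` and `Phase` are hypothetical stand-ins, not the YANK classes, and the reference cycle is added deliberately to show a case where `del` alone does not release the resource.

```python
import gc


class Reporter:
    """Hypothetical stand-in for an object that holds an open file."""

    def __init__(self):
        self.is_open = True

    def close(self):
        self.is_open = False


class Phase:
    """Hypothetical stand-in for an object that owns a Reporter."""

    def __init__(self, reporter):
        self.reporter = reporter
        # Deliberate reference cycle: reference counting alone cannot
        # free this object, so __del__ runs only when the cyclic
        # garbage collector finds it.
        self._cycle = self

    def __del__(self):
        # Release the file handle when the object is finalized.
        self.reporter.close()


reporter = Reporter()
phase = Phase(reporter)

del phase      # Drops the name; the cycle keeps the object alive.
gc.collect()   # Forces collection, which runs Phase.__del__.
closed_after_collect = not reporter.is_open
```

Forcing a collection right after the deletion makes the point at which the file is closed predictable, which matters when another object is about to reopen the same file.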

andrrizzi · 2018-09-10T13:56:24Z

This should be ready to be merged!

jchodera

I've just made two comments:

You can remove the masking operation at the #TODO
The changes in sams.py lead to clearer code---thanks!

jchodera · 2018-09-10T14:09:55Z

Yank/multistate/multistatereporter.py

+        # Open analysis file.
+        self._storage_analysis = self._open_dataset_robustly(self._storage_analysis_file_path,
+                                                             mode, version=netcdf_format)
+        # TODO - AR: What's the purpose of set_auto_mask(False)? When we create the


We no longer need it since we're not using masking!

jchodera · 2018-09-10T14:11:51Z

Yank/multistate/sams.py

@@ -373,13 +373,14 @@ def _mix_replicas(self):
        # TODO: We may be able to refactor this to simply have different update schemes compute neighborhoods differently.
        # TODO: Can we allow "plugin" addition of new update schemes that can be registered externally?
        with mmtools.utils.time_it('Mixing of replicas'):
-            jump_and_mix = self._JumpAndMixPacket(self.n_replicas, self.n_states)
+            # Initialize statistics. This matrix is modified by the jump function and used when updating the logZ estimates.
+            replicas_log_P_k = np.zeros([self.n_replicas, self.n_states], np.float64)


This is much clearer now---thanks!

jchodera · 2018-09-10T14:13:31Z

Yank/multistate/sams.py

            for state_index in neighborhood:
-                u_k[state_index] = self._energy_thermodynamic_states[replica_index, state_index]
-                log_P_k[state_index] = self.log_weights[state_index] - u_k[state_index]
+                u_k = self._energy_thermodynamic_states[replica_index, state_index]


To match the equations, it would be clearer to write:

u_k = self._energy_thermodynamic_states[replica_index, :]
log_P_k[state_index] = - u_k[state_index] + self.log_weights[state_index]

Good idea! Will change.

jchodera

Great!
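
The form suggested in this thread can be tried out on its own with a small NumPy sketch. The array values, `neighborhood`, and the standalone variable names below are assumptions made up for illustration; only the expression for `log_P_k[state_index]` comes from the comment above. The array is initialized with zeros as in the diff, and no normalization is applied here.

```python
import numpy as np

# Hypothetical inputs: one replica, four thermodynamic states.
n_states = 4
replica_index = 0
energy_thermodynamic_states = np.array([[1.0, 2.0, 0.5, 3.0]])
log_weights = np.array([0.0, 0.1, -0.2, 0.3])
neighborhood = [0, 1, 2]  # States considered for this replica.

# Reduced potentials of this replica in all states.
u_k = energy_thermodynamic_states[replica_index, :]

# Loop form, matching the suggested code.
log_P_k = np.zeros(n_states, np.float64)
for state_index in neighborhood:
    log_P_k[state_index] = -u_k[state_index] + log_weights[state_index]

# Equivalent vectorized form using integer-array indexing.
log_P_k_vectorized = np.zeros(n_states, np.float64)
log_P_k_vectorized[neighborhood] = -u_k[neighborhood] + log_weights[neighborhood]
```

Both forms leave the entries outside the neighborhood at their initial value; how those entries are treated in the sampler is not shown in this excerpt.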

andrrizzi added 8 commits September 9, 2018 22:24

Correctly log the stack trace of an MPI process exception.

4486a72

Make sure the Reporter is garbage collected and the netcdf dataset is closed.

73491dc

Attempt opening the NetCDF file 5 times before giving up.

8e5009f

Less frequent netcdf open/close and encapsulate in try...finally.

4592012

Improved debug messages

a020a0f

A couple of simplifications

712e595

Update whatsnew.rst

90b8e8a

Fix case where checkpoint file can't be found.

860b666

jchodera approved these changes Sep 10, 2018

andrrizzi added 2 commits September 10, 2018 10:45

Remove dataset.set_automask(False)

9a11bdf

Better correspondence to paper equations

868526e

jchodera approved these changes Sep 10, 2018

Re-add set_auto_mask(False)

c7deda2

andrrizzi merged commit fea7b06 into master Sep 10, 2018

andrrizzi deleted the mpi-netcdf branch September 10, 2018 16:17
