
Fix a load_params bug when loading a CUDA model to CPU. #358

Merged: 4 commits merged into master on Oct 23, 2018

Conversation

benjamin-work (Contributor):

This bug occurred when a model was trained on GPU and saved using save_params, then loaded using load_params with device='cpu' on a machine without a CUDA device.

Fixes #354.
Should supersede #356.

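To illustrate the failure mode, here is a minimal reproduction sketch (the checkpoint path and module are made up for the example; the first branch has to run on a machine with a CUDA device, the second on a CPU-only one):

    import os
    import torch

    # Hypothetical reproduction of the bug; path and module are made up.
    CKPT = 'net_cuda.pt'

    if torch.cuda.is_available():
        # Step 1: save a state dict whose storages live on the GPU.
        module = torch.nn.Linear(20, 2).to('cuda')
        torch.save(module.state_dict(), CKPT)
    elif os.path.exists(CKPT):
        # Step 2: on a CPU-only machine, torch.load(CKPT) raises
        #   RuntimeError: Attempting to deserialize object on a CUDA device ...
        # whereas mapping the storages to the CPU works:
        state_dict = torch.load(CKPT, map_location=torch.device('cpu'))
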
@thomasjpfan (Member) left a comment:

The use of early exits makes the code more understandable. There seems to be no avoiding adding skorch/tests/net_cuda.pt to test loading a CUDA model onto a CPU.

skorch/net.py Outdated
else:
# use CPU
if not self.device.startswith('cuda'):
model = torch.load(f, map_location=lambda storage, loc: storage)
Member:

PyTorch supports map_location=torch.device('cpu'), which may make the intent of loading onto the CPU clearer.

benjamin-work (Author):

Yes, it makes sense to be more explicit.
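For reference, a quick self-contained comparison of the two forms (the buffer and module are made up for the example):

    import io
    import torch

    # Comparison of the two map_location styles discussed above.
    buf = io.BytesIO()
    torch.save(torch.nn.Linear(2, 2).state_dict(), buf)

    buf.seek(0)
    # Lambda form: keep each storage wherever it was deserialized (CPU here).
    state_a = torch.load(buf, map_location=lambda storage, loc: storage)

    buf.seek(0)
    # Explicit form suggested in the review: the intent (load onto the CPU)
    # is visible at the call site.
    state_b = torch.load(buf, map_location=torch.device('cpu'))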

skorch/net.py Outdated
"available. Loading on CPU instead.",
ResourceWarning)
self.device = 'cpu'
model = torch.load(f, lambda storage, loc: storage)
Member:

PyTorch supports map_location=torch.device('cpu'), which may make the intent of loading onto the CPU clearer.

benjamin-work (Author):

Same as above.

def test_load_cuda_params_to_cpu(self, net_cls, module_cls, data):
# Note: This test will pass trivially when CUDA is available
# but triggered a bug when CUDA is not available.
net = net_cls(module_cls).initialize()
Member:

I am unable to see where the bug is in this test. Can you expand on this point?

benjamin-work (Author):

With the current implementation, when you want to load a model trained with CUDA onto the CPU, you get:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location='cpu' to map your storages to the CPU.

This is the bug from the initial issue. The reason is that currently, we only explicitly load to CPU when the user indicates they want to use CUDA but no CUDA device is detected. The more trivial case, where the user indicates that they want to load to CPU, is not covered.
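Schematically, the fixed control flow looks like this (a sketch only, not skorch's actual implementation):

    import warnings
    import torch

    # Schematic sketch of the control flow described above;
    # `f` is a file path or file-like object.
    def load_model(f, device):
        if device.startswith('cuda') and not torch.cuda.is_available():
            warnings.warn("Model configured to use CUDA but no CUDA devices "
                          "available. Loading on CPU instead.", ResourceWarning)
            device = 'cpu'
        if not device.startswith('cuda'):
            # Covers both the fallback above and the case the old code missed:
            # the user explicitly passed device='cpu'.
            return torch.load(f, map_location=torch.device('cpu'))
        return torch.load(f)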

@benjamin-work (Author):

I will wait for #360 before merging this; it's probably easier that way than the other way around.

@thomasjpfan (Member) left a comment:

Minor suggestion to avoid a DeprecationWarning. Otherwise LGTM.

X, y = data
net = net_cls(module_cls, device='cuda', max_epochs=1).fit(X, y)
p = tmpdir.mkdir('skorch').join('testmodel.pkl')
net.save_params(str(p))
Member:

Suggested change
net.save_params(str(p))
net.save_params(f_params=str(p))

def test_load_cuda_params_to_cuda(self, net_cls, module_cls, data):
net = net_cls(module_cls, device='cuda').initialize()
# net_cuda.pt is a net trained on CUDA
net.load_params(os.path.join('skorch', 'tests', 'net_cuda.pt'))
Member:

Suggested change
net.load_params(os.path.join('skorch', 'tests', 'net_cuda.pt'))
net.load_params(f_params=os.path.join('skorch', 'tests', 'net_cuda.pt'))

net.save_params(str(p))

net2 = net_cls(module_cls, device='cpu').initialize()
net2.load_params(str(p))
Member:

Suggested change
net2.load_params(str(p))
net2.load_params(f_params=str(p))

@ottonemo (Member) left a comment:

LGTM. I think the things I addressed should (if at all) be done in a new PR.

map_location = torch.device('cpu')

return torch.load(f, map_location=map_location)

Member:

Two things:

  1. The warning should include that the self.device parameter is now set to 'cpu'.
  2. I think we should refactor _get_state_dict into _get_map_location so it can be used in __setstate__ as well, since the code there is doing basically the same thing (a rough sketch follows below).
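For concreteness, one possible shape of such a helper (a sketch only, with an assumed signature; not the code merged in this PR):

    import warnings
    import torch

    # Sketch of the suggested _get_map_location helper (signature assumed);
    # load_params and __setstate__ could both call it.
    def _get_map_location(device):
        map_location = torch.device(device)
        if device.startswith('cuda') and not torch.cuda.is_available():
            warnings.warn(
                "Model configured to use CUDA but no CUDA devices available. "
                "Loading on CPU instead.", ResourceWarning)
            map_location = torch.device('cpu')
        return map_location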

@ottonemo merged commit 3ae8120 into master on Oct 23, 2018
@ottonemo deleted the bugfix/load-save-cuda branch on October 25, 2018 07:01