HPO pytorch MNIST example #1577

Closed
igabr opened this issue Sep 28, 2020 · 4 comments

Comments

@igabr

igabr commented Sep 28, 2020

The following notebook no longer works when run as is: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/pytorch_mnist/hpo_pytorch_mnist.ipynb

Failure reason
AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python mnist.py --backend gloo --batch-size 512 --epochs 6 --lr 0.018272722904211385"
Traceback (most recent call last):
  File "mnist.py", line 197, in <module>
    train(parser.parse_args())
  File "mnist.py", line 93, in train
    train_loader = _get_train_data_loader(args.batch_size, args.data_dir, is_distributed, **kwargs)
  File "mnist.py", line 45, in _get_train_data_loader
    transforms.Normalize((0.1307,), (0.3081,))
  File "/opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 80, in __init__
    self.data, self.targets = torch.load(os.path.join(self.processed_folder, data_file))
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError:

@aaronmarkham
Contributor

I think some of this error is missing... do you have more you can paste in here?

@igabr
Author

igabr commented Oct 5, 2020

`2020-09-28 15:02:21,377 sagemaker-containers ERROR ExecuteUserScriptError:
Command "/opt/conda/bin/python mnist.py --backend gloo --batch-size 512 --epochs 6 --lr 0.018272722904211385"
Traceback (most recent call last):
  File "mnist.py", line 197, in <module>
    train(parser.parse_args())
  File "mnist.py", line 93, in train
    train_loader = _get_train_data_loader(args.batch_size, args.data_dir, is_distributed, **kwargs)
  File "mnist.py", line 45, in _get_train_data_loader
    transforms.Normalize((0.1307,), (0.3081,))
  File "/opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 80, in __init__
    self.data, self.targets = torch.load(os.path.join(self.processed_folder, data_file))
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /AWS-PyTorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /AWS-PyTorch/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f84b181cb2a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x2065 (0x7f84b377ddc5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x77 (0x7f84b377e9d7 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x66c66c (0x7f84b7b9966c in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x26caa1 (0x7f84b7799aa1 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #5: _PyCFunction_FastCallDict + 0x154 (0x5565dc54a534 in /opt/conda/bin/python)
frame #6: _PyObject_FastCallDict + 0x2bf (0x5565dc54a94f in /opt/conda/bin/python)
frame #7: _PyObject_Call_Prepend + 0x63 (0x5565dc54f363 in /opt/conda/bin/python)
frame #8: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
frame #9: <unknown function> + 0x16a79b (0x5565dc5a279b in /opt/conda/bin/python)
frame #10: <unknown function> + 0x199de7 (0x5565dc5d1de7 in /opt/conda/bin/python)
frame #11: _PyObject_FastCallDict + 0x8b (0x5565dc54a71b in /opt/conda/bin/python)
frame #12: <unknown function> + 0x199bfe (0x5565dc5d1bfe in /opt/conda/bin/python)
frame #13: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
frame #14: <unknown function> + 0x193196 (0x5565dc5cb196 in /opt/conda/bin/python)
frame #15: _PyFunction_FastCallDict + 0x1bc (0x5565dc5cc37c in /opt/conda/bin/python)
frame #16: _PyObject_FastCallDict + 0x26f (0x5565dc54a8ff in /opt/conda/bin/python)
frame #17: _PyObject_Call_Prepend + 0x63 (0x5565dc54f363 in /opt/conda/bin/python)
frame #18: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
frame #19: <unknown function> + 0x16a79b (0x5565dc5a279b in /opt/conda/bin/python)
frame #20: <unknown function> + 0x199de7 (0x5565dc5d1de7 in /opt/conda/bin/python)
frame #21: _PyObject_FastCallDict + 0x8b (0x5565dc54a71b in /opt/conda/bin/python)
frame #22: <unknown function> + 0x199bfe (0x5565dc5d1bfe in /opt/conda/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
frame #24: <unknown function> + 0x192ff4 (0x5565dc5caff4 in /opt/conda/bin/python)
frame #25: <unknown function> + 0x193ea1 (0x5565dc5cbea1 in /opt/conda/bin/python)
frame #26: <unknown function> + 0x199b85 (0x5565dc5d1b85 in /opt/conda/bin/python)
frame #27: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
frame #28: <unknown function> + 0x193196 (0x5565dc5cb196 in /opt/conda/bin/python)
frame #29: _PyFunction_FastCallDict + 0x3da (0x5565dc5cc59a in /opt/conda/bin/python)
frame #30: _PyObject_FastCallDict + 0x26f (0x5565dc54a8ff in /opt/conda/bin/python)
frame #31: _PyObject_Call_Prepend + 0x63 (0x5565dc54f363 in /opt/conda/bin/python)
frame #32: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
frame #33: <unknown function> + 0x16a79b (0x5565dc5a279b in /opt/conda/bin/python)
frame #34: <unknown function> + 0x199de7 (0x5565dc5d1de7 in /opt/conda/bin/python)
frame #35: _PyObject_FastCallDict + 0x8b (0x5565dc54a71b in /opt/conda/bin/python)
frame #36: _PyObject_FastCallKeywords + 0xaa (0x5565dc5cc18a in /opt/conda/bin/python)
frame #37: <unknown function> + 0x199bfe (0x5565dc5d1bfe in /opt/conda/bin/python)
frame #38: _PyEval_EvalFrameDefault + 0x10ca (0x5565dc5f4f5a in /opt/conda/bin/python)
frame #39: PyEval_EvalCodeEx + 0x329 (0x5565dc5cc9b9 in /opt/conda/bin/python)
frame #40: <unknown function> + 0x1958d6 (0x5565dc5cd8d6 in /opt/conda/bin/python)
frame #41: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
frame #42: _PyEval_EvalFrameDefault + 0x196d (0x5565dc5f57fd in /opt/conda/bin/python)
frame #43: <unknown function> + 0x193c6b (0x5565dc5cbc6b in /opt/conda/bin/python)
frame #44: <unknown function> + 0x199b85 (0x5565dc5d1b85 in /opt/conda/bin/python)
frame #45: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
frame #46: PyEval_EvalCodeEx + 0x329 (0x5565dc5cc9b9 in /opt/conda/bin/python)
frame #47: PyEval_EvalCode + 0x1c (0x5565dc5cd74c in /opt/conda/bin/python)
frame #48: <unknown function> + 0x215634 (0x5565dc64d634 in /opt/conda/bin/python)
frame #49: PyRun_FileExFlags + 0xa1 (0x5565dc64da31 in /opt/conda/bin/python)
frame #50: PyRun_SimpleFileExFlags + 0x1c3 (0x5565dc64dc33 in /opt/conda/bin/python)
frame #51: Py_Main + 0x613 (0x5565dc651723 in /opt/conda/bin/python)
frame #52: main + 0xee (0x5565dc51c1fe in /opt/conda/bin/python)
frame #53: __libc_start_main + 0xf0 (0x7f84c3a51830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #54: <unknown function> + 0x1c2c2a (0x5565dc5fac2a in /opt/conda/bin/python)
`

@hongshanli23
Contributor

Hi @igabr, is this issue still there? It looks like a torchvision problem. I would suggest trying a different PyTorch kernel.
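
One way to confirm the mismatch named in the RuntimeError (a file with format version 3 and a reader that supports at most version 2) is to inspect the data file directly. The sketch below uses only the Python standard library. It assumes that a zip-based PyTorch file keeps its format version as text in a record named `<archive name>/version`; that layout is an assumption about PyTorch internals, not a documented API. `pytorch_zip_format_version` and `make_fake_checkpoint` are hypothetical helper names, and the sample file is a stand-in, not real MNIST data.

```python
import os
import tempfile
import zipfile


def pytorch_zip_format_version(path):
    """Return the format version stored in a zip-based PyTorch file.

    Returns None when the file is not a zip archive (for example, a file
    written with the older pickle-based format) or has no version record.
    """
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Assumption: the version lives in "<archive name>/version"
            # and holds a small integer written as ASCII text.
            if name == "version" or name.endswith("/version"):
                return int(archive.read(name).decode("ascii").strip())
    return None


def make_fake_checkpoint(path, version):
    """Write a tiny stand-in archive so the check can be tried without PyTorch."""
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("archive/version", f"{version}\n")
        archive.writestr("archive/data.pkl", b"")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name; point this at the real processed file instead.
        sample = os.path.join(tmp, "training.pt")
        make_fake_checkpoint(sample, 3)
        print(pytorch_zip_format_version(sample))
```

If this reports 3 for the processed MNIST files, they were probably written by a newer PyTorch than the one in the training container. Aligning the two versions, or regenerating the files with the container's version, would then be the thing to try.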

@hongshanli23
Contributor

Closing the issue due to lack of activity. Please reopen it if the problem persists.
