HPO pytorch MNIST example #1577
I think part of this error is missing. Do you have more of the log you can paste here?
```
2020-09-28T11:02:21.532-04:00 2020-09-28 15:02:21,377 sagemaker-containers ERROR ExecuteUserScriptError:
2020-09-28T11:02:21.532-04:00 Command "/opt/conda/bin/python mnist.py --backend gloo --batch-size 512 --epochs 6 --lr 0.018272722904211385"
2020-09-28T11:02:21.532-04:00 Traceback (most recent call last):
2020-09-28T11:02:21.533-04:00 RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /AWS-PyTorch/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /AWS-PyTorch/caffe2/serialize/inline_container.cc:132)
2020-09-28T11:02:21.533-04:00 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6a (0x7f84b181cb2a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
2020-09-28T11:02:21.533-04:00 frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x2065 (0x7f84b377ddc5 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
2020-09-28T11:02:21.533-04:00 frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x77 (0x7f84b377e9d7 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
2020-09-28T11:02:21.533-04:00 frame #3: <unknown function> + 0x66c66c (0x7f84b7b9966c in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
2020-09-28T11:02:21.533-04:00 frame #4: <unknown function> + 0x26caa1 (0x7f84b7799aa1 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
2020-09-28T11:02:21.533-04:00 frame #5: _PyCFunction_FastCallDict + 0x154 (0x5565dc54a534 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #6: _PyObject_FastCallDict + 0x2bf (0x5565dc54a94f in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #7: _PyObject_Call_Prepend + 0x63 (0x5565dc54f363 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #8: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #9: <unknown function> + 0x16a79b (0x5565dc5a279b in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #10: <unknown function> + 0x199de7 (0x5565dc5d1de7 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #11: _PyObject_FastCallDict + 0x8b (0x5565dc54a71b in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #12: <unknown function> + 0x199bfe (0x5565dc5d1bfe in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #13: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #14: <unknown function> + 0x193196 (0x5565dc5cb196 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #15: _PyFunction_FastCallDict + 0x1bc (0x5565dc5cc37c in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #16: _PyObject_FastCallDict + 0x26f (0x5565dc54a8ff in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #17: _PyObject_Call_Prepend + 0x63 (0x5565dc54f363 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #18: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #19: <unknown function> + 0x16a79b (0x5565dc5a279b in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #20: <unknown function> + 0x199de7 (0x5565dc5d1de7 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #21: _PyObject_FastCallDict + 0x8b (0x5565dc54a71b in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #22: <unknown function> + 0x199bfe (0x5565dc5d1bfe in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #23: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #24: <unknown function> + 0x192ff4 (0x5565dc5caff4 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #25: <unknown function> + 0x193ea1 (0x5565dc5cbea1 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #26: <unknown function> + 0x199b85 (0x5565dc5d1b85 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #27: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #28: <unknown function> + 0x193196 (0x5565dc5cb196 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #29: _PyFunction_FastCallDict + 0x3da (0x5565dc5cc59a in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #30: _PyObject_FastCallDict + 0x26f (0x5565dc54a8ff in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #31: _PyObject_Call_Prepend + 0x63 (0x5565dc54f363 in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #32: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
2020-09-28T11:02:21.533-04:00 frame #33: <unknown function> + 0x16a79b (0x5565dc5a279b in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #34: <unknown function> + 0x199de7 (0x5565dc5d1de7 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #35: _PyObject_FastCallDict + 0x8b (0x5565dc54a71b in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #36: _PyObject_FastCallKeywords + 0xaa (0x5565dc5cc18a in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #37: <unknown function> + 0x199bfe (0x5565dc5d1bfe in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #38: _PyEval_EvalFrameDefault + 0x10ca (0x5565dc5f4f5a in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #39: PyEval_EvalCodeEx + 0x329 (0x5565dc5cc9b9 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #40: <unknown function> + 0x1958d6 (0x5565dc5cd8d6 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #41: PyObject_Call + 0x3e (0x5565dc54a33e in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #42: _PyEval_EvalFrameDefault + 0x196d (0x5565dc5f57fd in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #43: <unknown function> + 0x193c6b (0x5565dc5cbc6b in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #44: <unknown function> + 0x199b85 (0x5565dc5d1b85 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #45: _PyEval_EvalFrameDefault + 0x30a (0x5565dc5f419a in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #46: PyEval_EvalCodeEx + 0x329 (0x5565dc5cc9b9 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #47: PyEval_EvalCode + 0x1c (0x5565dc5cd74c in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #48: <unknown function> + 0x215634 (0x5565dc64d634 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #49: PyRun_FileExFlags + 0xa1 (0x5565dc64da31 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #50: PyRun_SimpleFileExFlags + 0x1c3 (0x5565dc64dc33 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #51: Py_Main + 0x613 (0x5565dc651723 in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #52: main + 0xee (0x5565dc51c1fe in /opt/conda/bin/python)
2020-09-28T11:02:21.534-04:00 frame #53: __libc_start_main + 0xf0 (0x7f84c3a51830 in /lib/x86_64-linux-gnu/libc.so.6)
2020-09-28T11:02:21.534-04:00 frame #54: <unknown function> + 0x1c2c2a (0x5565dc5fac2a in /opt/conda/bin/python)
```
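The informative part of that log is the INTERNAL ASSERT: the file being loaded was written in serialization format version 3, while the PyTorch build in the container can read at most version 2. A minimal sketch for pulling those two numbers out of a log line, assuming the message wording stays exactly as printed above (`parse_version_assert` is a hypothetical helper name, not part of any library):

```python
import re
from typing import Optional, Tuple

# Matches the wording of the assert message in the log above (assumed stable).
_VERSION_ASSERT = re.compile(
    r"Attempted to read a PyTorch file with version (\d+), "
    r"but the maximum supported version for reading is (\d+)"
)


def parse_version_assert(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (file_version, max_supported_version), or None if not present."""
    match = _VERSION_ASSERT.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    line = (
        "RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT "
        "FAILED ... Attempted to read a PyTorch file with version 3, but the "
        "maximum supported version for reading is 2. Your PyTorch installation "
        "may be too old."
    )
    file_version, max_supported = parse_version_assert(line)
    if file_version > max_supported:
        # The file was produced by a newer PyTorch than the one reading it.
        print(f"file needs format v{file_version}; this PyTorch reads up to v{max_supported}")
```

A file version above the supported maximum points at a mismatch between the PyTorch that wrote the file and the PyTorch that is reading it, rather than at a bug in `mnist.py` itself.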
Hi @igabr, is this issue still there? It looks like a torchvision problem. I would suggest trying a different PyTorch kernel.
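Before switching kernels, it can help to confirm which PyTorch and torchvision versions each environment actually has. A minimal sketch, assuming Python 3.8+ for `importlib.metadata` (the container in the log runs Python 3.6, where printing `torch.__version__` and `torchvision.__version__` gives the same information); `installed_versions` is a hypothetical helper:

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this in the notebook kernel and compare with the training container.
    for name, version in installed_versions(["torch", "torchvision"]).items():
        print(f"{name}: {version or 'not installed'}")
```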
Closing the issue due to lack of activity. Please reopen if the issue persists.
The following notebook no longer works when run as-is: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/hyperparameter_tuning/pytorch_mnist/hpo_pytorch_mnist.ipynb
Failure reason
```
AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python mnist.py --backend gloo --batch-size 512 --epochs 6 --lr 0.018272722904211385"
Traceback (most recent call last):
  File "mnist.py", line 197, in <module>
    train(parser.parse_args())
  File "mnist.py", line 93, in train
    train_loader = _get_train_data_loader(args.batch_size, args.data_dir, is_distributed, **kwargs)
  File "mnist.py", line 45, in _get_train_data_loader
    transforms.Normalize((0.1307,), (0.3081,))
  File "/opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 80, in __init__
    self.data, self.targets = torch.load(os.path.join(self.processed_folder, data_file))
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 527, in load
    with _open_zipfile_reader(f) as opened_zipfile:
  File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 224, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError:
```
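The traceback shows the failure happens when torchvision's `MNIST` dataset calls `torch.load` on its cached, already-processed tensors, so those cached files are worth inspecting directly. Files written by newer PyTorch releases are zip archives that carry a small `version` record. The sketch below reads that record with only the standard library, assuming it is stored as a top-level `<archive_name>/version` entry and that the cache lives under `<data_dir>/MNIST/processed/`; both are assumptions, and `zip_format_version` is a hypothetical helper:

```python
import zipfile
from pathlib import Path
from typing import Optional


def zip_format_version(path: str) -> Optional[int]:
    """Return the format version recorded in a zip-based torch.save file.

    Returns None for legacy (non-zip) files or when no version record is found.
    Assumes the record is a top-level '<archive_name>/version' entry.
    """
    if not zipfile.is_zipfile(path):
        return None  # legacy pickle-based file: no zip version record
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            if len(parts) == 2 and parts[1] == "version":
                return int(archive.read(name).decode("ascii").strip())
    return None


if __name__ == "__main__":
    # Hypothetical cache location; adjust to the data_dir used by mnist.py.
    processed = Path("data") / "MNIST" / "processed"
    for pt_file in sorted(processed.glob("*.pt")):
        print(pt_file.name, zip_format_version(str(pt_file)))
```

If the cached files report a version above what the container can read, one option is to regenerate them with the container's own PyTorch; another is to re-save them from the newer environment with `torch.save(..., _use_new_zipfile_serialization=False)`. Both are suggestions rather than a confirmed fix for this notebook.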