I also ran into this segmentation fault. Changing the TensorFlow version resolved it for me.
Did you use the author's uploaded requirement.txt?
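Since changing the TensorFlow version resolved the crash for one commenter, it may help to compare the installed package versions against the repository's requirements file. Here is a minimal sketch using only the standard library. The package list is a guess based on the modules that appear in the traceback, not something taken from the repository, so adjust it as needed.

```python
from importlib import metadata

# Hypothetical list of packages to check; adjust it to match the requirements file.
PACKAGES = ["jax", "jaxlib", "tensorflow", "numpy"]


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same environment that launches train.sh shows what is actually being imported, which can then be checked line by line against the pinned versions.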
error:
I1216 05:03:33.303731 140638140207296 utils.py:31] Checkpoint.restore_or_initialize() ...
I1216 05:03:33.304307 140638140207296 checkpoint.py:301] No checkpoint specified. Restore the latest checkpoint.
I1216 05:03:33.304460 140638140207296 utils.py:31] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() ...
I1216 05:03:33.312287 140638140207296 checkpoint.py:430] Checked checkpoint base_directories: ['path/to/exp/exp_name/checkpoints-0'] - common_numbers={1} - exclusive_numbers=set()
I1216 05:03:33.312516 140638140207296 utils.py:41] MultihostCheckpoint.get_latest_checkpoint_to_restore_from() finished after 0.01s.
I1216 05:03:33.312650 140638140207296 checkpoint.py:307] Restoring checkpoint: path/to/exp/exp_name/checkpoints-0/ckpt-1
2021-12-16 05:03:33.316385: W ./tensorflow/core/framework/dataset.h:550] Failed precondition: StatelessRandomGetKeyCounter is stateful.
I1216 05:03:45.659061 140638140207296 checkpoint.py:312] Restored save_counter=1 restored_checkpoint=path/to/exp/exp_name/checkpoints-0/ckpt-1
I1216 05:03:45.659443 140638140207296 utils.py:41] Checkpoint.restore_or_initialize() finished after 12.36s.
I1216 05:03:47.525738 140590360545024 logging_writer.py:56] Hyperparameters: {'architecture': 'xmc_net', 'batch_norm_group_size': -1, 'batch_size': 8, 'beta1': 0.5, 'beta2': 0.999, 'checkpoint_every_steps': 5000, 'coco_version': '2014', 'cond_size': 16, 'd_lr': 0.0004, 'd_spectral_norm': True, 'd_step_per_g_step': 14, 'data_dir': 'data/', 'dataset': 'mscoco', 'df_dim': 96, 'dtype': 'bfloat16', 'eval_avg_num': 3, 'eval_batch_size': 4, 'eval_every_steps': 1000, 'eval_num': 30000, 'g_lr': 0.0001, 'g_spectral_norm': False, 'gamma_for_g': 15, 'gf_dim': 96, 'image_contrastive': True, 'image_size': 128, 'log_loss_every_steps': 1000, 'model_name': 'xmc', 'num_epochs': 500, 'num_train_steps': -1, 'polyak_decay': 0.999, 'pretrained_image_contrastive': True, 'return_filename': False, 'return_text': False, 'seed': 42, 'sentence_contrastive': True, 'show_num': 64, 'shuffle_buffer_size': 1000, 'train_shuffle': True, 'trial': 0, 'word_contrastive': True, 'z_dim': 128}
I1216 05:03:47.528530 140638140207296 train_utils.py:404] Starting training loop at step 1.
/root/yes/envs/py39/lib/python3.9/site-packages/jax/_src/profiler.py:166: UserWarning: StepTraceContext has been renamed to StepTraceAnnotation. This alias will eventually be removed; please update your code.
warnings.warn(
Fatal Python error: Segmentation fault
Thread 0x00007fddbdffb700 (most recent call first):
File "/root/yes/envs/py39/lib/python3.9/concurrent/futures/thread.py", line 75 in _worker
File "/root/yes/envs/py39/lib/python3.9/threading.py", line 910 in run
File "/root/yes/envs/py39/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/root/yes/envs/py39/lib/python3.9/threading.py", line 930 in _bootstrap
Thread 0x00007fddbe7fc700 (most recent call first):
File "/root/yes/envs/py39/lib/python3.9/concurrent/futures/thread.py", line 75 in _worker
File "/root/yes/envs/py39/lib/python3.9/threading.py", line 910 in run
File "/root/yes/envs/py39/lib/python3.9/threading.py", line 973 in _bootstrap_inner
File "/root/yes/envs/py39/lib/python3.9/threading.py", line 930 in _bootstrap
Current thread 0x00007fe8de6390c0 (most recent call first):
File "/root/yes/envs/py39/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 1955 in shape
File "<array_function internals>", line 5 in shape
File "/root/yes/envs/py39/lib/python3.9/site-packages/jax/_src/api.py", line 1307 in
File "/root/yes/envs/py39/lib/python3.9/site-packages/jax/_src/api.py", line 1307 in _mapped_axis_size
File "/root/yes/envs/py39/lib/python3.9/site-packages/jax/_src/api.py", line 1633 in f_pmapped
File "/root/yes/envs/py39/lib/python3.9/site-packages/jax/_src/api.py", line 1725 in f_pmapped
File "/root/yes/envs/py39/lib/python3.9/site-packages/jax/_src/traceback_util.py", line 162 in reraise_with_filtered_traceback
File "/xmc_gan/xmcgan/train_utils.py", line 424 in train
File "/xmc_gan/xmcgan/main.py", line 62 in main
File "/root/yes/envs/py39/lib/python3.9/site-packages/absl/app.py", line 251 in _run_main
File "/root/yes/envs/py39/lib/python3.9/site-packages/absl/app.py", line 303 in run
File "/xmc_gan/xmcgan/main.py", line 70 in <module>
File "/root/yes/envs/py39/lib/python3.9/runpy.py", line 87 in _run_code
File "/root/yes/envs/py39/lib/python3.9/runpy.py", line 197 in _run_module_as_main
train.sh: line 24: 45523 Segmentation fault (core dumped) CUDA_VISIBLE_DEVICES="0,1,2,3" python -m xmcgan.main --config="$CONFIG" --mode="train" --workdir="$WORKDIR"
details:
config.batch_size = 8
config.d_step_per_g_step = 14
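As a sanity check on these settings: the command above exposes four GPUs through CUDA_VISIBLE_DEVICES, and jax.pmap expects the leading axis of its inputs to be no larger than the number of local devices. The sketch below computes the per-device batch size under the assumption that config.batch_size is a global batch size split evenly across devices. That is an assumption; the repository may treat it as a per-device value instead. The helper name is hypothetical, and this check is not claimed to be the cause of the crash.

```python
def per_device_batch_size(global_batch_size, visible_devices):
    """Split a global batch evenly across the listed devices.

    visible_devices is a CUDA_VISIBLE_DEVICES-style string such as "0,1,2,3".
    """
    device_ids = [d.strip() for d in visible_devices.split(",") if d.strip()]
    if not device_ids:
        raise ValueError("no visible devices listed")
    if global_batch_size % len(device_ids) != 0:
        raise ValueError(
            f"batch size {global_batch_size} is not divisible by "
            f"{len(device_ids)} devices"
        )
    return global_batch_size // len(device_ids)


if __name__ == "__main__":
    # Values from this report: batch_size = 8, CUDA_VISIBLE_DEVICES="0,1,2,3".
    print(per_device_batch_size(8, "0,1,2,3"))
```

With the values from this report the split is even, so under that assumption the batch size alone would not make the pmap call invalid.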
Have you come across this error before?