Auto3D Swinunet fails with Instance22 dataset #5742

AHarouni · 2022-12-14T18:25:53Z

Running Auto3D with the Instance22 dataset works with all networks. When I duplicated the entries in the data JSON to simulate a larger dataset, all networks still worked except SwinUnet.

To Reproduce
1 - Use Auto3D with Instance22 and the attached dataset.json. I changed the extension to .txt because .json files are not supported for uploading.
2 - Run the script below, which triggers only swinunetr.

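# WORK_DIR and FOLD must be set before calling; MODEL, SCRIPT and EXTRA_PRAMS are set by the caller below.
train_1_node(){
    FOLDER="/workspace/${WORK_DIR}/${MODEL}_${FOLD}"
    # Clear the previous checkpoint folder and log; -f avoids errors on a first run
    rm -rf "${FOLDER}/model_fold${FOLD}"
    CONF_FOLDER="${FOLDER}/configs"
    rm -f "${FOLDER}/${MODEL}.log"

    # SCRIPT and EXTRA_PRAMS are left unquoted on purpose so they split into separate arguments
    (time \
    torchrun --nnodes=1 --nproc_per_node=8 \
        ${SCRIPT} run \
        --config_file "['${CONF_FOLDER}/hyper_parameters.yaml','${CONF_FOLDER}/network.yaml','${CONF_FOLDER}/transforms_train.yaml','${CONF_FOLDER}/transforms_validate.yaml']" \
        $EXTRA_PRAMS ) 2>&1 | tee -i -p "${FOLDER}/${MODEL}.log"
}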

swinunetr(){
    MODEL="swinunetr"
    SCRIPT="-m ${WORK_DIR}.${MODEL}_${FOLD}.scripts.train"
    ## The new parameters make it run for 20,000 epochs; force it to 1,500
    EXTRA_PRAMS=" --num_images_per_batch 16"
    EXTRA_PRAMS=$EXTRA_PRAMS" --num_patches_per_image 1"
    EXTRA_PRAMS=$EXTRA_PRAMS" --num_iterations 1500"
    EXTRA_PRAMS=$EXTRA_PRAMS" --num_iterations_per_validation 100"
    EXTRA_PRAMS=$EXTRA_PRAMS" --num_sw_batch_size 36"
    train_1_node
}

swinunetr

Error

epoch 8/210
learning rate is set to 0.0001
[2022-11-29 21:44:18] 1/7, train_loss: 0.4237
[2022-11-29 21:44:19] 2/7, train_loss: 0.4575
2022-11-29 21:44:25,647 - > collate dict key "image" out of 4 keys
2022-11-29 21:44:25,701 - >> collate/stack a list of tensors
2022-11-29 21:44:25,705 - >> E: stack expects each tensor to be equal size, but got [1, 96, 96, 64] at entry 0 and [1, 96, 95, 64] at entry 10, shape [(1, 96, 96, 64), (
1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 95, 64), 
(1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64), (1, 96, 96, 64)] in collate([tensor([[[[0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],
          [0.0000, 0.0000, 0.0000,  ..., 0.0000, 0.0000, 0.0000],

[  0.16601867,   0.11132774,   0.97981832, -12.53823159],
       [  0.        ,   0.        ,   0.        ,   1.        ]])},
                                id: 140606314046512,
                                orig_size: (96, 96, 64)},
                  id: 140604144127376,
                  orig_size: (96, 96, 64)},
    id: 140604144127184,
    orig_size: (96, 96, 64)}]
Is batch?: False] ... )
2022-12-06 20:32:04,170 - > collate dict key "label" out of 4 keys
2022-12-06 20:32:04,219 - >> collate/stack a list of tensors

Expected behavior
As the error log shows, training actually starts and runs for 1, sometimes 10, epochs before it errors out. I expected it to continue running.


tangy5 · 2022-12-14T18:29:30Z

Thanks, investigating this now.

Nic-Ma · 2022-12-15T14:47:08Z

Hi @tangy5 ,

It seems the error occurs because the images have different shapes ((1, 96, 96, 64) vs. (1, 96, 95, 64)) when they are stacked?

Thanks.
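
The mismatch described above can be illustrated with a small sketch. It uses only the Python standard library and the shapes copied from the error log. The helper names `find_shape_mismatches` and `pad_amounts` are hypothetical and are not part of MONAI or Auto3DSeg. It shows which batch entry breaks stacking and how much trailing padding would bring it back to the common patch size.

```python
from collections import Counter


def find_shape_mismatches(shapes):
    """Return (most_common_shape, [(index, shape), ...]) for entries that differ from it.

    Hypothetical helper for illustration only.
    """
    expected, _ = Counter(shapes).most_common(1)[0]
    mismatches = [(i, s) for i, s in enumerate(shapes) if s != expected]
    return expected, mismatches


def pad_amounts(shape, target):
    """Per-dimension trailing padding needed to grow `shape` to `target`."""
    if len(shape) != len(target):
        raise ValueError("shape and target must have the same number of dimensions")
    if any(s > t for s, t in zip(shape, target)):
        raise ValueError("shape exceeds target; cropping, not padding, would be needed")
    return tuple(t - s for s, t in zip(shape, target))


# Shapes taken from the error log: 16 patches, entry 10 is one voxel short.
shapes = [(1, 96, 96, 64)] * 16
shapes[10] = (1, 96, 95, 64)

expected, bad = find_shape_mismatches(shapes)
print("expected patch shape:", expected)
for index, shape in bad:
    print(f"entry {index} has shape {shape}, needs padding {pad_amounts(shape, expected)}")
```

In a MONAI pipeline, one possible workaround (an assumption, not confirmed anywhere in this thread) would be to pad every patch to the ROI size before collation, for example with `SpatialPadd`, or to use a padding collate function such as `pad_list_data_collate`.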

wyli · 2023-04-19T06:06:10Z

Probably addressed by #5950. Please feel free to reopen if you see the same with a recent version.

Nic-Ma assigned tangy5 Dec 15, 2022

Nic-Ma added the question Further information is requested label Dec 15, 2022

myron added this to the Auto3D Seg framework [internal ongoing milestone] milestone Dec 15, 2022

wyli closed this as completed Apr 19, 2023
