Model training results and custom calculated DSCs are not the same? #1107

Minxiangliu · 2022-10-31T09:51:10Z

Thanks to the monailabel team for providing the environment and tools for training models. I trained a model on my own data with monailabel and ran inference successfully in 3D Slicer. For my experiments I need some additional accuracy measures, so I load the trained model myself and first check whether the DSC I compute matches the value monailabel recorded. I wrote a DSC calculation modeled on the monailabel code, but the result is very different, and I would like to understand why.
Thanks in advance.

The final train_stats.json written by monailabel:

{
  "total_epochs": 1000,
  "total_iterations": 19,
  "epoch": 97,
  "start_ts": 1667200676,
  "current_time": "0:12:51",
  "best_metric": 0.676832914352417,
  "train": {
    "metrics": {
      "train_dice": 0.683293879032135,
      "train_cancer_dice": 0.683293879032135
    },
    "key_metric_name": "train_dice",
    "best_metric": 0.683293879032135,
    "best_metric_epoch": 97
  },
  "eval": {
    "metrics": {
      "val_mean_dice": 0.6541317105293274,
      "val_cancer_dice": 0.6541317105293274
    },
    "key_metric_name": "val_mean_dice",
    "best_metric": 0.676832914352417,
    "best_metric_epoch": 96
  }
}
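Note that the file above reports two different validation numbers: the value at the last recorded epoch (`eval.metrics`) and the best value seen so far (`eval.best_metric`). A minimal sketch for reading them side by side follows; the JSON is copied from the file above and trimmed to the fields used, and `summarize_eval` is a hypothetical helper name. The thread does not say which epoch the saved `segmentation.pt` corresponds to, so this only shows which numbers are available for comparison.

```python
import json

# Trimmed copy of the train_stats.json shown above (only the fields used here).
TRAIN_STATS = """
{
  "epoch": 97,
  "best_metric": 0.676832914352417,
  "eval": {
    "metrics": {"val_mean_dice": 0.6541317105293274},
    "key_metric_name": "val_mean_dice",
    "best_metric": 0.676832914352417,
    "best_metric_epoch": 96
  }
}
"""


def summarize_eval(stats: dict) -> dict:
    """Return the last-epoch and best-epoch validation metric side by side."""
    ev = stats["eval"]
    key = ev["key_metric_name"]
    return {
        "metric": key,
        "last_epoch": stats["epoch"],
        "last_value": ev["metrics"][key],
        "best_epoch": ev["best_metric_epoch"],
        "best_value": ev["best_metric"],
    }


summary = summarize_eval(json.loads(TRAIN_STATS))
print(summary)
```

With a real run, `json.load` on the train_stats.json file can replace the inline string.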

The 3D Slicer outcome:

The result of my own inference, without monailabel:

Part of the code used by monailabel during training:

    def train_post_transforms(self, context: Context):
        return [
            EnsureTyped(keys="pred", device=context.device),
            Activationsd(keys="pred", softmax=True),
            AsDiscreted(
                keys=("pred", "label"),
                argmax=(True, False),
                to_onehot=(len(self.labels) + 1, len(self.labels) + 1),
            ),
        ]

    def val_pre_transforms(self, context: Context):
        val_transforms = self.pre_trans.copy()
        val_transforms.extend([
            SelectItemsd(keys=['image','label'])
        ])
        return val_transforms

    def val_inferer(self, context: Context):
        return SlidingWindowInferer(**self.slidingWindowInfererParams)

    def train_key_metric(self, context: Context):
        return region_wise_metrics(self.labels, self.TRAIN_KEY_METRIC, "train")

    def val_key_metric(self, context: Context):
        return region_wise_metrics(self.labels, self.VAL_KEY_METRIC, "val")

My own test code:

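import json

import torch
from monai.data import CacheDataset, ThreadDataLoader, decollate_batch
from monai.inferers import sliding_window_inference
from monai.metrics import DiceMetric
from monai.transforms import Activations, AsDiscrete, Compose, EnsureTyped, SelectItemsd, ToDeviced

def load_model(path:str, model):
    # Load the weights saved by monailabel and switch the network to eval mode on the GPU
    state_dict = torch.load(path, map_location='cuda')
    model.load_state_dict(state_dict, strict=True)
    model.eval().cuda()
    return model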

def getDataList():
    with open('../../DataSets/cvDataSets/testCV1.json', 'r') as file:
        val_ds = json.load(file)
    return val_ds

def val_dataloader(test_ds):
    return ThreadDataLoader(test_ds, num_workers=0, batch_size=2, shuffle=False, drop_last=False)

if __name__ == "__main__":
    transform = getPreTransforms(**trans_params)
    transform.extend([
        EnsureTyped(keys=['image','label']), 
        ToDeviced(keys=['image','label'],device='cuda'),
        SelectItemsd(keys=['image','label'])])
    transform = Compose(transform)
    datasetParameter = {'transform':transform, 'cache_rate':1.0, 'copy_cache':False, 'num_workers':2}
    datasetParameter['data'] = getDataList()
    test_ds = CacheDataset(**datasetParameter)
    loader = val_dataloader(test_ds)
    
    model = load_model(
        path='model/segmentation.pt', 
        model=network)

    post_pred = Compose([
        Activations(softmax=True), 
        AsDiscrete(argmax=True, to_onehot=len(trans_params['labels']) + 1)])
    post_label = Compose([
        AsDiscrete(argmax=False, to_onehot=len(trans_params['labels']) + 1)])

    dice_metric = DiceMetric(include_background=False, reduction="mean")

    with torch.no_grad():  # inference only, no gradients needed
        for idx, batch_data in enumerate(loader):
            image, label = batch_data['image'], batch_data['label']
            outputs = sliding_window_inference(inputs=image, predictor=model, **slidingWindowInfererParams)

            outputs = torch.stack([post_pred(i) for i in decollate_batch(outputs)])
            label = torch.stack([post_label(i) for i in decollate_batch(label)])

            # Accumulate the Dice of this batch so that aggregate() has values to reduce
            dice_metric(y_pred=outputs, y=label)

    metric = dice_metric.aggregate(reduction='mean').item()
    dice_metric.reset()
    print(metric)

My own test gives a DSC of about 50%, whereas monailabel reports 67%.
The same happens with every monailabel-trained model I have tested.
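One way to narrow down a gap like this is to compute the Dice of each case independently of any framework and list the per-case values. The sketch below is plain Python on flattened binary masks. It is a sanity check under stated assumptions, not the implementation used by MONAI or monailabel, and the toy masks are hypothetical. It also shows that the mean of per-case Dice values and the Dice over all pooled voxels are different numbers, so the averaging order is worth checking when two pipelines disagree.

```python
from typing import List, Sequence


def dice(pred: Sequence[int], label: Sequence[int]) -> float:
    """Dice coefficient of two flattened binary masks of equal length."""
    if len(pred) != len(label):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, label))
    denominator = sum(pred) + sum(label)
    # Both masks empty: Dice is undefined, so return NaN and let the caller decide.
    if denominator == 0:
        return float("nan")
    return 2.0 * intersection / denominator


# Hypothetical toy cases: (prediction, ground truth) as flattened binary masks.
cases = [
    ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0]),  # large region, exact match
    ([0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 1, 1]),  # small region, half missed
]

# Mean of the per-case Dice values.
per_case: List[float] = [dice(p, t) for p, t in cases]
mean_of_cases = sum(per_case) / len(per_case)

# Dice over all voxels pooled together, which weights large regions more heavily.
pooled_pred = [v for p, _ in cases for v in p]
pooled_label = [v for _, t in cases for v in t]
pooled = dice(pooled_pred, pooled_label)

print(per_case, mean_of_cases, pooled)
```

Listing `per_case` for the real validation set also shows whether a few cases drag the mean down or whether every case scores lower.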

In addition, I specified the training and test data myself, but I am not sure whether monailabel actually uses the test data I prepared for validation.

    def partition_datalist(self, context: Context, shuffle=False):
        with open(r'../../DataSets/cvDataSets/trainCV1.json', 'r') as file:
            train_ds = json.load(file)

        with open(r'../../DataSets/cvDataSets/testCV1.json', 'r') as file:
            val_ds = json.load(file)

        return train_ds, val_ds
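A quick check on the two lists returned above is to count the cases in each and look for overlap. The sketch below assumes each record is a dict with an "image" entry that identifies the case; the actual layout of trainCV1.json and testCV1.json is not shown in this thread, so the key name, the sample records, and the helper names `image_ids` and `check_split` are all hypothetical.

```python
from typing import Dict, Iterable, List


def image_ids(records: Iterable[Dict[str, str]]) -> set:
    # ASSUMPTION: every record has an "image" entry that identifies the case.
    return {record["image"] for record in records}


def check_split(train_records: List[Dict[str, str]],
                val_records: List[Dict[str, str]]) -> Dict[str, object]:
    """Count distinct cases per split and list any case present in both."""
    train_ids = image_ids(train_records)
    val_ids = image_ids(val_records)
    return {
        "train_cases": len(train_ids),
        "val_cases": len(val_ids),
        "overlap": sorted(train_ids & val_ids),
    }


# Hypothetical records standing in for the contents of the two JSON files.
train_records = [
    {"image": "case_001.nii.gz", "label": "case_001_seg.nii.gz"},
    {"image": "case_002.nii.gz", "label": "case_002_seg.nii.gz"},
]
val_records = [
    {"image": "case_002.nii.gz", "label": "case_002_seg.nii.gz"},
    {"image": "case_003.nii.gz", "label": "case_003_seg.nii.gz"},
]

report = check_split(train_records, val_records)
print(report)
```

With the real files, the two lists would come from `json.load` on each path, and `val_cases` can be compared with the validation record count in the monailabel training log.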

Minxiangliu (Author) · 2022-11-01T02:29:22Z

I kept trying to reproduce the final validation accuracy recorded during monailabel training. This time I evaluated with the same val_key_metric that monailabel uses, but the result is still much worse than what monailabel reports.

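import logging
import sys

import torch
from monai.apps import get_logger
from monai.engines import SupervisedEvaluator
from monai.handlers import CheckpointLoader, StatsHandler
from monai.inferers import SlidingWindowInferer
from monai.transforms import Activationsd, AsDiscreted, Compose, EnsureTyped
from monailabel.tasks.train.utils import region_wise_metrics

# Send INFO-level log records to stdout so the evaluator output is visible
logging.basicConfig(stream=sys.stdout, level=logging.INFO)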

def post_transforms(labels):
    return Compose([
        EnsureTyped(keys="pred", device=torch.device('cuda')),
        Activationsd(keys="pred", softmax=True),
        AsDiscreted(
            keys=("pred", "label"),
            argmax=(True, False),
            to_onehot=(len(labels) + 1, len(labels) + 1),
        ),
    ])

if __name__ == "__main__":
    get_logger("eval_log")

    transform = getPreTransforms(**trans_params)
    transform.extend([EnsureTyped(keys=['image','label'], device=torch.device('cuda'))])
    transform = Compose(transform)
    datasetParameter = {'transform':transform, 'cache_rate':1.0, 'copy_cache':False, 'num_workers':2}
    datasetParameter['data'] = getDataList()
    test_ds = CacheDataset(**datasetParameter)
    loader = val_dataloader(test_ds)
    
    val_handlers = [
        StatsHandler(name="eval_log", output_transform=lambda x: None),
        CheckpointLoader(load_path='model/segmentation.pt', load_dict={"network": network.cuda()}),
    ]
    evaluator = SupervisedEvaluator(
        device=torch.device('cuda'),
        val_data_loader=loader,
        network=network,
        inferer=SlidingWindowInferer(**slidingWindowInfererParams),
        postprocessing=post_transforms(trans_params['labels']),
        key_val_metric=region_wise_metrics(trans_params['labels'], "val_mean_dice", "val"),
        amp=True,
        val_handlers=val_handlers,
    )
    evaluator.run()

Output:

INFO:ignite.engine.engine.SupervisedEvaluator:Engine run resuming from iteration 0, epoch 0 until 1 epochs
INFO:ignite.engine.engine.SupervisedEvaluator:Restored all variables from model/segmentation.pt
INFO:ignite.engine.engine.SupervisedEvaluator:Got new best metric of val_mean_dice: 0.5336676239967346
2022-11-01 10:19:58,254 - INFO - Epoch[1] Metrics -- val_cancer_dice: 0.5337 val_mean_dice: 0.5337 
2022-11-01 10:19:58,254 - INFO - Key metric: val_mean_dice best value: 0.5336676239967346 at epoch: 1
INFO:ignite.engine.engine.SupervisedEvaluator:Epoch[1] Complete. Time taken: 00:00:10
INFO:ignite.engine.engine.SupervisedEvaluator:Engine run complete. Time taken: 00:00:11

This is still far from the DSC recorded by monailabel.
The same files are used during monailabel training and in my own test, so the only thing I cannot confirm is whether monailabel uses the test file I provided correctly.
Can someone help me clarify the problem? Thanks in advance.


Minxiangliu (Author) · 2022-11-01T04:15:31Z

When training with monailabel, the log shows: Records for Validation: 38, which means that monailabel does use the test data I prepared. So the problem remains: with the same data and the same transforms, the DSC I compute is very different from the one monailabel reports.


tangy5 (Maintainer) · 2022-11-01T04:22:29Z

Hi @Minxiangliu, thanks for posting this separately. Let's see if someone else has any suggestions or thoughts. The community of monailabel users might have some insights.
