From c07d12a5d03739a8912d16857ba4cff6de11bab1 Mon Sep 17 00:00:00 2001 From: Eugene Khvedchenya Date: Mon, 7 Aug 2023 08:57:25 +0300 Subject: [PATCH 1/5] Update docs --- documentation/source/ObjectDetection.md | 134 +++++++++++++++++++++++- 1 file changed, 129 insertions(+), 5 deletions(-) diff --git a/documentation/source/ObjectDetection.md b/documentation/source/ObjectDetection.md index 31d126570c..3e38d3a51d 100644 --- a/documentation/source/ObjectDetection.md +++ b/documentation/source/ObjectDetection.md @@ -10,12 +10,136 @@ In SuperGradients, we aim to collect such models and make them very convenient a ## Implemented models -| Model | Yaml | Model class | Loss Class | NMS Callback | -|--------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| [SSD](https://arxiv.org/abs/1512.02325) | [ssd_lite_mobilenetv2_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ssd_lite_mobilenetv2_arch_params.yaml) | [SSDLiteMobileNetV2](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/ssd.py) | [SSDLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ssd_loss.SSDLoss) | 
[SSDPostPredictCallback](https://docs.deci.ai/super-gradients/docstring/training/utils.html#training.utils.ssd_utils.SSDPostPredictCallback) | -| [YOLOX](https://arxiv.org/abs/2107.08430) | [yolox_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/yolox_s_arch_params.yaml) | [YoloX_S](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/yolox.py) | [YoloXFastDetectionLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.yolox_loss.YoloXFastDetectionLoss) | [YoloXPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.yolo_base.YoloXPostPredictionCallback) | -| [PPYolo](https://arxiv.org/abs/2007.12099) | [ppyoloe_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ppyoloe_arch_params.yaml) | [PPYoloE](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.pp_yolo_e.PPYoloE) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss) | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) | +| Model | Yaml | Model class | Loss Class | NMS Callback | 
+|----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| [SSD](https://arxiv.org/abs/1512.02325) | [ssd_lite_mobilenetv2_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ssd_lite_mobilenetv2_arch_params.yaml) | [SSDLiteMobileNetV2](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/ssd.py) | [SSDLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ssd_loss.SSDLoss) | [SSDPostPredictCallback](https://docs.deci.ai/super-gradients/docstring/training/utils.html#training.utils.ssd_utils.SSDPostPredictCallback) | +| [YOLOX](https://arxiv.org/abs/2107.08430) | [yolox_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/yolox_s_arch_params.yaml) | [YoloX_S](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/training/models/detection_models/yolox.py) | [YoloXFastDetectionLoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.yolox_loss.YoloXFastDetectionLoss) | [YoloXPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.yolo_base.YoloXPostPredictionCallback) | +| [PPYolo](https://arxiv.org/abs/2007.12099) | 
[ppyoloe_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ppyoloe_arch_params.yaml) | [PPYoloE](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.pp_yolo_e.PPYoloE) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss) | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) | +| YoloNAS | [yolo_nas_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/recipes/arch_params/yolo_nas_s_arch_params.yaml) | [Yolo NAS S](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py#L16) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss) | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) | +## Understanding model's predictions + +This section covers what is the output of each model class in train, eval and tracing modes. +Corresponding loss functions and post-prediction callbacks from the table above are written to match the output format of the models. +That being said, if you're using YoloX model, you should use YoloX loss and post-prediction callback for YoloX model. +Mixing them with other models will result in an error. + +It is important to understand the output of the model class in order to use it correctly in the training process and especially +if you are going to use the model's prediction in a custom callback or loss. 
+ + +### YoloX +#### Training mode + +In training mode, YoloX returns a list of 3 tensors that contains the intermediates required for the loss calculation. +They correspond to output feature maps of the prediction heads: +- Output feature map at index 0: `[B, 1, H/8, W/8, C + 5]` +- Output feature map at index 1: `[B, 1, H/16, W/16, C + 5]` +- Output feature map at index 2: `[B, 1, H/32, W/32, C + 5]` + +Value `C` corresponds to the number of classes in the dataset. +And remaining `5`elements are box coordinates and objectness score. +Layout of elements in the last dimension is as follows: `[x, y, w, h, obj_score, class_scores...]` +Box regression in these outputs are NOT in pixel coordinates. +X and Y coordinates are normalized coordinates. +Width and height values are the power factor for the base of `e` + +`raw_predictions_0, raw_predictions_1, raw_predictions_2 = yolo_x_model(images)` + +In this mode, predictions decoding is not performed. + +#### Eval mode + +In eval mode, YoloX returns a tuple of decoded predictions and raw intermediates. + +`predictions, (raw_predictions_0, raw_predictions_1, raw_predictions_2) = yolo_x_model(images)` + +`predictions` is a single tensor of shape `[B, num_predictions, C + 5]` where `num_predictions` is the total number of predictions across all 3 output feature maps. + +The layout of the last dimension is the same as in training mode: `[x, y, w, h, obj_score, class_scores...]`. +Values of `x`, `y`, `w`, `h` are in absolute pixel coordinates and confidence scores are in range `[0, 1]`. + +#### Tracing mode + +Same as in Eval mode. + + +### PPYolo-E +#### Training mode + +In training mode, PPYoloE returns a tuple of 6 tensors that contains the intermediates required for the loss calculation. 
+You can access individual components of the model's output using the following snippet: + +`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)` + +They are as follows: + * `cls_score_list` - `[B, num_anchors, num_classes]` + * `reg_distri_list` - `[B, num_anchors, num_regression_dims]` + * `anchors` - `[num_anchors, 4]` + * `anchor_points` - `[num_anchors, 2]` + * `num_anchors_list` - `[num_anchors]` + * `stride_tensor` - `[num_anchors]` + +In this mode, predictions decoding is not performed. + +#### Eval mode + +In eval mode, Yolo-NAS returns a tuple of 2 tensors that contains the decoded predictions and the intermediates as in train mode: + +`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)` + +New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows: + + * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates + * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box + +Please note that box predictions are not clipped and may extend beyond the image boundaries. +Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play. + +#### Tracing mode + +In tracing mode, Yolo-NAS returns only decoded predictions: + +`pred_bboxes, pred_scores = yolo_nas_model(images)` + +### Yolo NAS +#### Training mode + +In training mode, Yolo-NAS returns a tuple of 6 tensors that contains the intermediates required for the loss calculation. 
+You can access individual components of the model's output using the following snippet: + +`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)` + +They are as follows: + * `cls_score_list` - `[B, num_anchors, num_classes]` + * `reg_distri_list` - `[B, num_anchors, num_regression_dims]` + * `anchors` - `[num_anchors, 4]` + * `anchor_points` - `[num_anchors, 2]` + * `num_anchors_list` - `[num_anchors]` + * `stride_tensor` - `[num_anchors]` + +In this mode, predictions decoding is not performed. + + +#### Eval mode + +In eval mode, Yolo-NAS returns a tuple of 2 tensors that contains the decoded predictions and the intermediates as in train mode: + +`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)` + +New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows: + + * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates + * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box + +Please note that box predictions are not clipped and may extend beyond the image boundaries. +Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play. 
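Because the decoded boxes described above are not clipped, code that consumes `pred_bboxes` directly may want to clamp them to the image rectangle first. The following is only a minimal sketch of that step, not part of SuperGradients: the helper name `clip_boxes_xyxy` and the sample values are hypothetical, and plain Python lists stand in for tensors so the snippet runs without any dependencies.

```python
from typing import List, Sequence


def clip_boxes_xyxy(
    boxes: Sequence[Sequence[float]], image_width: int, image_height: int
) -> List[List[float]]:
    """Clamp [x1, y1, x2, y2] pixel boxes to the image rectangle."""
    max_x, max_y = float(image_width), float(image_height)
    clipped = []
    for x1, y1, x2, y2 in boxes:
        clipped.append(
            [
                min(max(x1, 0.0), max_x),
                min(max(y1, 0.0), max_y),
                min(max(x2, 0.0), max_x),
                min(max(y2, 0.0), max_y),
            ]
        )
    return clipped


# Hypothetical decoded boxes for one 640x480 image; the first one spills outside the frame.
boxes = [[-12.5, 30.0, 700.0, 200.0], [10.0, 20.0, 110.0, 220.0]]
clipped = clip_boxes_xyxy(boxes, image_width=640, image_height=480)
print(clipped)
```

When working with real tensors, the same clamping can be expressed with `torch.clamp`, or with `torchvision.ops.clip_boxes_to_image` if torchvision is available.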
+ +#### Tracing mode + +In tracing mode, Yolo-NAS returns only decoded predictions: + +`pred_bboxes, pred_scores = yolo_nas_model(images)` ## Training From 787a506b0d018b9f913adddbf289f8b6b23c0f08 Mon Sep 17 00:00:00 2001 From: Eugene Khvedchenya Date: Mon, 7 Aug 2023 12:10:36 +0300 Subject: [PATCH 2/5] Clarify docs --- documentation/source/ObjectDetection.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/documentation/source/ObjectDetection.md b/documentation/source/ObjectDetection.md index 3e38d3a51d..8c8179700f 100644 --- a/documentation/source/ObjectDetection.md +++ b/documentation/source/ObjectDetection.md @@ -19,7 +19,8 @@ In SuperGradients, we aim to collect such models and make them very convenient a ## Understanding model's predictions -This section covers what is the output of each model class in train, eval and tracing modes. +This section covers what is the output of each model class in train, eval and tracing modes. A tracing mode is enabled +when exporting model to ONNX or when using `torch.jit.trace()` call Corresponding loss functions and post-prediction callbacks from the table above are written to match the output format of the models. That being said, if you're using YoloX model, you should use YoloX loss and post-prediction callback for YoloX model. Mixing them with other models will result in an error. @@ -44,7 +45,7 @@ Box regression in these outputs are NOT in pixel coordinates. X and Y coordinates are normalized coordinates. Width and height values are the power factor for the base of `e` -`raw_predictions_0, raw_predictions_1, raw_predictions_2 = yolo_x_model(images)` +`output_feature_map_at_index_0, output_feature_map_at_index_1, output_feature_map_at_index_2 = yolo_x_model(images)` In this mode, predictions decoding is not performed. 
From 0b5bd4a36d864ed2c8c645418f91607f62bf38d0 Mon Sep 17 00:00:00 2001 From: Eugene Khvedchenya Date: Mon, 7 Aug 2023 12:14:23 +0300 Subject: [PATCH 3/5] Clarify docs --- documentation/source/ObjectDetection.md | 294 ++++++++++++------------ 1 file changed, 149 insertions(+), 145 deletions(-) diff --git a/documentation/source/ObjectDetection.md b/documentation/source/ObjectDetection.md index 8c8179700f..18415bd4b9 100644 --- a/documentation/source/ObjectDetection.md +++ b/documentation/source/ObjectDetection.md @@ -17,151 +17,6 @@ In SuperGradients, we aim to collect such models and make them very convenient a | [PPYolo](https://arxiv.org/abs/2007.12099) | [ppyoloe_arch_params](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/arch_params/ppyoloe_arch_params.yaml) | [PPYoloE](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.pp_yolo_e.PPYoloE) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss) | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) | | YoloNAS | [yolo_nas_s_arch_params](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/recipes/arch_params/yolo_nas_s_arch_params.yaml) | [Yolo NAS S](https://github.com/Deci-AI/super-gradients/blob/e1db4d99492a25f8e65b5d3e17a6ff2672c5467b/src/super_gradients/training/models/detection_models/yolo_nas/yolo_nas_variants.py#L16) | [PPYoloELoss](https://docs.deci.ai/super-gradients/docstring/training/losses.html#training.losses.ppyolo_loss.PPYoloELoss) | [PPYoloEPostPredictionCallback](https://docs.deci.ai/super-gradients/docstring/training/models.html#training.models.detection_models.pp_yolo_e.post_prediction_callback.PPYoloEPostPredictionCallback) | -## 
Understanding model's predictions - -This section covers what is the output of each model class in train, eval and tracing modes. A tracing mode is enabled -when exporting model to ONNX or when using `torch.jit.trace()` call -Corresponding loss functions and post-prediction callbacks from the table above are written to match the output format of the models. -That being said, if you're using YoloX model, you should use YoloX loss and post-prediction callback for YoloX model. -Mixing them with other models will result in an error. - -It is important to understand the output of the model class in order to use it correctly in the training process and especially -if you are going to use the model's prediction in a custom callback or loss. - - -### YoloX -#### Training mode - -In training mode, YoloX returns a list of 3 tensors that contains the intermediates required for the loss calculation. -They correspond to output feature maps of the prediction heads: -- Output feature map at index 0: `[B, 1, H/8, W/8, C + 5]` -- Output feature map at index 1: `[B, 1, H/16, W/16, C + 5]` -- Output feature map at index 2: `[B, 1, H/32, W/32, C + 5]` - -Value `C` corresponds to the number of classes in the dataset. -And remaining `5`elements are box coordinates and objectness score. -Layout of elements in the last dimension is as follows: `[x, y, w, h, obj_score, class_scores...]` -Box regression in these outputs are NOT in pixel coordinates. -X and Y coordinates are normalized coordinates. -Width and height values are the power factor for the base of `e` - -`output_feature_map_at_index_0, output_feature_map_at_index_1, output_feature_map_at_index_2 = yolo_x_model(images)` - -In this mode, predictions decoding is not performed. - -#### Eval mode - -In eval mode, YoloX returns a tuple of decoded predictions and raw intermediates. 
- -`predictions, (raw_predictions_0, raw_predictions_1, raw_predictions_2) = yolo_x_model(images)` - -`predictions` is a single tensor of shape `[B, num_predictions, C + 5]` where `num_predictions` is the total number of predictions across all 3 output feature maps. - -The layout of the last dimension is the same as in training mode: `[x, y, w, h, obj_score, class_scores...]`. -Values of `x`, `y`, `w`, `h` are in absolute pixel coordinates and confidence scores are in range `[0, 1]`. - -#### Tracing mode - -Same as in Eval mode. - - -### PPYolo-E -#### Training mode - -In training mode, PPYoloE returns a tuple of 6 tensors that contains the intermediates required for the loss calculation. -You can access individual components of the model's output using the following snippet: - -`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)` - -They are as follows: - * `cls_score_list` - `[B, num_anchors, num_classes]` - * `reg_distri_list` - `[B, num_anchors, num_regression_dims]` - * `anchors` - `[num_anchors, 4]` - * `anchor_points` - `[num_anchors, 2]` - * `num_anchors_list` - `[num_anchors]` - * `stride_tensor` - `[num_anchors]` - -In this mode, predictions decoding is not performed. - -#### Eval mode - -In eval mode, Yolo-NAS returns a tuple of 2 tensors that contains the decoded predictions and the intermediates as in train mode: - -`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)` - -New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows: - - * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates - * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box - -Please note that box predictions are not clipped and may extend beyond the image boundaries. 
-Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play. - -#### Tracing mode - -In tracing mode, Yolo-NAS returns only decoded predictions: - -`pred_bboxes, pred_scores = yolo_nas_model(images)` - -### Yolo NAS -#### Training mode - -In training mode, Yolo-NAS returns a tuple of 6 tensors that contains the intermediates required for the loss calculation. -You can access individual components of the model's output using the following snippet: - -`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)` - -They are as follows: - * `cls_score_list` - `[B, num_anchors, num_classes]` - * `reg_distri_list` - `[B, num_anchors, num_regression_dims]` - * `anchors` - `[num_anchors, 4]` - * `anchor_points` - `[num_anchors, 2]` - * `num_anchors_list` - `[num_anchors]` - * `stride_tensor` - `[num_anchors]` - -In this mode, predictions decoding is not performed. - - -#### Eval mode - -In eval mode, Yolo-NAS returns a tuple of 2 tensors that contains the decoded predictions and the intermediates as in train mode: - -`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)` - -New outputs `pred_bboxes` and `pred_scores` are decoded predictions of the model. They are as follows: - - * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates - * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box - -Please note that box predictions are not clipped and may extend beyond the image boundaries. -Additionally, the NMS is not performed yet at this stage. This is where the post-prediction callback comes into play. 
- -#### Tracing mode - -In tracing mode, Yolo-NAS returns only decoded predictions: - -`pred_bboxes, pred_scores = yolo_nas_model(images)` - -## Training - -The easiest way to start training any mode in SuperGradients is to use a pre-defined recipe. In this tutorial, we will see how to train `YOLOX-S` model, other models can be trained by analogy. - -### Prerequisites - -1. You have to install SuperGradients first. Please refer to the [Installation](installation.md) section for more details. -2. Prepare the COCO dataset as described in the [Computer Vision Datasets Setup](https://docs.deci.ai/super-gradients/src/super_gradients/training/datasets/Dataset_Setup_Instructions/) under Detection Datasets section. - -After you meet the prerequisites, you can start training the model by running from the root of the repository: - -### Training from recipe - -```bash -python -m super_gradients.train_from_recipe --config-name=coco2017_yolox multi_gpu=Off num_gpus=1 -``` - -Note, the default configuration for this recipe is to use 8 GPUs in DDP mode. This hardware configuration may not be for everyone, so in the example above we override GPU settings to use a single GPU. -It is highly recommended to read through the recipe file [coco2017_yolox](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/coco2017_yolox.yaml) to get better understanding of the hyperparameters we use here. -If you're unfamiliar with config files, we recommend you to read the [Configuration Files](configuration_files.md) part first. ### Datasets @@ -528,6 +383,155 @@ num_classes: 3 And you should be good to go! +## Understanding model's predictions + +This section covers what is the output of each model class in train, eval and tracing modes. A tracing mode is enabled +when exporting model to ONNX or when using `torch.jit.trace()` call +Corresponding loss functions and post-prediction callbacks from the table above are written to match the output format of the models. 
+That being said, if you're using YoloX model, you should use YoloX loss and post-prediction callback for YoloX model. +Mixing them with other models will result in an error. + +It is important to understand the output of the model class in order to use it correctly in the training process and especially +if you are going to use the model's prediction in a custom callback or loss. + + +### YoloX +#### Training mode + +In training mode, YoloX returns a list of 3 tensors that contains the intermediates required for the loss calculation. +They correspond to output feature maps of the prediction heads: +- Output feature map at index 0: `[B, 1, H/8, W/8, C + 5]` +- Output feature map at index 1: `[B, 1, H/16, W/16, C + 5]` +- Output feature map at index 2: `[B, 1, H/32, W/32, C + 5]` + +Value `C` corresponds to the number of classes in the dataset. +And remaining `5`elements are box coordinates and objectness score. +Layout of elements in the last dimension is as follows: `[x, y, w, h, obj_score, class_scores...]` +Box regression in these outputs are NOT in pixel coordinates. +X and Y coordinates are normalized coordinates. +Width and height values are the power factor for the base of `e` + +`output_feature_map_at_index_0, output_feature_map_at_index_1, output_feature_map_at_index_2 = yolo_x_model(images)` + +In this mode, predictions decoding is not performed. + +#### Eval mode + +In eval mode, YoloX returns a tuple of decoded predictions and raw intermediates. + +`predictions, (raw_predictions_0, raw_predictions_1, raw_predictions_2) = yolo_x_model(images)` + +`predictions` is a single tensor of shape `[B, num_predictions, C + 5]` where `num_predictions` is the total number of predictions across all 3 output feature maps. + +The layout of the last dimension is the same as in training mode: `[x, y, w, h, obj_score, class_scores...]`. +Values of `x`, `y`, `w`, `h` are in absolute pixel coordinates and confidence scores are in range `[0, 1]`. 
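To make the eval-mode layout above concrete, here is a small sketch that counts `num_predictions` for a given input size and converts a single prediction row into a corner-format box. It rests on several assumptions that the text does not state outright: `x`, `y` are taken to be the box center (a later patch in this series renames them to `cx`, `cy`), the input size is assumed to be divisible by 32, and the final confidence is taken as `obj_score` times the best class score, which is a common YOLOX convention rather than something this document confirms. The helper names are hypothetical, and plain lists stand in for tensors.

```python
from typing import List, Tuple


def count_yolox_predictions(
    height: int, width: int, strides: Tuple[int, ...] = (8, 16, 32)
) -> int:
    """One prediction per cell of each output feature map."""
    return sum((height // s) * (width // s) for s in strides)


def decode_row(row: List[float]) -> Tuple[List[float], float, int]:
    """Split one [cx, cy, w, h, obj_score, class_scores...] row.

    Returns the box as [x1, y1, x2, y2], the combined confidence,
    and the index of the best-scoring class.
    """
    cx, cy, w, h, obj_score = row[:5]
    class_scores = row[5:]
    class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
    box = [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]
    confidence = obj_score * class_scores[class_id]  # assumed convention
    return box, confidence, class_id


num_predictions = count_yolox_predictions(640, 640)
box, confidence, class_id = decode_row([100.0, 80.0, 40.0, 20.0, 0.5, 0.25, 0.5])
print(num_predictions, box, confidence, class_id)
```

For a 640x640 input the three feature maps contribute 80x80, 40x40 and 20x20 cells, which is where the total of 8400 rows in `predictions` comes from.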
+
+#### Tracing mode
+
+Same as in eval mode.
+
+
+### PPYolo-E
+#### Training mode
+
+In training mode, PPYoloE returns a tuple of 6 tensors that contain the intermediates required for the loss calculation.
+You can access individual components of the model's output using the following snippet:
+
+`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = ppyoloe_model(images)`
+
+They are as follows:
+ * `cls_score_list` - `[B, num_anchors, num_classes]`
+ * `reg_distri_list` - `[B, num_anchors, num_regression_dims]`
+ * `anchors` - `[num_anchors, 4]`
+ * `anchor_points` - `[num_anchors, 2]`
+ * `num_anchors_list` - `[num_anchors]`
+ * `stride_tensor` - `[num_anchors]`
+
+In this mode, prediction decoding is not performed.
+
+#### Eval mode
+
+In eval mode, PPYoloE returns a tuple of 2 elements: `decoded_predictions, raw_intermediates`.
+`decoded_predictions` is itself a tuple of 2 tensors with decoded bounding boxes and class scores.
+`raw_intermediates` is a tuple of 6 tensors that contain the intermediates required for the loss calculation (same as in training mode).
+
+`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = ppyoloe_model(images)`
+
+The new outputs `pred_bboxes` and `pred_scores` are the decoded predictions of the model. They are as follows:
+
+ * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates
+ * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box
+
+Please note that box predictions are not clipped and may extend beyond the image boundaries.
+Additionally, NMS is not performed at this stage. This is where the post-prediction callback comes into play.
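Since NMS has not run at this point, every anchor still carries a box and a full vector of class scores. The sketch below illustrates the kind of score filtering that typically precedes NMS, for a single image. It is an illustration under assumptions, not the behaviour of `PPYoloEPostPredictionCallback`: the function name `select_candidates`, the threshold value and the sample data are hypothetical, and plain lists stand in for the `[num_anchors, 4]` and `[num_anchors, num_classes]` slices of the tensors.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[Sequence[float], float, int]


def select_candidates(
    pred_bboxes: Sequence[Sequence[float]],
    pred_scores: Sequence[Sequence[float]],
    score_threshold: float,
) -> List[Candidate]:
    """Keep anchors whose best class score reaches the threshold.

    Each kept entry is (box, best_score, class_id); NMS would run on these.
    """
    candidates: List[Candidate] = []
    for box, scores in zip(pred_bboxes, pred_scores):
        class_id = max(range(len(scores)), key=scores.__getitem__)
        if scores[class_id] >= score_threshold:
            candidates.append((box, scores[class_id], class_id))
    return candidates


# Hypothetical decoded output for one image with 3 anchors and 2 classes.
bboxes = [[0.0, 0.0, 10.0, 10.0], [5.0, 5.0, 20.0, 20.0], [1.0, 1.0, 2.0, 2.0]]
scores = [[0.1, 0.7], [0.05, 0.02], [0.9, 0.3]]
candidates = select_candidates(bboxes, scores, score_threshold=0.5)
print(candidates)
```

Filtering before NMS keeps the number of pairwise overlap checks small, which is one reason a confidence threshold is usually among the callback parameters.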
+
+#### Tracing mode
+
+In tracing mode, PPYoloE returns only decoded predictions:
+
+`pred_bboxes, pred_scores = ppyoloe_model(images)`
+
+### Yolo NAS
+#### Training mode
+
+In training mode, Yolo-NAS returns a tuple of 6 tensors that contain the intermediates required for the loss calculation.
+You can access individual components of the model's output using the following snippet:
+
+`cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor = yolo_nas_model(images)`
+
+They are as follows:
+ * `cls_score_list` - `[B, num_anchors, num_classes]`
+ * `reg_distri_list` - `[B, num_anchors, num_regression_dims]`
+ * `anchors` - `[num_anchors, 4]`
+ * `anchor_points` - `[num_anchors, 2]`
+ * `num_anchors_list` - `[num_anchors]`
+ * `stride_tensor` - `[num_anchors]`
+
+In this mode, prediction decoding is not performed.
+
+
+#### Eval mode
+
+In eval mode, Yolo-NAS returns a tuple of 2 elements: the decoded predictions and the same intermediates as in training mode:
+
+`(pred_bboxes, pred_scores), (cls_score_list, reg_distri_list, anchors, anchor_points, num_anchors_list, stride_tensor) = yolo_nas_model(images)`
+
+The new outputs `pred_bboxes` and `pred_scores` are the decoded predictions of the model. They are as follows:
+
+ * `pred_bboxes` - `[B, num_anchors, 4]` - decoded bounding boxes in the format `[x1, y1, x2, y2]` in absolute (pixel) coordinates
+ * `pred_scores` - `[B, num_anchors, num_classes]` - class scores `(0..1)` for each bounding box
+
+Please note that box predictions are not clipped and may extend beyond the image boundaries.
+Additionally, NMS is not performed at this stage. This is where the post-prediction callback comes into play.
+
+#### Tracing mode
+
+In tracing mode, Yolo-NAS returns only decoded predictions:
+
+`pred_bboxes, pred_scores = yolo_nas_model(images)`
+
+## Training
+
+The easiest way to start training any model in SuperGradients is to use a pre-defined recipe.
In this tutorial, we will see how to train the `YOLOX-S` model; other models can be trained by analogy.
+
+### Prerequisites
+
+1. You have to install SuperGradients first. Please refer to the [Installation](installation.md) section for more details.
+2. Prepare the COCO dataset as described in the [Computer Vision Datasets Setup](https://docs.deci.ai/super-gradients/src/super_gradients/training/datasets/Dataset_Setup_Instructions/) under the Detection Datasets section.
+
+After you meet the prerequisites, you can start training the model by running the following from the root of the repository:
+
+### Training from recipe
+
+```bash
+python -m super_gradients.train_from_recipe --config-name=coco2017_yolox multi_gpu=Off num_gpus=1
+```
+
+Note that the default configuration for this recipe uses 8 GPUs in DDP mode. This hardware configuration may not be available to everyone, so in the example above we override the GPU settings to use a single GPU.
+It is highly recommended to read through the recipe file [coco2017_yolox](https://github.com/Deci-AI/super-gradients/blob/master/src/super_gradients/recipes/coco2017_yolox.yaml) to get a better understanding of the hyperparameters we use here.
+If you're unfamiliar with config files, we recommend reading the [Configuration Files](configuration_files.md) part first.
+ + ## How to add a new model To implement a new model, you need to add the following parts: From eb4d42f24d090810dfb01dd8009d33c0209951bf Mon Sep 17 00:00:00 2001 From: Eugene Khvedchenya Date: Mon, 7 Aug 2023 12:20:36 +0300 Subject: [PATCH 4/5] Clarify docs --- documentation/source/ObjectDetection.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/documentation/source/ObjectDetection.md b/documentation/source/ObjectDetection.md index 18415bd4b9..555c428a58 100644 --- a/documentation/source/ObjectDetection.md +++ b/documentation/source/ObjectDetection.md @@ -406,7 +406,7 @@ They correspond to output feature maps of the prediction heads: Value `C` corresponds to the number of classes in the dataset. And remaining `5`elements are box coordinates and objectness score. -Layout of elements in the last dimension is as follows: `[x, y, w, h, obj_score, class_scores...]` +Layout of elements in the last dimension is as follows: `[cx, cy, w, h, obj_score, class_scores...]` Box regression in these outputs are NOT in pixel coordinates. X and Y coordinates are normalized coordinates. Width and height values are the power factor for the base of `e` @@ -423,8 +423,8 @@ In eval mode, YoloX returns a tuple of decoded predictions and raw intermediates `predictions` is a single tensor of shape `[B, num_predictions, C + 5]` where `num_predictions` is the total number of predictions across all 3 output feature maps. -The layout of the last dimension is the same as in training mode: `[x, y, w, h, obj_score, class_scores...]`. -Values of `x`, `y`, `w`, `h` are in absolute pixel coordinates and confidence scores are in range `[0, 1]`. +The layout of the last dimension is the same as in training mode: `[cx, cy, w, h, obj_score, class_scores...]`. +Values of `cx`, `cy`, `w`, `h` are in absolute pixel coordinates and confidence scores are in range `[0, 1]`. 
 #### Tracing mode From 22318d57509e03342b6d5d6a74643008636e0fcb Mon Sep 17 00:00:00 2001 From: Eugene Khvedchenya Date: Mon, 7 Aug 2023 13:16:13 +0300 Subject: [PATCH 5/5] Clarify docs --- documentation/source/ObjectDetection.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/documentation/source/ObjectDetection.md b/documentation/source/ObjectDetection.md index 555c428a58..92292ec430 100644 --- a/documentation/source/ObjectDetection.md +++ b/documentation/source/ObjectDetection.md @@ -65,6 +65,12 @@ from super_gradients.training.models.detection_models.yolo_base import YoloXPost post_prediction_callback = YoloXPostPredictionCallback(conf=0.001, iou=0.6) ``` +All post-prediction callbacks return a list of tensors with decoded boxes after NMS: `List[torch.Tensor]`. +The list has one entry per image in the batch, and each tensor holds all predictions for that image. +The shape of each predictions tensor is `[N, 6]`, where `N` is the number of predictions for the image and each row holds the values `[X1, Y1, X2, Y2, confidence, class_id]`. + +Box coordinates are in absolute (pixel) units. + ### Visualization Visualization of the model predictions is a very important part of the training process for any computer vision task.
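Before drawing anything, the per-image `[N, 6]` rows returned by a post-prediction callback usually need to be unpacked into boxes, confidences and class ids. The sketch below shows one way this might look. The `Detection` dataclass, the `unpack_detections` helper and the sample values are hypothetical and not part of SuperGradients, and nested Python lists stand in for the `List[torch.Tensor]` output so the snippet runs on its own; with real tensors, calling `.tolist()` on each one yields the same nested structure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float
    class_id: int


def unpack_detections(
    batch_predictions: Sequence[Sequence[Sequence[float]]],
) -> List[List[Detection]]:
    """Convert per-image [N, 6] rows into Detection objects, one list per image."""
    return [
        [
            Detection(x1, y1, x2, y2, confidence, int(class_id))
            for x1, y1, x2, y2, confidence, class_id in image_rows
        ]
        for image_rows in batch_predictions
    ]


# Hypothetical callback output for a batch of two images; the second has no detections.
batch_predictions = [
    [[10.0, 20.0, 110.0, 220.0, 0.9, 0.0], [30.0, 40.0, 60.0, 80.0, 0.4, 2.0]],
    [],
]
detections = unpack_detections(batch_predictions)
for image_index, image_detections in enumerate(detections):
    print(image_index, image_detections)
```

Note that `class_id` is stored as a float inside the tensor row, so it is cast to `int` before being used as an index into a list of class names.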