Add warp_perspective operator #5542
Conversation
Signed-off-by: Rafal Banas <[email protected]>
Force-pushed from eacdc59 to a0f9683
matrix = AcquireTensorArgument(ws, scratchpad, matrix_arg_, TensorShape<1>(9),
                               nvcvop::GetDataType<float>(), "W");
I'm pretty sure we need to apply a fixup to the matrix to match WarpAffine. We can add OpenCV compatibility (here and in WarpAffine) as an option, but I think being self-consistent is far better than randomly matching a patchwork of common libraries.
Done + added tests for compatibility with warp_affine
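For reference, the half-pixel fixup under discussion can be written as plain matrix algebra: to go from a corner-origin convention to OpenCV's pixel-center convention, the warp matrix M is conjugated with +/-0.5 translations, M' = T(-0.5) * M * T(+0.5). Below is a minimal sketch in plain Python (illustrative only; `matmul3`, `translation`, and `adjust_to_pixel_center` are hypothetical names, not DALI's API):

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(t):
    """Homogeneous 2D translation by (t, t)."""
    return [[1.0, 0.0, t], [0.0, 1.0, t], [0.0, 0.0, 1.0]]

def adjust_to_pixel_center(m):
    """Conjugate m with +/-0.5 shifts: M' = T(-0.5) * M * T(+0.5)."""
    return matmul3(translation(-0.5), matmul3(m, translation(0.5)))

# The identity warp is unchanged: the two shifts cancel exactly.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
assert adjust_to_pixel_center(identity) == identity
```

A non-trivial warp, e.g. scaling by 2, picks up a 0.5 translation after the fixup, because the scaling now happens about the pixel center rather than the corner.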
@@ -264,7 +264,7 @@ if (BUILD_CVCUDA)
   set(DALI_BUILD_PYTHON ${BUILD_PYTHON})
   set(BUILD_PYTHON OFF)
   # for now we use only median blur from CV-CUDA
-  set(CV_CUDA_SRC_PATERN medianblur median_blur morphology)
+  set(CV_CUDA_SRC_PATERN medianblur median_blur morphology warp)
Nitpick: I know it's not your change, but this should be PATTERN.
That will need to be fixed in CV-CUDA.
@@ -0,0 +1,372 @@
# Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
We do a lot of parameter handling with many bug-prone ifs. I think we need some negative tests to check that we handle invalid argument combinations properly.
I added a few tests that check that the expected errors are thrown.
… fixes Signed-off-by: Rafal Banas <[email protected]>
b7828eb
to
d0b1ac7
Compare
!build
CI MESSAGE: [16634050]: BUILD STARTED
CI MESSAGE: [16634050]: BUILD FAILED
Signed-off-by: Rafal Banas <[email protected]>
!build
CI MESSAGE: [16639288]: BUILD STARTED
int tid = blockIdx.x * blockDim.x + threadIdx.x;
int matrix_id = tid / 4;
if (matrix_id >= batch_size) {
  return;
}
auto *data_ptr = wrap.ptr(matrix_id);
auto *matrix = reinterpret_cast<mat3*>(data_ptr);
int sub_tid = tid % 4;
if (sub_tid % 2 == 0) {
  // this modifies only the first two rows
  int row_id = sub_tid / 2;
  matrix->set_row(row_id, matrix->row(row_id) - matrix->row(2) * 0.5);
}
__syncthreads();
if (sub_tid < 4) {
  // this modifies only the third column
  int row_id = sub_tid;
  matrix->at(row_id, 2) = dot(matrix->row(row_id), vec3{0.5, 0.5, 1});
}
Isn't this overcomplicated? Even with a huge batch, we're talking about two 3x3 matrix multiplications per sample. The code is not very readable, and still, arguably the most costly part (loading the matrices from global memory) isn't optimized.
We can probably get away with sample-per-thread and just doing the whole thing, or we can go with a block of 32x9.
I benchmarked a few variants of the kernel and they made almost no difference in end-to-end operator performance, so I went with a single thread per matrix; the kernel is as simple as it can get.
Have you tested it on more powerful cards, like an SXM H100?
using MatricesWrap = nvcv::cuda::TensorWrap<float, 9 * sizeof(float), sizeof(float)>;

__global__ void adjustMatricesKernel(MatricesWrap wrap, int batch_size) {
It's worth mentioning that the same fixup can be applied for dst->src and src->dst mapping.
Done
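The point about both mapping directions can be checked algebraically. Since T(+0.5) and T(-0.5) are mutual inverses, the fixup F(M) = T(-0.5) * M * T(+0.5) satisfies F(A) * F(B) = F(A * B), so F(M)^-1 = F(M^-1): applying the same fixup to the src->dst matrix and to its dst->src inverse keeps them inverses of each other. A plain-Python sketch (illustrative only, not DALI or CV-CUDA code; all names here are hypothetical):

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(t):
    """Homogeneous 2D translation by (t, t)."""
    return [[1.0, 0.0, t], [0.0, 1.0, t], [0.0, 0.0, 1.0]]

def fixup(m):
    """Conjugate m with the +/-0.5 pixel-center shifts."""
    return matmul3(translation(-0.5), matmul3(m, translation(0.5)))

# An invertible src->dst matrix and its dst->src inverse (m * m_inv == I).
m = [[2.0, 0.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 1.0]]
m_inv = [[0.5, 0.0, -1.5], [0.0, 0.25, -1.25], [0.0, 0.0, 1.0]]

# Fixing up each direction independently still yields a mutually inverse pair.
product = matmul3(fixup(m), fixup(m_inv))
```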
CI MESSAGE: [16639288]: BUILD FAILED
dali/operators/nvcvop/nvcvop.cc
Outdated
nvcv::Tensor AsTensor(const Tensor<GPUBackend> &tensor, TensorLayout layout = "",
                      const std::optional<TensorShape<>> &reshape = {}) {
-nvcv::Tensor AsTensor(const Tensor<GPUBackend> &tensor, TensorLayout layout = "",
-                      const std::optional<TensorShape<>> &reshape = {}) {
+nvcv::Tensor AsTensor(const Tensor<GPUBackend> &tensor, TensorLayout layout,
+                      const std::optional<TensorShape<>> &reshape) {
Wouldn't it be better to assign the default arguments in the declaration (in nvcvop.h) instead of here?
Also, if you're using std::optional, I believe a better default value would be std::nullopt instead of the default constructor (I might be wrong, though).
Edit: on second thought, I understand this const optional & as "a reference which might not contain a value". Would this be just const TensorShape<> * = nullptr?
I moved the default values to the declaration and used std::nullopt instead of {}.
Regarding optional vs. a nullable pointer: I prefer optional; it's safer, IMO.
Signed-off-by: Rafal Banas <[email protected]>
!build
CI MESSAGE: [16674363]: BUILD STARTED
      channels, ". Number of values provided: ", fill_value_arg_.size(), "."));
  }
} else {
  DALI_FAIL("Only scalar fill_value can be provided when processing data in planar layout.");
Is that CV-CUDA limitation or ours?
Let's say CV-CUDA. The problem is that CV-CUDA doesn't support planar layouts at all (in this op), and it accepts only a single fill_value for the whole batch, so I cannot provide a different fill value for each plane.
We could launch the planes separately, but whatever.
@@ -1615,6 +1615,7 @@ def full_like_pipe():
    "experimental.median_blur",  # not supported for CPU
    "experimental.dilate",  # not supported for CPU
    "experimental.erode",  # not supported for CPU
    "experimental.warp_perspective",  # not supported for CPU
Why experimental?
All CV-CUDA ops so far are in experimental. We can think about moving them all out soon. My main concern is the not-so-clear situation around depending on CV-CUDA: right now we compile the sources, but at some point we're going to move to an actual dependency, and that could potentially affect those ops.
CI MESSAGE: [16674363]: BUILD FAILED
  {0., 0., 1.}
}};

*matrix = shift_back * *matrix * shift;
I'm not sure the compiler will be smart here. I recommend doing this:
auto m = *matrix; // this will make sure that the matrix is actually in registers
m = m * shift;
m(0, 2) -= 0.5f; // there's no point in running full matrix multiplication for the shift-back when all we need is 2 additions
m(1, 2) -= 0.5f;
*matrix = m;
Two additions are not all you need. I changed it to:
matrix = matrix * shift;
matrix.set_row(0, matrix.row(0) - matrix.row(2) * 0.5f);
matrix.set_row(1, matrix.row(1) - matrix.row(2) * 0.5f);
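The two forms above are algebraically equivalent: left-multiplying by T(-0.5) subtracts 0.5 times the third row from each of the first two rows. A plain-Python check (illustrative only, not the actual CUDA kernel; all names are hypothetical), using a matrix with a non-trivial perspective row:

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translation(t):
    """Homogeneous 2D translation by (t, t)."""
    return [[1.0, 0.0, t], [0.0, 1.0, t], [0.0, 0.0, 1.0]]

def fixup_full(m):
    """Full conjugation: T(-0.5) * m * T(+0.5)."""
    return matmul3(translation(-0.5), matmul3(m, translation(0.5)))

def fixup_rows(m):
    """Same fixup, with the left multiplication folded into row updates."""
    m = matmul3(m, translation(0.5))
    for i in (0, 1):
        # row_i -= 0.5 * row_2, matching the set_row calls above
        m[i] = [m[i][j] - 0.5 * m[2][j] for j in range(3)]
    return m

# A warp matrix with a non-trivial perspective (bottom) row.
warp = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.25, 1.0]]
assert fixup_full(warp) == fixup_rows(warp)
```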
Signed-off-by: Rafal Banas <[email protected]>
!build
CI MESSAGE: [16702897]: BUILD STARTED
CI MESSAGE: [16702897]: BUILD FAILED
Signed-off-by: Rafal Banas <[email protected]>
!build
CI MESSAGE: [16706360]: BUILD STARTED
CI MESSAGE: [16706360]: BUILD FAILED
@@ -264,7 +264,7 @@ if (BUILD_CVCUDA)
   set(DALI_BUILD_PYTHON ${BUILD_PYTHON})
   set(BUILD_PYTHON OFF)
   # for now we use only median blur from CV-CUDA
This comment is no longer up to date.
.InputDox(0, "input", "TensorList of uint8, uint16, int16 or float",
          "Input data. Must be images in HWC or CHW layout, or a sequence of those.")
.InputDox(1, "matrix_gpu", "1D TensorList of float",
          "Transformation matrix data. Should be used to pass the GPU data. "
"Transformation matrix data. Should be used to pass the GPU data. " | |
"Transformation matrix data, on GPU memory. " |
 * @brief Modifies (in-place) a tensor of perspective matrices to match
 * the OpenCV convention of pixel origin (center instead of corner).
 */
void adjustMatrices(nvcv::Tensor &matrices, cudaStream_t stream);
-void adjustMatrices(nvcv::Tensor &matrices, cudaStream_t stream);
+void adjustMatricesToPixelCenter(nvcv::Tensor &matrices, cudaStream_t stream);
Nitpick: or something more explicit.
CI MESSAGE: [16706360]: BUILD PASSED
Category:
New feature
Description:
Adds a new experimental.warp_perspective operator that uses a CV-CUDA operator as its implementation.
Additional information:
Affected modules and functionalities:
New operator
Key points relevant for the review:
Correctness of the parameter handling
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: N/A