VisionVerse/Unaligned-RGBT-Semantic-Segmentation

Unaligned RGBT Semantic Segmentation

The affine transformation of a 2D image, denoted as $\mathbf{M}$, is composed of a linear transformation $\mathbf{M_1}$ and a translation $\mathbf{M_2}$. Linear transformations include rotation, reflection, shearing, and scaling, and they are closed under addition and multiplication. As a result, any combination of linear transformations is itself a linear transformation.

$$\mathbf{M_1}=\left[\begin{array}{ll} a & b \\ c & d \end{array}\right], ~~ \mathbf{M_2}=\left[\begin{array}{l} e \\ f \end{array}\right]. $$
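As an illustrative check of the closure property, consider composing a scaling with a shear. The specific matrices below are chosen only for this example and are not taken from the dataset:

$$\left[\begin{array}{ll} 2 & 0 \\ 0 & 3 \end{array}\right] \left[\begin{array}{ll} 1 & 1 \\ 0 & 1 \end{array}\right] = \left[\begin{array}{ll} 2 & 2 \\ 0 & 3 \end{array}\right].$$

The product is again a $2 \times 2$ matrix of the form $\mathbf{M_1}$, here with $a=2$, $b=2$, $c=0$, $d=3$.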

The affine transformation matrix can be written as $\mathbf{M}_{s\rightarrow t} = \left[\mathbf{M_1} \mid \mathbf{M_2} \right] \in \mathbb{R}^{2 \times 3}$. Specifically, we randomly select three non-collinear points $\mathcal{S}=\lbrace s_1, s_2, s_3 \rbrace$ in the image and apply a random deformation to them to obtain the transformed points $\mathcal{T}=\lbrace t_1, t_2, t_3 \rbrace$. As shown in Fig. 1, the deformation strength $k$ controls the affine transformation: the displacement of each point $s_i = (x, y)^\top$ is drawn at random from the interval $[-k, k]$. For the U-MFNet dataset, $k$ is set to 20 pixels. The affine transformation $\mathcal{S} \rightarrow \mathcal{T}$ can then be expressed as:

$$t_i = \mathbf{M_1} \cdot s_i + \mathbf{M_2}.$$

Given the two point sets $(\mathcal{S}, \mathcal{T})$ before and after deformation, we use an OpenCV library function to calculate the affine transformation matrix $\mathbf{M}$. Finally, we apply $\mathbf{M}$ to the input image to obtain the output image after the affine transformation.
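The point-sampling and matrix-estimation steps can be sketched in plain Python as follows. This is a minimal, dependency-free illustration and not the repository's code: the function names are hypothetical, the 640x480 image size is an assumed example value, and each coordinate is assumed to be displaced independently within $[-k, k]$. In practice the matrix would presumably be obtained with OpenCV's `cv2.getAffineTransform`, which computes the same $2 \times 3$ matrix from three point pairs.

```python
import random

def sample_point_pairs(width, height, k, rng):
    """Pick three non-collinear points and displace them (assumed: per coordinate, in [-k, k])."""
    while True:
        src = [(rng.uniform(0, width - 1), rng.uniform(0, height - 1)) for _ in range(3)]
        (x1, y1), (x2, y2), (x3, y3) = src
        # Twice the signed triangle area; a value near zero means nearly collinear points.
        area2 = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
        if abs(area2) > 1e-3 * width * height:
            break
    dst = [(x + rng.uniform(-k, k), y + rng.uniform(-k, k)) for x, y in src]
    return src, dst

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve_affine(src, dst):
    """Return [[a, b, e], [c, d, f]] such that t_i = M_1 s_i + M_2 for all three pairs."""
    A = [[x, y, 1.0] for x, y in src]
    d = det3(A)
    rows = []
    for axis in (0, 1):  # axis 0 gives (a, b, e), axis 1 gives (c, d, f)
        rhs = [p[axis] for p in dst]
        coeffs = []
        for col in range(3):
            # Cramer's rule: replace one column of A with the right-hand side.
            Ac = [row[:] for row in A]
            for r in range(3):
                Ac[r][col] = rhs[r]
            coeffs.append(det3(Ac) / d)
        rows.append(coeffs)
    return rows

def apply_affine(M, point):
    """Map a point (x, y) through the 2x3 affine matrix M."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])

rng = random.Random(0)  # fixed seed so the sketch is repeatable
src, dst = sample_point_pairs(640, 480, k=20, rng=rng)
M = solve_affine(src, dst)
```

Mapping each source point through `M` with `apply_affine` should reproduce the corresponding displaced point up to floating-point error, which is a quick way to sanity-check an estimated matrix.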

Fig. 1 Thermal images with different degrees of deformation, obtained with different values of $k$. The higher the value of $k$, the stronger the affine deformation in the image.

To create our new unaligned RGB-T semantic segmentation benchmark, the U-MFNet dataset (Baidu Netdisk), we applied the procedure described above to deform the thermal images in the MFNet dataset while keeping the RGB images unaltered.
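The pairing step, in which only the thermal image is warped, can be sketched as below. This is a hedged illustration rather than the actual dataset-generation script: images are represented as nested lists, the function names are hypothetical, and nearest-neighbour sampling with zero fill is an assumption. A real pipeline would more likely call OpenCV's `cv2.warpAffine` on the thermal image.

```python
def invert_affine(M):
    """Invert a 2x3 affine matrix [[a, b, e], [c, d, f]]."""
    (a, b, e), (c, d, f) = M
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # The inverse translation is -(M_1^{-1} M_2).
    return [[ia, ib, -(ia * e + ib * f)],
            [ic, id_, -(ic * e + id_ * f)]]

def warp_nearest(image, M, fill=0):
    """Warp a 2D nested-list image with the forward affine matrix M."""
    h, w = len(image), len(image[0])
    inv = invert_affine(M)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the source pixel that lands on output pixel (x, y).
            sx = inv[0][0] * x + inv[0][1] * y + inv[0][2]
            sy = inv[1][0] * x + inv[1][1] * y + inv[1][2]
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out

def make_unaligned_pair(rgb, thermal, M):
    """Return (rgb, deformed thermal); the RGB image is passed through untouched."""
    return rgb, warp_nearest(thermal, M)

# Toy example: shift a tiny "thermal" image one pixel to the right.
rgb = [[9, 9, 9], [9, 9, 9]]
thermal = [[1, 2, 3], [4, 5, 6]]
shift_right = [[1, 0, 1], [0, 1, 0]]
rgb_out, thermal_out = make_unaligned_pair(rgb, thermal, shift_right)
```

Because the segmentation labels in such a setup stay tied to the unaltered RGB image, only the thermal modality becomes spatially misaligned.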

Experimental Results on the U-MFNet dataset

TABLE I. Quantitative comparisons (%) on the U-MFNet dataset.

Fig. 2 Qualitative comparisons of our method and eight SOTA methods in daytime and nighttime scenes on the test set of the U-MFNet dataset.
