[paper reading] CenterNet (Triplets)

| topic | motivation | technique | key element | math | use yourself |
| --- | --- | --- | --- | --- | --- |
| CenterNet (Triplets) | [Problem to Solve](#Problem to Solve) Idea Intuition | [CenterNet Architecture](#CenterNet Architecture) [Center Pooling](#Center Pooling) [Cascade Corner Pooling](#Cascade Corner Pooling) [Central Region Exploration](#Central Region Exploration) | Baseline: CornerNet [Generating BBox](#Generating BBox) Training Inferencing [Ablation Experiment](#Ablation Experiment) [Error Analysis](#Error Analysis) [Metric AP & AR & FD](#Metric AP & AR & FD) [Small & Medium & Large](#Small & Medium & Large) | [Central Region](#Central Region) [Loss Function](#Loss Function) | [To Solved Problem](#To Solved Problem) [Better Representation](#Better Representation) [Generator & Adaptive](#Generator & Adaptive) |

Motivation

Problem to Solve

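Drawbacks of keypoint-based methods (here mainly CornerNet):

Because there is no additional look at the cropped region, the detector cannot capture the visual pattern of the bounding box region, which produces a large number of incorrect bounding boxes.

① CornerNet produces many incorrect bounding boxes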

Idea

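Represent each object with a keypoint triplet (top-left corner, bottom-right corner, and center).

That is, while the top-left and bottom-right corners encode the boundary information, the added center keypoint lets the model explore the visual pattern of each predicted bounding box (i.e., obtain information from inside the object).

Concretely, perceiving the visual patterns within an object is converted into a keypoint detection problem.

② Checking the Central Region identifies the correct predictions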

Intuition

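The idea partly follows RoI Pooling: through efficient discrimination (the Central Region), a one-stage method gains, to some extent, the resampling ability of two-stage methods.

Specifically: if a predicted bounding box has a high IoU with the ground-truth box, then the center keypoint in its Central Region is likely to be predicted as the same class.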

Technique

CenterNet Architecture

Components

[Center Pooling](#Center Pooling)
[Cascade Corner Pooling](#Cascade Corner Pooling)
[Central Region Exploration](#Central Region Exploration)

Improvement

AP Improvement

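AP improves for small, medium, and large objects, with most of the gain coming from small objects.

Reason: center information. The smaller an incorrect bounding box is, the lower the probability that a center keypoint is detected in its Central Region.

Filtering out incorrect bounding boxes is equivalent to raising the confidence of bounding boxes that are accurately located but have lower scores.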
small object

medium & large object

AR Improvement

Reason: filtering out incorrect bounding boxes is equivalent to raising the confidence of bounding boxes that are accurately located but have lower scores.

Center Pooling

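Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.

Why

The geometric center does not necessarily carry a recognizable visual pattern.

Purpose

Better detection of center keypoints!

Specifically, it gives the Central Region recognizable visual patterns, so that information at the center of a proposal can be perceived and used to check whether a bounding box is correct.

Steps

For the feature map fed to Center Pooling, take the maximum responses in the horizontal and vertical directions and sum them.

The backbone outputs a feature map.

This must be 2 different feature maps: the horizontal and vertical Corner Pooling branches must receive different feature maps, otherwise Center Pooling effectively has no effect.
Find the maximum value in the horizontal direction and in the vertical direction separately.
Add the two together.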

![image-20201105123057169]([paper reading] CenterNet (Triplets).assets/image-20201105123057169.png)
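
The steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `horizontal_feat` and `vertical_feat` are hypothetical names for the two separate feature maps, and the convolution layers around the pooling are left out.

```python
import numpy as np

def center_pooling(horizontal_feat: np.ndarray, vertical_feat: np.ndarray) -> np.ndarray:
    """Sum of the row-wise and column-wise maxima at every position.

    Both inputs have shape (H, W). They stand in for the two different
    feature maps that feed the horizontal and vertical branches.
    """
    row_max = horizontal_feat.max(axis=1, keepdims=True)  # (H, 1): max along each row
    col_max = vertical_feat.max(axis=0, keepdims=True)    # (1, W): max along each column
    return row_max + col_max                              # broadcasts to (H, W)

a = np.array([[1.0, 2.0], [3.0, 0.0]])
b = np.array([[5.0, 1.0], [0.0, 7.0]])
pooled = center_pooling(a, b)
```

Every output position receives the maximum of its own row plus the maximum of its own column, so a strong response anywhere along the two axes through a point is propagated to that point.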

Cascade Corner Pooling

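Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.

Why

Corners lie outside the object and lack local appearance features.

Purpose

Better detection of corners!

Specifically, it enriches the information collected by the top-left and bottom-right corners so that they perceive both boundary and internal information.

Steps

On the input feature map, take the maximum responses along the boundary direction and the internal direction and sum them (pooling in both directions is more stable and robust, which improves accuracy and recall).

Along the boundary direction, find the boundary max.
From the location of the boundary max, look along the internal direction to find the internal max.
Add the 2 maxima together (at the corner position).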

![image-20201105123054029]([paper reading] CenterNet (Triplets).assets/image-20201105123054029.png)
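
A literal reading of the three steps, for the top boundary of a box whose top-left corner sits at `(i, j)`, might look like the sketch below. The function name and the use of two separate maps (`boundary_feat`, `internal_feat`) are assumptions for illustration, and rows are taken to grow downward and columns to the right.

```python
import numpy as np

def cascade_top_pooling_at(boundary_feat: np.ndarray,
                           internal_feat: np.ndarray,
                           i: int, j: int) -> float:
    """Cascade pooling value at a candidate top-left corner (row i, column j)."""
    # Step 1: look right along the top boundary (row i) for the boundary max.
    k = j + int(np.argmax(boundary_feat[i, j:]))
    boundary_max = boundary_feat[i, k]
    # Step 2: from the boundary max, look down into the object (column k).
    internal_max = internal_feat[i:, k].max()
    # Step 3: add the two maxima; the sum is assigned to the corner position.
    return float(boundary_max + internal_max)

boundary = np.array([[0.0, 1.0, 5.0],
                     [0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0]])
internal = np.array([[0.0, 0.0, 1.0],
                     [0.0, 0.0, 4.0],
                     [9.0, 0.0, 2.0]])
value = cascade_top_pooling_at(boundary, internal, 0, 0)
```

The large response `9.0` in the first column is ignored on purpose: the internal search only runs through the column where the boundary max was found.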

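Attention!

In the actual implementation, Cascade Corner Pooling does not strictly follow the procedure above. It additionally uses a skip connection and a 1×1 Conv, techniques that have proven effective, which gives Cascade Corner Pooling some adaptive capacity.

This is because, judging from how Cascade Corner Pooling works, positions that are not vertically aligned may carry more useful information than the aligned position.

Overall, Left Pooling and Top Pooling provide the baseline of Cascade Top Corner Pooling (i.e., first find the maximum along the boundary direction, then find the maximum along the internal direction starting from the boundary max), while the skip connection and the 1×1 Conv before Top Pooling add some adaptivity on top of that baseline.

We know that this scheme is effective, but not how to make it most effective, so that part is left for the network to learn.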
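
To make the "Left Pooling, skip connection, then Top Pooling" wiring concrete, here is a rough NumPy sketch. It is an assumption-laden simplification: the learned 1×1 Conv is replaced by the identity, `left_pool` and `top_pool` follow the CornerNet convention (a running max toward the right and toward the bottom of the map), and `cascade_top_pool` is a hypothetical name.

```python
import numpy as np

def left_pool(feat: np.ndarray) -> np.ndarray:
    # out[i, j] = max(feat[i, j:]): running max scanned from right to left.
    return np.maximum.accumulate(feat[:, ::-1], axis=1)[:, ::-1]

def top_pool(feat: np.ndarray) -> np.ndarray:
    # out[i, j] = max(feat[i:, j]): running max scanned from bottom to top.
    return np.maximum.accumulate(feat[::-1, :], axis=0)[::-1, :]

def cascade_top_pool(feat: np.ndarray) -> np.ndarray:
    # Skip connection: add the left-pooled map back onto the input.
    # A learned 1x1 Conv would normally sit here; it is omitted (identity).
    looked = feat + left_pool(feat)
    return top_pool(looked)

f = np.array([[1.0, 3.0, 2.0],
              [4.0, 0.0, 0.0]])
out = cascade_top_pool(f)
```

With the convolution in place, the network can reweight the two pooled signals instead of following the fixed argmax rule, which is the adaptivity the note refers to.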

Central Region Exploration

Scale-Aware Central Region

Reason:

$\text{recall vs. precision}$
Choosing the Central Region:

Generate Central Regions of different sizes for bounding boxes of different sizes:
- small bounding box ==> large central region

  Reason: a small central region leads to low recall for small bounding boxes
- large bounding box ==> small central region

  Reason: a large central region leads to low precision for large bounding boxes
Two kinds of Central Region are used in the experiments:

Which one is used depends on the scale of the bounding box:
- $< 150$: n = 3 (left)
- $> 150$: n = 5 (right)
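
A small sketch of how the scale-aware region could be computed. The corner formula is the one I recall from the paper (the region is the middle $1/n$ of the box along each axis). Treating "scale" as the square root of the box area, and sending a scale of exactly 150 to n = 5, are my own assumptions, as are the function names.

```python
import math

def pick_n(width: float, height: float, scale_threshold: float = 150.0) -> int:
    # Assumption: scale = sqrt(area); the notes only say "scale < 150 / > 150".
    scale = math.sqrt(width * height)
    return 3 if scale < scale_threshold else 5

def central_region(tlx: float, tly: float, brx: float, bry: float, n: int):
    """Return (ctlx, ctly, cbrx, cbry), the corners of the central region."""
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return ctlx, ctly, cbrx, cbry

small = central_region(0, 0, 90, 90, pick_n(90, 90))      # n = 3
large = central_region(0, 0, 200, 200, pick_n(200, 200))  # n = 5
```

For the 90×90 box the region covers the middle third of each side, and for the 200×200 box only the middle fifth, which matches the "small box, relatively large region" rule above.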

Exploration

The center keypoint falls inside the Central Region
The center keypoint and the bounding box have the same class

Key Element

Baseline: CornerNet

Three outputs

heatmap:
- top-left corner
- bottom-right corner
Each heatmap consists of 2 parts:
1. The locations of keypoints of different categories
2. The confidence score of each keypoint
embedding:

Groups the corners into pairs
offset:

Remaps the corners from the heatmap to the input image

Generate BBox

Take the top-100 top-left corners and the top-100 bottom-right corners
Group the corners by embedding distance (embedding distance < $Threshold$)
Compute the confidence score of each bounding box (the average of the 2 corner scores)
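
The three grouping steps can be sketched as follows. The array layout (`x, y, score, embedding, class`), the function names, and the threshold value `0.5` are placeholders chosen for the example, not values taken from these notes.

```python
import numpy as np

def top_k(corners: np.ndarray, k: int = 100) -> np.ndarray:
    # Keep the k highest-scoring corners (column 2 holds the score).
    order = np.argsort(-corners[:, 2])
    return corners[order[:k]]

def group_corners(tl: np.ndarray, br: np.ndarray, threshold: float = 0.5):
    """Pair corners whose embeddings are close; rows are x, y, score, embedding, class."""
    boxes = []
    for x1, y1, s1, e1, c1 in top_k(tl):
        for x2, y2, s2, e2, c2 in top_k(br):
            if c1 != c2:
                continue                      # corners must share a category
            if x2 <= x1 or y2 <= y1:
                continue                      # bottom-right must lie below and to the right
            if abs(e1 - e2) >= threshold:
                continue                      # embedding distance too large
            boxes.append((x1, y1, x2, y2, (s1 + s2) / 2.0, int(c1)))
    return boxes

tl = np.array([[10.0, 10.0, 0.50, 0.00, 1.0],
               [50.0, 50.0, 0.75, 2.00, 1.0]])
br = np.array([[40.0, 40.0, 1.00, 0.25, 1.0],
               [90.0, 90.0, 0.25, 2.25, 1.0]])
boxes = group_corners(tl, br)
```

Nothing in this procedure looks inside the box, which is why a pair of unrelated corners with similar embeddings can still form an incorrect bounding box.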

Drawbacks

CornerNet has a high False Discovery Rate (FD), i.e., it produces a large number of incorrect bounding boxes

For the meaning of AP & FD, see [Metric AP & AR & FD](#Metric AP & AR & FD)

Generating BBox

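Select the top-k center keypoints
Remap the center keypoints to the input image (using the offsets)
Define a Central Region in each bounding box
Keep the bounding boxes that meet both requirements:
- The center keypoint falls inside the Central Region
- The center keypoint and the bounding box have the same class
Compute the score of each bounding box

It is the average score of the top-left corner, the bottom-right corner, and the center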
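
Put together, the filtering step might be sketched like this. The tuple layouts, the function names, the default `top_k` value, and the choice of the highest-scoring center when several fall inside the region are illustrative assumptions. The central region formula is the one I recall from the paper, with scale taken as the square root of the area.

```python
import math

def central_region(tlx, tly, brx, bry, n):
    lo, hi = (n + 1) / (2 * n), (n - 1) / (2 * n)
    return (lo * tlx + hi * brx, lo * tly + hi * bry,
            hi * tlx + lo * brx, hi * tly + lo * bry)

def filter_by_center(boxes, centers, top_k=70):
    """boxes: (tlx, tly, brx, bry, tl_score, br_score, cls);
    centers: (x, y, score, cls), already remapped to the input image."""
    centers = sorted(centers, key=lambda c: c[2], reverse=True)[:top_k]
    kept = []
    for tlx, tly, brx, bry, s_tl, s_br, cls in boxes:
        n = 3 if math.sqrt((brx - tlx) * (bry - tly)) < 150 else 5
        ctlx, ctly, cbrx, cbry = central_region(tlx, tly, brx, bry, n)
        hits = [c for c in centers
                if c[3] == cls and ctlx <= c[0] <= cbrx and ctly <= c[1] <= cbry]
        if not hits:
            continue  # no same-class center in the central region: drop the box
        best = max(hits, key=lambda c: c[2])
        kept.append((tlx, tly, brx, bry, (s_tl + s_br + best[2]) / 3.0, cls))
    return kept

boxes = [(0, 0, 90, 90, 0.5, 0.5, 1), (0, 0, 90, 90, 0.5, 0.5, 2)]
centers = [(45, 45, 0.5, 1), (10, 10, 0.9, 2)]
kept = filter_by_center(boxes, centers)
```

The second box is discarded even though its class-2 center has a high score, because that center lies outside the central region.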

Training

Input & Output Size

input size: 511×511
output size: 128×128

Data Augmentation

Same as CornerNet

Inferencing

Single-Scale Testing

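Feed the original and flipped images into the network at the original resolution

Multi-Scale Testing

Feed the original and flipped images into the network at resolution scales $[0.6, 1.0, 1.2, 1.5, 1.8]$

Steps

Select the top 70 center keypoints, top-left corners, and bottom-right corners, and determine the bounding boxes from these triplets

See [Generating BBox](#Generating BBox) for details
Flip the detections from the flipped image back and merge them with those from the original image
Post-Processing: Soft-NMS
Keep the top-100 bounding boxes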
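
The last three steps could be sketched as below. This uses the Gaussian variant of Soft-NMS with placeholder values for `sigma` and `score_threshold`. Whether the original code uses this variant or these values is not stated in these notes, so treat them as assumptions. Boxes are `[x1, y1, x2, y2]` with positive area.

```python
import numpy as np

def unflip_boxes(boxes: np.ndarray, image_width: float) -> np.ndarray:
    # Map boxes detected on the horizontally flipped image back to the original.
    out = boxes.astype(float).copy()
    out[:, 0] = image_width - boxes[:, 2]
    out[:, 2] = image_width - boxes[:, 0]
    return out

def iou_one_to_many(box: np.ndarray, others: np.ndarray) -> np.ndarray:
    w = np.clip(np.minimum(box[2], others[:, 2]) - np.maximum(box[0], others[:, 0]), 0, None)
    h = np.clip(np.minimum(box[3], others[:, 3]) - np.maximum(box[1], others[:, 1]), 0, None)
    inter = w * h
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001, max_keep=100):
    boxes, scores = boxes.astype(float).copy(), scores.astype(float).copy()
    kept_boxes, kept_scores = [], []
    while len(scores) > 0 and len(kept_boxes) < max_keep:
        i = int(np.argmax(scores))
        if scores[i] < score_threshold:
            break
        kept_boxes.append(boxes[i])
        kept_scores.append(scores[i])
        boxes, scores = np.delete(boxes, i, axis=0), np.delete(scores, i)
        if len(scores) > 0:
            # Decay the scores of overlapping boxes instead of discarding them.
            overlap = iou_one_to_many(kept_boxes[-1], boxes)
            scores = scores * np.exp(-(overlap ** 2) / sigma)
    return np.array(kept_boxes), np.array(kept_scores)

dets = np.array([[0, 0, 10, 10], [0, 0, 10, 10], [20, 20, 30, 30]])
kept_boxes, kept_scores = soft_nms(dets, np.array([0.9, 0.8, 0.7]))
```

The duplicate box is kept but pushed to the end of the ranking with a much lower score, so the final top-100 cut removes it only when enough better boxes exist.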

Ablation Experiment

Incorrect Bounding Box Reduction

Inference Speed

The cost of visual pattern exploration is small

One version of CenterNet can outperform one version of CornerNet in both accuracy and speed

Center Pooling Ablation

Conclusion:

Center Pooling substantially improves the AP of large objects
Reasons:
- Center Pooling can extract richer internal visual patterns
- Larger objects contain more internal visual patterns

Cascade Corner Pooling Ablation

Conclusions:
- Because large objects have rich internal visual patterns, Cascade Corner Pooling can see more of the object
- Overly rich internal visual patterns can reduce its sensitivity to boundaries, leading to inaccurate bounding boxes
  - These incorrect bounding boxes can be suppressed by Center Pooling

Central Region Exploration Ablation

Conclusion:

Overall AP improves, with the largest gain on small objects
Reason:

The center keypoint of a small object is easier to locate

Error Analysis

Exploration of visual patterns relies on the center keypoint ==> missing a center keypoint causes CenterNet to lose the visual pattern of the bounding box
Center keypoint detection still has a lot of room for improvement

Metric AP & AR & FD

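AP: Average Precision Rate

Computed over all categories and 10 IoU thresholds (i.e., $0.5:0.05:0.95$)

It reflects how many high-quality bounding boxes the network can predict (typically IoU $\ge 0.5$)

It is the most important metric on the MS-COCO dataset

AR: Maximum Recall Rate

Computed over a fixed number of detections per image, and averaged over all categories and the 10 IoU thresholds

FD: False Discovery Rate

It reflects the proportion of incorrect bounding boxes: $$ \text{FD} = 1-\text{AP} $$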

Small & Medium & Large

small object: $\text{area}<32^2$
medium object: $32^2<\text{area}<96^2$
large object: $\text{area}>96^2$
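
As a tiny illustration, the thresholds translate into a bucketing helper like the one below. `size_bucket` is a hypothetical name, and sending areas of exactly $32^2$ or $96^2$ to the larger bucket is an assumption, since the inequalities above leave the boundaries open.

```python
def size_bucket(area: float) -> str:
    """Classify an object by pixel area into small / medium / large."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"

buckets = [size_bucket(a) for a in (900, 2500, 10000)]
```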

Math

Central Region
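
For reference, the central region formula as I recall it from the paper is given below. The symbols are my labels: $(\text{tl}_x, \text{tl}_y)$ and $(\text{br}_x, \text{br}_y)$ are the top-left and bottom-right corners of the bounding box, $(\text{ctl}_x, \text{ctl}_y)$ and $(\text{cbr}_x, \text{cbr}_y)$ are the corresponding corners of the Central Region, and $n$ is the odd number (3 or 5) chosen by the box scale.

$$
\begin{cases}
\text{ctl}_x = \dfrac{(n+1)\,\text{tl}_x + (n-1)\,\text{br}_x}{2n} \\[2mm]
\text{ctl}_y = \dfrac{(n+1)\,\text{tl}_y + (n-1)\,\text{br}_y}{2n} \\[2mm]
\text{cbr}_x = \dfrac{(n-1)\,\text{tl}_x + (n+1)\,\text{br}_x}{2n} \\[2mm]
\text{cbr}_y = \dfrac{(n-1)\,\text{tl}_y + (n+1)\,\text{br}_y}{2n}
\end{cases}
$$

As a quick check, with $n=3$, $\text{tl}_x=0$ and $\text{br}_x=90$ this gives $\text{ctl}_x=30$ and $\text{cbr}_x=60$, i.e., the middle third of the box.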

Loss Function

It consists of:

Detection Loss
- Corner Detection Loss $\text{L}_{\text{det}}^{\text{co}}$
- Center Detection Loss $\text{L}_{\text{det}}^{\text{ce}}$
Pull & Push Loss

Applied to corners only
- Pull Loss $\text{L}_{\text{pull}}^{\text{co}}$
- Push Loss $\text{L}_{\text{push}}^{\text{co}}$
Offset Loss
- Corner offset Loss $\text{L}_{\text{off}}^{\text{co}}$
- Center offset Loss $\text{L}_{\text{off}}^{\text{ce}}$

$\alpha=\beta = 0.1$
$\gamma=1$
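
The weights above only make sense next to the combined objective. As I recall it from the paper (so treat the exact form as something to verify against the original), the terms listed above are combined as:

$$ \text{L} = \text{L}_{\text{det}}^{\text{co}} + \text{L}_{\text{det}}^{\text{ce}} + \alpha\,\text{L}_{\text{pull}}^{\text{co}} + \beta\,\text{L}_{\text{push}}^{\text{co}} + \gamma\left(\text{L}_{\text{off}}^{\text{co}} + \text{L}_{\text{off}}^{\text{ce}}\right) $$

With $\alpha=\beta=0.1$ and $\gamma=1$, the two detection losses and the two offset losses enter at full weight, while the pull and push terms that group the corners are down-weighted by a factor of 10.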

Use Yourself

To Solved Problem

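CenterNet (Triplet) converts the problem of perceiving visual patterns within an object into keypoint detection

Compared with capturing the visual patterns within an object directly, keypoint detection is an easier problem to solve

This way of converting an unsolved problem into an already solved one is worth learning from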

Better Representation

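Given the same data and the same framework, the quality of the data representation directly affects model performance

How to produce a better representation is an important question. Anchor-based methods such as Faster-RCNN, SSD, and RetinaNet (FPN) (e.g., moving from single-scale to multi-scale), and keypoint-based methods such as CornerNet and the Cascade Corner Pooling of CenterNet (Triplet), are all attempts in this direction

But how do we judge whether a representation is good? This seems to come back to interpretability, or to our intuitive understanding.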

Generator & Adaptive

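In my view, neural networks have some adaptive capacity because a neural network can act as a generator

For methods that are hard to implement explicitly, or where a baseline exists but the optimal method is hard to pin down, the network's role as a generator can be used to realize them

This idea first appeared in ResNet's shortcut connection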

Related Work

Anchor-Based Method

Introduction

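Anchor-based methods have 2 key points:

Place anchors with predefined sizes and ratios
Regress the positive bounding boxes according to the ground truth

drawbacks

A large number of anchors is needed (to keep a sufficiently high IoU with the ground-truth boxes)
Anchor sizes and ratios must be designed by hand (which introduces many hyperparameters to tune)
Anchors are not aligned with the ground-truth boxes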

KeyPoint-Based Method

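Here this mainly refers to CornerNet

Introduction

That is: represent an object with a pair of corners

drawbacks

Relatively weak ability to refer to global information

In other words: it is sensitive to the boundary information of the object
It cannot know for sure which pairs of keypoints should represent an object

See [Problem to Solve](#Problem to Solve) for details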

Two-Stage Method

Steps

Extract RoIs ==> stage-1
classify & regress RoIs ==> stage-2

Models

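RCNN:

RoIs are obtained by selective search
A CNN serves as the classifier

SPP-Net & Fast-RCNN:

RoIs are extracted from the feature map

Faster-RCNN:

Uses an RPN to regress anchors, enabling end-to-end training

Mask-RCNN:

Faster-RCNN + mask-prediction branch
Performs detection and segmentation at the same time

R-FCN:

Replaces the FC layers with position-sensitive score maps

Cascade RCNN:

By training a sequence of detectors with increasing IoU thresholds, it addresses 2 problems:

Overfitting during training
Quality mismatch during inference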

One-stage Method

A common problem of one-stage methods: the lack of an additional look at the cropped region

Steps

Directly classify and regress the anchor boxes

Models

YOLOv1:

image ==> S×S grid
Does not use anchors; learns the bounding box size directly

YOLOv2:

Goes back to using more anchors
Uses a new bounding box regression method

SSD:

Uses feature maps from different convolutional stages to classify and regress

DSSD:

SSD + deconvolution ==> combines low-level and high-level features

R-SSD:

Applies pooling and deconvolution across different feature layers ==> combines low-level and high-level features

RON:

reverse connection
objectness prior

RefineDet:

Refines location and size twice, inheriting the advantages of both one-stage and two-stage methods

CornerNet:

keypoint-based method
Represents an object with a pair of corners

Problems

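In the internal direction of Cascade Corner Pooling, how is the maximum in the boundary direction found?

Through the adaptivity of the network; see [Cascade Corner Pooling](#Cascade Corner Pooling)
What exactly do AP and AR mean?
Why is CornerNet weak at referring to the global information of an object?

Because Corner Pooling only covers $\frac14$ of the space
- top-left corner pooling covers the fourth quadrant
- bottom-right corner pooling covers the second quadrant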
