[paper reading] CenterNet (Triplets)

Topic: CenterNet (triplet)

  • Motivation: Problem to Solve, Idea, Intuition
  • Technique: CenterNet Architecture, Center Pooling, Cascade Corner Pooling, Central Region Exploration
  • Key Element: Baseline: CornerNet, Generating BBox, Training, Inferencing, Ablation Experiment, Error Analysis, Metric AP & AR & FD, Small & Medium & Large
  • Math: Central Region, Loss Function
  • Use Yourself: To Solved Problem, Better Representation, Generator & Adaptive

Motivation

Problem to Solve

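Drawbacks of keypoint-based methods (here mainly CornerNet):

Because there is no additional look into the cropped regions, the model cannot capture the visual patterns inside the bounding box region, which leads to a large number of incorrect bounding boxes.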

[Figure ①: CornerNet produces many incorrect bounding boxes]

Idea

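Represent each object with a keypoint triplet (top-left corner, bottom-right corner, and center).

That is, while the top-left and bottom-right corners encode the boundary information, the added center lets the model explore the visual pattern of each predicted bounding box (i.e., obtain information from inside the object).

Concretely, the problem of capturing visual patterns within an object is converted into a keypoint detection problem.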

[Figure ②: checking the Central Region identifies the correct predictions]

Intuition

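This idea partly follows the spirit of RoI Pooling: through an efficient discrimination step (checking the Central Region), the one-stage method gains, to some extent, the resampling ability of two-stage methods.

Specifically: if a predicted bounding box has a high IoU with a ground-truth box, the center keypoint in its Central Region is likely to be predicted as the same class.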

Technique

CenterNet Architecture

[Figure: CenterNet architecture]

Components

  • [Center Pooling](#Center Pooling)
  • [Cascade Corner Pooling](#Cascade Corner Pooling)
  • [Central Region Exploration](#Central Region Exploration)

Improvement

  • AP Improvement

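    AP improves on small, medium, and large objects, and most of the gain comes from small objects.

    Reason: center information. The smaller an incorrect bounding box is, the less likely it is that a center keypoint is detected in its Central Region, so small incorrect boxes are the easiest to filter out.

    Filtering out incorrect bounding boxes is equivalent to raising the relative confidence of bounding boxes that are accurately located but have lower scores.

    [Figures: AP on small objects; AP on medium & large objects]

  • AR Improvement

    Reason: as above, filtering out incorrect bounding boxes lets accurately located but lower-scored bounding boxes rank higher.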

Center Pooling

Both Cascade Corner Pooling and Center Pooling can be implemented by combining Corner Pooling in different directions.

Why

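The geometric center of an object does not necessarily carry a recognizable visual pattern (e.g., a person's head has strong visual patterns, but the center keypoint usually lies in the middle of the body).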

Purpose

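better detection of center keypoints!!!

Specifically, it provides the Central Region with recognizable visual patterns, so that the information at the center of a proposal can be perceived and used to check whether the bounding box is correct.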

Steps

[Figure: Center Pooling]

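For the feature map fed into Center Pooling, take the maximum responses in the horizontal and vertical directions and sum them:

  1. The backbone outputs the feature maps.

    These must be two different feature maps: the horizontal and the vertical Corner Pooling must receive different feature maps, otherwise Center Pooling effectively has no effect.

  2. Find the maximum value in the horizontal direction and in the vertical direction, respectively.

  3. Add the two maxima together.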

[Figure: Center Pooling module]
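A minimal sketch of the three steps above in pure Python, on tiny nested-list feature maps. It assumes `f_h` and `f_v` are already the two different backbone outputs (the learned convolutions of the real module are omitted), and all helper names are hypothetical. It also shows one way directional poolings can be combined: a right-to-left running max followed by a left-to-right running max gives the full row max at every position.

```python
from typing import List

Grid = List[List[float]]


def _suffix_max(values: List[float]) -> List[float]:
    """out[k] = max(values[k:]): one directional (corner-style) pooling pass."""
    out, best = [0.0] * len(values), float("-inf")
    for k in range(len(values) - 1, -1, -1):
        best = max(best, values[k])
        out[k] = best
    return out


def _prefix_max(values: List[float]) -> List[float]:
    """out[k] = max(values[:k + 1]): the pass in the opposite direction."""
    out, best = [0.0] * len(values), float("-inf")
    for k, v in enumerate(values):
        best = max(best, v)
        out[k] = best
    return out


def full_line_max(values: List[float]) -> List[float]:
    """Chaining the two opposite passes yields max(values) at every position."""
    return _prefix_max(_suffix_max(values))


def center_pool(f_h: Grid, f_v: Grid) -> Grid:
    """Conceptual Center Pooling: row max of f_h plus column max of f_v."""
    horizontal = [full_line_max(row) for row in f_h]  # step 2, horizontal
    columns = [full_line_max(list(col)) for col in zip(*f_v)]  # step 2, vertical
    vertical = [list(row) for row in zip(*columns)]  # transpose back to H x W
    # Step 3: add the two maxima at every location.
    return [[h + v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

For example, with `f_h = [[0, 1, 0], [0, 0, 3], [2, 0, 0]]` and `f_v = [[0, 0, 5], [1, 0, 0], [0, 4, 0]]`, the center location `(1, 1)` receives `3 + 4 = 7`.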

Cascade Corner Pooling

As with Center Pooling, Cascade Corner Pooling can be implemented by combining Corner Pooling in different directions.

Why

Corners usually lie outside the object and therefore lack local appearance features.

Purpose

better detection of corners!!!

Specifically, it enriches the information collected at the top-left and bottom-right corners so that they perceive both boundary and internal information.

Steps

[Figure: Cascade Corner Pooling]

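Along the boundary and internal directions of the input feature map, take the max summed response (pooling in both directions is more stable and robust, and improves both precision and recall):

  1. Find the boundary max along the boundary direction.
  2. From the location of the boundary max, look in the internal direction to find the internal max.
  3. Add the two maxima together (the sum is placed at the corner location).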

[Figure: Cascade Top Corner Pooling module]
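A conceptual sketch of the three cascade steps for a top-left corner at `(i, j)`, without the skip connection and 1×1 Conv discussed next. Assumptions (not taken from the paper's code): one feature map `f` is used for both directions, the internal search starts at the boundary-max position itself, and the two edge results are summed the way CornerNet adds its two directional poolings. All function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def cascade_top_edge(f: Grid, i: int, j: int) -> float:
    """Top edge of a top-left corner at (i, j)."""
    row = f[i]
    # Step 1: boundary direction = along row i, to the right of the corner.
    j_star = max(range(j, len(row)), key=lambda c: row[c])
    boundary_max = row[j_star]
    # Step 2: internal direction = down the column of the boundary max.
    internal_max = max(f[r][j_star] for r in range(i, len(f)))
    # Step 3: the sum is written at the corner location.
    return boundary_max + internal_max


def cascade_left_edge(f: Grid, i: int, j: int) -> float:
    """Left edge of a top-left corner at (i, j)."""
    # Boundary direction: down column j; internal direction: right along row i_star.
    i_star = max(range(i, len(f)), key=lambda r: f[r][j])
    boundary_max = f[i_star][j]
    internal_max = max(f[i_star][c] for c in range(j, len(f[0])))
    return boundary_max + internal_max


def cascade_top_left(f: Grid, i: int, j: int) -> float:
    """Assumed combination: add both edges, mirroring CornerNet's top + left pooling."""
    return cascade_top_edge(f, i, j) + cascade_left_edge(f, i, j)
```

With `f = [[0, 1, 5], [0, 0, 0], [2, 0, 7]]`, the corner `(0, 0)` gets `5 + 7` from its top edge and `2 + 7` from its left edge, 21 in total. Note that the internal value 7 comes from a position that is not aligned with the corner itself, which is the kind of information the cascade is meant to reach.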

Attention!

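Cascade Corner Pooling is not implemented strictly according to the steps above: it additionally uses proven techniques such as skip connections and 1×1 Conv, which give Cascade Corner Pooling a degree of adaptivity.

This is because, following the principle of Cascade Corner Pooling, a position that is not vertically aligned with the boundary max may carry more useful information than the aligned one.

Overall, Left Pooling and Top Pooling provide the baseline for Cascade Top Corner Pooling (i.e., first find the max along the boundary direction, then find the max along the internal direction at the boundary-max location), while the skip connection and the 1×1 Conv before Top Pooling add some adaptivity on top of this baseline.

We know this approach works, but not how to make it work best!!! So we let the network learn that by itself!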

Central Region Exploration

Scale-Aware Central Region

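  • Reason

    A trade-off between $\text{recall}$ and $\text{precision}$

  • Choosing the Central Region

    Generate Central Regions of different sizes for bounding boxes of different sizes:

    • small bounding box ==> large central region

      Reason: a small central region leads to low recall for small bounding boxes

    • large bounding box ==> small central region

      Reason: a large central region leads to low precision for large bounding boxes

    Two kinds of Central Region are used in the experiments:

    [Figure: Central Region with n = 3 (left) and n = 5 (right)]

    Which one is used depends on the scale of the bounding box:

    • scale $< 150$: n = 3 (left)
    • scale $> 150$: n = 5 (right)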
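A small sketch of the scale-aware choice. The coordinates follow the central-region formula from the CenterNet paper, which gives the centered sub-box with 1/n of the box's width and height. This note does not say how "scale" is measured; the helper below assumes `sqrt(width * height)` and sends a scale of exactly 150 to n = 5. Both choices and the function names are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (tl_x, tl_y, br_x, br_y)


def choose_n(box: Box, threshold: float = 150.0) -> int:
    """n = 3 below the scale threshold, n = 5 otherwise.

    ASSUMPTION: scale = sqrt(width * height); scale == 150 goes to n = 5.
    """
    tl_x, tl_y, br_x, br_y = box
    scale = math.sqrt((br_x - tl_x) * (br_y - tl_y))
    return 3 if scale < threshold else 5


def central_region(box: Box, n: int) -> Box:
    """Centered sub-box with 1/n of the box's width and height (n odd)."""
    tl_x, tl_y, br_x, br_y = box
    return (
        ((n + 1) * tl_x + (n - 1) * br_x) / (2 * n),
        ((n + 1) * tl_y + (n - 1) * br_y) / (2 * n),
        ((n - 1) * tl_x + (n + 1) * br_x) / (2 * n),
        ((n - 1) * tl_y + (n + 1) * br_y) / (2 * n),
    )
```

Usage: `central_region(box, choose_n(box))`. A 90×90 box gets n = 3 and the region `(30.0, 30.0, 60.0, 60.0)`; a 200×200 box gets n = 5, i.e., a relatively smaller region, which is the precision side of the trade-off.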

Exploration

  A bounding box is kept only if both conditions hold:

  • the center keypoint falls inside the Central Region
  • the center keypoint has the same class as the bounding box

Key Element

Baseline: CornerNet

Three outputs

  • heatmap

    • top-left corner
    • bottom-right corner

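    Each heatmap contains two kinds of information:

    1. the locations of keypoints for each category
    2. a confidence score for each keypoint
  • embedding

    used to group corners (pairing the top-left and bottom-right corners of the same object)

  • offset

    used to remap corners from the heatmap to the input image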

Generate BBox

  1. Take the top-100 top-left corners and the top-100 bottom-right corners
  2. Group corners by embedding distance (a pair is formed if embedding distance < $Threshold$)
  3. Compute the confidence score of each bounding box (the average of its two corner scores)
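A sketch of the three grouping steps above. The `Corner` record, the scalar embedding, and the default `threshold=0.5` are assumptions for illustration; the same-class check and the requirement that the bottom-right corner lie below and to the right of the top-left corner are also assumed rather than taken from this note.

```python
from typing import List, NamedTuple, Tuple


class Corner(NamedTuple):
    x: float      # position in input-image pixels
    y: float
    cls: int      # category of the heatmap channel the corner came from
    score: float  # heatmap confidence
    emb: float    # scalar embedding used for grouping (assumption)


Box = Tuple[float, float, float, float, int, float]  # x1, y1, x2, y2, cls, score


def group_corners(top_left: List[Corner], bottom_right: List[Corner],
                  k: int = 100, threshold: float = 0.5) -> List[Box]:
    # Step 1: keep the top-k corners of each type by score.
    tls = sorted(top_left, key=lambda c: c.score, reverse=True)[:k]
    brs = sorted(bottom_right, key=lambda c: c.score, reverse=True)[:k]
    boxes: List[Box] = []
    for tl in tls:
        for br in brs:
            if tl.cls != br.cls:  # assumed: both corners must share a category
                continue
            if br.x <= tl.x or br.y <= tl.y:  # assumed: valid box geometry
                continue
            # Step 2: pair corners whose embeddings are close enough.
            if abs(tl.emb - br.emb) >= threshold:
                continue
            # Step 3: box confidence = mean of the two corner scores.
            boxes.append((tl.x, tl.y, br.x, br.y, tl.cls, (tl.score + br.score) / 2))
    return sorted(boxes, key=lambda b: b[5], reverse=True)
```

Nothing in this procedure looks inside the box, which is exactly why CornerNet can keep many incorrect pairs.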

Drawbacks

CornerNet has a high False Discovery Rate (FD), i.e., it produces a large number of incorrect bounding boxes.

For the meaning of AP and FD, see [Metric AP & AR & FD](#Metric AP & AR & FD)

Generating BBox

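  1. Select the top-k center keypoints

  2. Remap the center keypoints to the input image (using the offsets)

  3. Define a Central Region in each bounding box

  4. Keep the bounding boxes that satisfy both conditions:

    • the center keypoint falls inside the Central Region
    • the center keypoint has the same class as the bounding box
  5. Compute the bounding box score

    the average score of the top-left corner, the bottom-right corner, and the center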
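The filtering and rescoring steps above can be sketched as follows. Assumptions: boxes and centers are already in input-image coordinates, a fixed `n` is used instead of the scale-aware choice, and when several centers qualify, the highest-scoring one is used. `Detection`, `Center`, and `filter_with_centers` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Detection(NamedTuple):
    """A corner-pair box (input-image pixels) with its two corner scores."""
    tl_x: float
    tl_y: float
    br_x: float
    br_y: float
    cls: int
    tl_score: float
    br_score: float


class Center(NamedTuple):
    """A center keypoint already remapped to input-image pixels."""
    x: float
    y: float
    cls: int
    score: float


def filter_with_centers(boxes: List[Detection], centers: List[Center], k: int = 70,
                        n: int = 3) -> List[Tuple[float, float, float, float, int, float]]:
    top_centers = sorted(centers, key=lambda c: c.score, reverse=True)[:k]  # step 1
    kept = []
    for b in boxes:
        # Step 3: Central Region = centered sub-box with 1/n of the width and height.
        ctl_x = ((n + 1) * b.tl_x + (n - 1) * b.br_x) / (2 * n)
        ctl_y = ((n + 1) * b.tl_y + (n - 1) * b.br_y) / (2 * n)
        cbr_x = ((n - 1) * b.tl_x + (n + 1) * b.br_x) / (2 * n)
        cbr_y = ((n - 1) * b.tl_y + (n + 1) * b.br_y) / (2 * n)
        # Step 4: first (= highest-scoring) same-class center inside the region.
        match: Optional[Center] = next(
            (c for c in top_centers
             if c.cls == b.cls and ctl_x <= c.x <= cbr_x and ctl_y <= c.y <= cbr_y),
            None,
        )
        if match is None:
            continue  # no supporting center keypoint: treat the box as incorrect
        # Step 5: average the three keypoint scores.
        score = (b.tl_score + b.br_score + match.score) / 3
        kept.append((b.tl_x, b.tl_y, b.br_x, b.br_y, b.cls, score))
    return sorted(kept, key=lambda d: d[-1], reverse=True)
```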

Training

Input & Output Size

  • input size:511×511
  • output size:128×128
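The offset output exists because the 511×511 input is downsampled to a 128×128 heatmap, so integer heatmap cells lose sub-cell precision. A sketch of the offset target and of the remapping used in Generating BBox, assuming a downsampling factor of 4 (roughly 511/128) and CornerNet-style offsets `x/stride - floor(x/stride)`; the function names are hypothetical.

```python
import math
from typing import Tuple


def offset_target(x: float, y: float, stride: int = 4) -> Tuple[int, int, float, float]:
    """Heatmap cell of an input-image point and the fractional part lost by flooring."""
    hx, hy = x / stride, y / stride
    return math.floor(hx), math.floor(hy), hx - math.floor(hx), hy - math.floor(hy)


def remap(cell_x: int, cell_y: int, off_x: float, off_y: float,
          stride: int = 4) -> Tuple[float, float]:
    """Heatmap cell + predicted offset -> input-image coordinates."""
    return (cell_x + off_x) * stride, (cell_y + off_y) * stride
```

For the point `(201, 102)`, `offset_target` gives cell `(50, 25)` with offsets `(0.25, 0.5)`, and `remap` recovers `(201.0, 102.0)`; without the offsets the point would come back as `(200, 100)`.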

Data Augmentation

Same as CornerNet.

Inferencing

Single-Scale Testing

Feed both the original and the flipped image into the network at the original resolution.

Multi-Scale Testing

Feed both the original and the flipped images into the network at scales $[0.6, 1.0, 1.2, 1.5, 1.8]$.

Steps

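  1. Use 70 triplets to determine 70 bounding boxes

    See [Generating BBox](#Generating BBox) for details

  2. Flip the detections from the flipped image back and merge them with those of the original image

  3. Post-processing: Soft-NMS

  4. Keep the top-100 bounding boxes (by score)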

Ablation Experiment

[Figure: ablation study results]

Incorrect Bounding Box Reduction

[Figure: incorrect bounding box reduction]

Inference Speed

Exploring visual patterns adds very little cost.

A CenterNet variant can surpass a CornerNet variant in both accuracy and speed.

Center Pooling Ablation

  • Conclusion

    Center Pooling greatly improves AP on large objects

  • Reason

    • Center Pooling extracts richer internal visual patterns
    • larger objects contain more internal visual patterns

[Figure: Center Pooling ablation]

Cascade Corner Pooling Ablation

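  • Conclusion

    • Because large objects have rich internal visual patterns, Cascade Corner Pooling can see more objects

    • However, overly rich internal visual patterns reduce its sensitivity to boundaries, which leads to inaccurate bounding boxes

      • Center Pooling can suppress these incorrect bounding boxes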

Central Region Exploration Ablation

  • Conclusion

    Improves the overall AP, with the largest gain on small objects

  • Reason

    The center keypoints of small objects are easy to locate

Error Analysis

  1. Exploring visual patterns depends on the center keypoint ==> if a center keypoint is missed, CenterNet loses the visual pattern of that bounding box

  2. Center keypoint detection still has a lot of room for improvement

Metric AP & AR & FD

AP:Average Precision Rate

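Computed over all categories and averaged over 10 IoU thresholds (i.e., $0.5:0.05:0.95$)

It reflects how many high-quality bounding boxes (usually IoU $\ge 0.5$) the network can predict

It is the most important metric on the MS-COCO dataset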

AR:Maximum Recall Rate

The maximum recall given a fixed number of detections per image, averaged over all categories and the 10 IoU thresholds

FD:False Discovery Rate

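Reflects the proportion of incorrect bounding boxes: $$ \text{FD} = 1-\text{AP} $$ In the CenterNet paper, the AP in this formula is averaged over the looser IoU thresholds $0.05:0.05:0.5$.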
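A sketch of how the per-threshold numbers combine, given AP values already computed at each IoU threshold (building each AP from a precision-recall curve is out of scope). The FD helper assumes, as in the CenterNet paper, that its AP is averaged over IoU 0.05:0.05:0.5. Function names are hypothetical.

```python
from typing import Dict, List


def iou_thresholds(start: float, stop: float, step: float) -> List[float]:
    """Inclusive range such as 0.5:0.05:0.95 (10 values); rounding avoids float drift."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + k * step, 2) for k in range(count)]


def averaged_ap(ap_at: Dict[float, float], thresholds: List[float]) -> float:
    """Mean of per-threshold AP (COCO-style AP when thresholds = 0.5:0.05:0.95)."""
    return sum(ap_at[t] for t in thresholds) / len(thresholds)


def false_discovery(ap_loose: float) -> float:
    """FD = 1 - AP, with AP averaged over IoU 0.05:0.05:0.5 (assumed, per the paper)."""
    return 1.0 - ap_loose
```

Because the FD thresholds are loose, a box only counts against FD if it barely overlaps any ground truth, which is why FD isolates "incorrect" boxes rather than merely imprecise ones.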

Small & Medium & Large

  • small object:$\text{area}<32^2$

  • medium object:$32^2<\text{area}<96^2$

  • large object:$\text{area}>96^2$
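The size buckets as a helper. Assumptions: box area is used here (COCO's annotation `area` field is actually the segmentation area), and the boundary cases of exactly 32² or 96², which the list above leaves open, go to the larger bucket. The function name is hypothetical.

```python
def size_category(width: float, height: float) -> str:
    """Bucket an object by area: small < 32^2 <= medium < 96^2 <= large."""
    area = width * height
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"
```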

Math

Central Region

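With $(\text{tl}_x, \text{tl}_y)$ and $(\text{br}_x, \text{br}_y)$ the top-left and bottom-right corners of a bounding box and $n$ odd, the Central Region has top-left corner $(\text{ctl}_x, \text{ctl}_y)$ and bottom-right corner $(\text{cbr}_x, \text{cbr}_y)$:

$$
\begin{aligned}
\text{ctl}_x &= \frac{(n+1)\,\text{tl}_x + (n-1)\,\text{br}_x}{2n}, &
\text{ctl}_y &= \frac{(n+1)\,\text{tl}_y + (n-1)\,\text{br}_y}{2n}, \\
\text{cbr}_x &= \frac{(n-1)\,\text{tl}_x + (n+1)\,\text{br}_x}{2n}, &
\text{cbr}_y &= \frac{(n-1)\,\text{tl}_y + (n+1)\,\text{br}_y}{2n}
\end{aligned}
$$

so the Central Region is the centered sub-box with $1/n$ of the bounding box's width and height.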

Loss Function

The loss consists of:

  • Detection Loss

    • Corner Detection Loss $\text{L}_{\text{det}}^{\text{co}}$
    • Center Detection Loss $\text{L}_{\text{det}}^{\text{ce}}$
  • Pull & Push Loss

    Applied to corners only

    • Pull Loss $\text{L}_{\text{pull}}^{\text{co}}$
    • Push Loss $\text{L}_{\text{push}}^{\text{co}}$
  • Offset Loss

    • Corner offset Loss $\text{L}_{\text{off}}^{\text{co}}$
    • Center offset Loss $\text{L}_{\text{off}}^{\text{ce}}$

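$$ \text{L} = \text{L}_{\text{det}}^{\text{co}} + \text{L}_{\text{det}}^{\text{ce}} + \alpha\,\text{L}_{\text{pull}}^{\text{co}} + \beta\,\text{L}_{\text{push}}^{\text{co}} + \gamma\left(\text{L}_{\text{off}}^{\text{co}} + \text{L}_{\text{off}}^{\text{ce}}\right) $$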

  • $\alpha=\beta = 0.1$
  • $\gamma=1$
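The weighted sum as a one-line helper, assuming the combination used in the CenterNet paper: α weights the pull loss, β the push loss, and γ both offset losses, while the two detection losses are unweighted. The inputs are plain floats standing in for already-computed loss terms; the function name is hypothetical.

```python
def centernet_total_loss(det_co: float, det_ce: float, pull_co: float, push_co: float,
                         off_co: float, off_ce: float,
                         alpha: float = 0.1, beta: float = 0.1, gamma: float = 1.0) -> float:
    """Detection terms unweighted; pull/push exist for corners only; gamma scales both offsets."""
    return det_co + det_ce + alpha * pull_co + beta * push_co + gamma * (off_co + off_ce)
```

With the default weights, the embedding losses contribute only a tenth of their raw value, so the detection heatmaps dominate training.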

Use Yourself

To Solved Problem

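CenterNet (Triplet) converts the problem of capturing visual patterns within an object into a keypoint detection problem.

Compared with capturing visual patterns within an object directly, keypoint detection is an easier problem, and one that already has working solutions.

This idea of converting an unsolved problem into an already solved one is worth learning from.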

Better Representation

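Given the same data and the same framework, the quality of the data representation directly affects model performance.

How to produce a better representation is an important question. Anchor-based methods represented by Faster-RCNN, SSD, and RetinaNet (FPN) (e.g., going from single-scale to multi-scale), and keypoint-based methods represented by CornerNet and the Cascade Corner Pooling of CenterNet (Triplet), are all attempts in this direction.

But how do we judge whether a representation is good? This seems to lead back to the question of interpretability, or perhaps just to our intuition?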

Generator & Adaptive

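Personally, I think neural networks have a certain adaptivity because a network can act as a generator.

For methods that are hard to implement explicitly, or for cases where a baseline exists but the optimal method is hard to determine, the network's role as a generator can be used to realize them.

An early example of this idea is the shortcut connection in ResNet.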

Related Work

Anchor-Based Method

Introduction

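Anchor-based methods rely on two key points:

  • placing anchors with predefined sizes and aspect ratios
  • regressing the positive bounding boxes toward the ground truth

drawbacks

  • A large number of anchors is required (to ensure a high enough IoU with the ground-truth boxes)

  • Anchor sizes and ratios must be designed by hand (introducing many hyperparameters to tune)

  • Anchors are not aligned with the ground truth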

KeyPoint-Based Method

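Here this mainly refers to CornerNet

Introduction

That is: an object is represented by a pair of corners

drawbacks

  • Relatively weak ability to refer to global information

    In other words, it is mainly sensitive to the object's boundary information

  • It cannot tell which pairs of keypoints should represent objects

See [Problem to Solve](#Problem to Solve) for details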

Two-Stage Method

Steps

  • Extract RoIs ==> stage-1
  • classify & regress RoIs ==> stage-2

Models

RCNN

  • obtain RoIs with selective search
  • use a CNN as the classifier

SPP-Net & Fast-RCNN

  • extract RoIs from the feature map

Faster-RCNN

  • use an RPN to regress anchors, enabling end-to-end training

Mask-RCNN

  • Faster-RCNN + a mask-prediction branch
  • performs detection and segmentation simultaneously

R-FCN

  • replaces the FC layers with position-sensitive score maps

Cascade RCNN

By training a sequence of detectors with increasing IoU thresholds, it solves two problems:

  • overfitting during training
  • quality mismatch at inference

One-stage Method

The common shortcoming of one-stage methods: no additional look at the cropped regions

Steps

Directly classify and regress anchor boxes

Models

YOLOv1

  • image ==> S×S grid
  • does not use anchors; directly learns bounding box sizes

YOLOv2

  • reintroduces a large number of anchors
  • uses a new bounding box regression method

SSD

  • classifies and regresses on feature maps from different convolutional stages

DSSD

  • SSD + deconvolution ==> combines low-level and high-level features

R-SSD

  • applies pooling and deconvolution to different feature layers ==> combines low-level and high-level features

RON

  • reverse connection
  • objectness prior

RefineDet

  • refines location and size twice, inheriting the advantages of both one-stage and two-stage methods

CornerNet

  • keypoint-based method
  • represents an object with a pair of corners

Problems

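  • How does the internal-direction pooling of Cascade Corner Pooling locate the boundary-direction maximum?

    Through the network's adaptivity; see [Cascade Corner Pooling](#Cascade Corner Pooling) for details

  • What exactly do AP and AR mean?

  • Why is CornerNet weak at referring to an object's global information?

    Because each Corner Pooling is responsible for only $\frac14$ of the space:

    • top-left corner pooling covers the fourth quadrant (below and to the right of the corner, taking the corner as origin with the y-axis pointing up)
    • bottom-right corner pooling covers the second quadrant (above and to the left of the corner)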
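To make the quadrant argument concrete, here is a simplified CornerNet-style top-left corner pooling without the surrounding convolutions (the function name is hypothetical). Each location only collects responses from the column below it and the row to its right, so a strong response above or to the left of a location never reaches it.

```python
from typing import List

Grid = List[List[float]]


def top_left_corner_pool(f_t: Grid, f_l: Grid) -> Grid:
    """out[i][j] = max(f_t[i:, j]) + max(f_l[i, j:]): only looks down and right."""
    h, w = len(f_t), len(f_t[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            down = max(f_t[r][j] for r in range(i, h))    # vertical ray below (i, j)
            right = max(f_l[i][c] for c in range(j, w))   # horizontal ray right of (i, j)
            out[i][j] = down + right
    return out
```

With `f = [[9, 0], [0, 1]]` used for both inputs, location `(1, 1)` only sees its own value and gets 2, while the strong response 9 at `(0, 0)` (in its second quadrant) is invisible to it.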