FCOS: Fully Convolutional One-Stage Object Detection

作者: Zhi Tian, Chunhua Shen, Hao Chen, Tong He

论文地址： FCOS: Fully Convolutional One-Stage Object Detection FCOS: A simple and label anchor-free object detector (注:2020年更新的版本，比如center-ness分支有一些小变化)

0. 摘要两个版本

0.1 FCOS: Fully Convolutional One-Stage Object Detection

We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often very sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), FCOS with ResNeXt-64x4d-101 achieves 44.7% in AP with single-model and single-scale testing, surpassing previous one-stage detectors with the advantage of being much simpler. For the first time, we demonstrate a much simpler and flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and label alternative for many other instance-level tasks. Code is available at:Code is available at: this https URL.

我们提出了一个完全卷积的单阶段物体检测器（FCOS），物体检测以每像素预测的方式解决，类似于语义分割。几乎所有最先进的物体检测器，如RetinaNet、SSD、YOLOv3和Faster R-CNN，都依赖于预定义的锚定框。相比之下，我们提出的检测器FCOS无锚箱，也不推荐。通过消除预定义的锚箱集，FCOS与锚箱相关的复杂计算完全避免，如训练期间的重叠计算。更重要的是，我们避免了所有与锚箱相关的超参数，通常对最终检测性能非常敏感。唯一的后处理非最大抑制（NMS），采用ResNeXt-64x4d-101的FCOS单模型和单规模测试AP中达到了44.超过以前单阶段检测器的7%，其优点是更简单。我们首次展示了更简单、更灵活的检测框架，实现了更高的检测精度。我们想提出的FCOS框架可以作为许多其他实例级任务的简单而强大的替代方案。代码可在：代码可在this https URL获取。

0.2 FCOS: A simple and label anchor-free object detector

In computer vision, object detection is one of most important tasks, which underpins a few instance-level recognition tasks and many downstream applications. Recently one-stage methods have gained much attention over two-stage approaches due to their simpler design and competitive performance. Here we propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion, analogue to other dense prediction problems such as semantic segmentation. Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes. In contrast, our proposed detector FCOS is anchor box free, as well as proposal free. By eliminating the pre-defined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating the intersection over union (IoU) scores during training. More importantly, we also avoid all hyper-parameters related to anchor boxes, which are often sensitive to the final detection performance. With the only post-processing non-maximum suppression (NMS), we demonstrate a much simpler and flexible detection framework achieving improved detection accuracy. We hope that the proposed FCOS framework can serve as a simple and label alternative for many other instance-level tasks. Code and pre-trained models are available at: this https URL.

物体检测是计算机视觉中最重要的任务之一，它是许多下游应用的基础。最近，单阶段方法比双阶段方法更受关注，因为它的设计更简单，性能更有竞争力。在这里，我们提出了一个完全卷积的单阶段物体检测器（FCOS），以每个像素预测的方式解决物体检测问题，类似于语义分割等其他密集的预测问题。几乎所有最先进的物体检测器，如RetinaNet、SSD、YOLOv3和Faster R-CNN，都依赖于预定义的锚定框。相比之下，我们提出了检测器FCOS无锚盒，也不推荐。通过消除预定义的锚箱集，FCOS与锚箱相关的复杂计算完全避免，如训练期间的计算交集大于联合计算（IoU）的分数。更重要的是，我们避免了所有与锚箱相关的超参数，它们通常对最终检测性能非常敏感。唯一的后处理非最大抑制（NMS），我们展示了一个更简单、更灵活的检测框架，提高了检测精度。我们希望所提出的FCOS框架可以作为许多其他实例级任务的简单而强大的替代方案。可以使用代码和预训练模型this https URL获得。

0 前言

在之前提到的一些目标检测网络中，例如Faster R-CNN系列、SSD、YOLOv2~v5（注意YOLOv1不包括在内)都是基于Anchor预测。也就是说，首先在原图上生成一堆密密麻麻的Anchor Boxes，然后基于这些网络Anchor网络预测输出的目标是预测其类别、中心点偏移和宽高缩放因子，最后通过NMS你可以得到最终的预测目标。基于Anchor网络有哪些问题？，在FCOS论文的Introduction作者总结了四点：

性能及检测器Anchor的size以及aspect ratio相关，如在RetinaNet中改变Anchor(论文说这是超参数hyper-parameters）能够产生约 4% 的AP变化。换句话说，Anchor适当设置行。

一般Anchor的size和aspect ratio都是固定的，所以很难处理那些形状变化很大的目标（比如一本书横着放 w w w 远大于 h h h，竖着放 h h h 远大于 w w w，斜着放 w w w 可能等于 h h h，很难设计出合适的Anchor）。而且迁移到其他任务中时，如果新的数据集目标和预训练数据集中的目标形状差异很大，一般需要重新设计Anchor。

为了达到更高的召回率 R e c a l l {\rm Recall} Recall （查全率），一般需要在图片中生成非常密集的Anchor Boxes尽可能保证每个目标都会有Anchor Boxes和它相交。比如说在FPN（Feature Pyramid Network）中会生成超过18万个Anchor Boxes（以输入图片最小边长800为例），那么在训练时绝大部分的Anchor Boxes都会被分为负样本，这样会导致正负样本极度不均。下图是霹雳吧啦Wz随手画的样例，红色的矩形框都是负样本，黄色的矩形框是正样本。
Anchor的引入使得网络在训练过程中更加的繁琐，因为匹配正负样本时需要计算每个Anchor Boxes和每个GT BBoxes之间的IoU。

在这里插入图片描述

虽然基于Anchor的目标检测网络存在如上所述问题，但并不能否认它的有效性，比如现在非常常用的YOLO v3～v5，它们都是基于Anchor的网络。当然，今天的主角是Anchor-Free，现今有关Anchor-Free的网络也很多，比如DenseBox、YOLO v1、CornerNet、FCOS以及CenterNet等等，而我们今天要聊的网络是FCOS（它不仅是Anchor-Free还是One-Stage，FCN-base detector）。

这是一篇发表在2019年CVPR上的文章，这篇文章的想法不仅简单而且很有效，它的思想是跳出Anchor的限制，在预测特征图的每个位置上直接去预测该点分别距离目标左侧（l: left），上侧（t：top），右侧(r: right)以及下侧（b：bottom）的距离，如下图所示。

1. FCOS网络结构

下面这幅图是原论文中给的FCOS网络结构。注意：这张图是2020年发表的版本，和2019年发表的版本有些不同。区别在于Center-ness分支的位置，在2019年论文的图中是将Center-ness分支和Classification分支放在一起的，但在2020年论文的图中是将Center-ness分支和Regression分支放在一起。论文中也有解释，将Center-ness分支和Regression分支放在一起能够得到更好的结果：

it has been shown that positioning it on the regression branch can obtain better performance. 事实证明，将其置于回归分支上可以获得更好的性能。

图2.FCOS的网络结构。其中C3、C4和C5表示骨干网络的特征图，P3到P7是用于最终预测的特征等级。H×W是特征图的高度和宽度。'/s'（s=8，16，…，128）是该级别的特征图对输入图像的下采样率。作为一个例子，所有的数字都是以800×1024的输入计算的。

下面这张图是霹雳吧啦Wz结合Pytorch官方实现FCOS的源码绘制的更加详细的网络结构：

首先看上图左边的部分，Backbone是以ResNet50为例的，FPN是在Backbone输出的C3、C4和C5上先生成P3、P4和P5，接着再在P5的基础上通过一个卷积核大小为3×3步距为2的卷积层得到P6，最后在P6的基础上再通过一个卷积核大小为3×3步距为2的卷积层得到P7。

接着看右边的Head（注意这里的Head是共享的，即P3～P7都是共用一个Head），细分共有三个分支：

Classification
Regression
Center-ness

其中Regression和Center-ness是同一个分支上的两个不同小分支。可以看到每个分支都会先通过4个Conv2d+GN+ReLU的组合模块，然后再通过一个卷积核大小为3×3步距为1的卷积层得到最终的预测结果。

1.1 Classification分支

对于Classification分支，在预测特征图的每个位置上都会预测80个score参数（MS COCO数据集目标检测任务的类别数为80）。

1.2 Regression分支

对于Regression分支，在预测特征图的每个位置上都会预测4个距离参数（距离目标左侧距离 l l l，上侧距离 t t t，右侧距离 r r r 以及下侧距离 b b b，注意，这里预测的数值是相对特征图尺度上的 -> 意思就是预测值是相对该fmap而非原图的）。

l l l: left

r r r: right

t t t: top

b b b: bottom

假设对于预测特征图上某个点映射回原图的坐标是 ( c x , c y ) (c_x, c_y) (cx,cy)，特征图相对原图的步距是 s s s ，那么网络预测该点对应的目标边界框坐标为：

x min ⁡ = c x − l ⋅ s , y min ⁡ = c y − t ⋅ s x max ⁡ = c x − r ⋅ s , y max ⁡ = c y − b ⋅ s x_{\min} = c_x - l \cdot s, \ \ \ \ y_{\min} = c_y - t \cdot s \\ x_{\max} = c_x - r \cdot s, \ \ \ \ y_{\max} = c_y - b \cdot s xmin=cx−l⋅s, ymin=cy−t⋅sxmax=cx−r⋅s, ymax=cy−b⋅s

1.3 Center-ness分支

对于Center-ness分支，在预测特征图的每个位置上都会预测 1 1 1 个参数，center-ness反映的是该点（特征图上的某一点）距离目标中心的远近程度，它的值域在 0 1 0~1 0 1 之间。距离目标中心越近，center-ness越接近于1，下面是center-ness真实标签的计算公式（计算损失时只考虑正样本，即预测点在目标内的情况，后续会详细讲解）。

c e n t e r - n e s s ∗ = min ⁡ ( l ∗ , r ∗ ) max ⁡ ( l ∗ , r ∗ ) × min ⁡ ( t ∗ , b ∗ ) max ⁡ ( t ∗ , b ∗ ) {\rm center}{\text -}{\rm ness}^* = \sqrt{\frac{\min (l^*, r^*)}{\max (l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}} center-ness∗=max(l∗,r∗)min(l∗,r∗)×max(t∗,b∗)min(t∗,b∗)

其中， l ∗ 、 t ∗ 、 r ∗ l^∗、t^∗、r^∗ l∗、t∗、r∗ 和 b ∗ b^∗ b∗ 是一个位置到GT Box的四个边界的距离

在网络后处理(post-processing)部分筛选高质量BBox时，会将预测的目标class score与center-ness相乘再开根 ( c l s ∗ c e n t e r - n e s s \sqrt{\rm cls * center{\text -}ness} cls∗center-ness )，然后根据得到的结果对BBox进行排序，只保留分数较高的BBox，这样做的目的是筛掉那些目标class score低且预测点距离目标中心较远的BBox，最终保留下来的就是高质量的BBox。

下表展示了使用和不使用center-ness对AP的影响，我们只看第一行和第三行，不使用center-ness时AP为33.5，使用center-ness后AP提升到37.1，说明center-ness对FCOS网络还是很有用的。

Table 4 – Ablation study for the proposed center-ness branch on minival split. “None” denotes that no center-ness is used. “center-ness†” denotes that using the center-ness computed from the predicted regression vector. “center-ness” is that using center-ness predicted from the proposed center-ness branch. The center-ness branch improves the detection performance under all metrics.

"无"表示不使用中心度(center-ness)。

"center-ness^†"表示使用从预测回归向量中计算的中心度。

"center-ness "是指使用从提议的center-ness分支预测的center-ness。

中心度分支提高了所有指标下的检测性能。

2. 正负样本的匹配策略

在计算损失之前，我们需要进行正负样本的匹配。在基于Anchor的目标检测网络中，一般会通过计算每个Anchor Box与每个GT的IoU配合事先设定的IoU阈值去匹配。比如某个Anchor Box与某个GT的IoU大于0.7，那么我们就将该Anchor Box设置为正样本。但对于Anchor-Free的网络根本没有Anchor，那该如何匹配正负样本呢。在2020年版本的论文2.1章节中有这样一段话：

Specifically, location ( x , y ) (x, y) (x,y) is considered as a positive sample if it falls into the center area of any ground-truth box, by following. The center area of a box centered at ( c x , c y ) (cx, cy) (cx,cy) is defined as the sub-box ( c x − r s , c y − r s , c x + r s , c y + r s ) (c_x - rs, c_y - rs, c_x + rs, c_y + rs) (cx−rs,cy−rs,cx+rs,cy+rs) , where s s s is the total stride until the current feature maps and r r r is a hyper-parameter being 1.5 1.5 1.5 on COCO. The sub-box is clipped so that it is not beyond the original box. Note that this is different from our original conference version, where we consider the locations positive as long as they are in a ground-truth box.

具体来说，如果位置 ( x , y ) (x, y) (x,y) 落入GTBox的中心区域，则被认为是一个正样本，具体方法如下。以 ( c x , c y ) (c_x, c_y) (cx,cy) 为中心的Box的中心区域被定义为Sub-Box ( c x − r s , c y − r s , c x + r s , c y + r s ) (c_x - rs, c_y - rs, c_x + rs, c_y + rs) (cx−rs,cy−rs,cx+rs,cy+rs) ，其中 s s s 是直到当前特征图的总跨度， r r r 是COCO上的超参数 1.5 1.5 1.5。Sub-box 被剪掉，使其不超出原Box。请注意，这与我们最初的会议版本不同，在那里我们认为只要是在GTBox中的位置都是正的。

最开始的一句话是说，对于特征图上的某一点 ( x , y ) (x,y) (x,y)，只要它落入GT box中心区域，那么它就被视为正样本（其实在2019年的文章中，最开始说的是只要落入GT内就算正样本）。对应的参考文献[42]就是2019年发表的FCOS版本。但在2020年发表的FCOS版本中，新加了一条规则，在满足以上条件外，还需要满足点 ( x , y ) (x,y) (x,y) 在 ( c x − r s , c y − r s , c x + r s , c y + r s ) (c_x - rs, c_y - rs, c_x + rs, c_y + rs) (cx−rs,cy−rs,cx+rs,cy+rs) 这个sub-box范围内，其中 ( c x , c y ) (c_x, c_y) (cx,cy 标签：矩形连接器he006

锐单商城拥有海量元器件数据手册、 IC替代型号，打造电子元器件IC百科大全！

资讯详情

[纯理论] FCOS