This post is my personal analysis of libfacedetection (face detection in PyTorch). If you spot any mistakes, please point them out in the comments and I will correct them as soon as possible. The detector is said to reach 10,000 FPS; let's see how it gets there.
What follows is an analysis of how the YuNet network is put together. The network consists of four parts: the backbone, the head, the anchor computation, and the losses for the 5 facial landmarks and the bbox.

Backbone network
From an architectural point of view, the basic building block is a stack of conv, batchnorm, and relu, with downsampling between stages to extract feature maps at several scales.
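The post does not show the repo's block definitions, so before the backbone code here is a minimal stand-in for Conv_head and Conv4layerBlock built from plain conv-bn-activation layers (the stride-2 first conv and the exact layer composition are my assumptions; the repo's real blocks may differ, e.g. by using depthwise-separable convolutions):

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_activation(activation_type='relu'):
    # the backbone takes an activation_type argument; relu is the default
    return nn.ReLU(inplace=True) if activation_type == 'relu' else nn.PReLU()

class Conv_head(nn.Sequential):
    # stem block: assumed stride-2 first conv, then a stride-1 conv
    def __init__(self, in_channels, mid_channels, out_channels, activation_type='relu'):
        super().__init__(
            nn.Conv2d(in_channels, mid_channels, 3, stride=2, padding=1),  # /2
            nn.BatchNorm2d(mid_channels),
            make_activation(activation_type),
            nn.Conv2d(mid_channels, out_channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_channels),
            make_activation(activation_type),
        )

class Conv4layerBlock(nn.Sequential):
    # body block: two conv-bn-act layers, spatial size unchanged
    def __init__(self, in_channels, out_channels, activation_type='relu'):
        super().__init__(
            nn.Conv2d(in_channels, in_channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(in_channels),
            make_activation(activation_type),
            nn.Conv2d(in_channels, out_channels, 3, stride=1, padding=1),
            nn.BatchNorm2d(out_channels),
            make_activation(activation_type),
        )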
class Yunet(nn.Module):
    def __init__(self, cfg_layers, activation_type='relu'):
        super().__init__()
        self.model0 = Conv_head(*cfg_layers[0], activation_type=activation_type)
        for i in range(1, len(cfg_layers)):
            self.add_module(f'model{i}', Conv4layerBlock(*cfg_layers[i], activation_type=activation_type))
        # self.model1 = Conv4layerBlock(16, 64, activation_type=activation_type)
        # self.model2 = Conv4layerBlock(64, 64, activation_type=activation_type)
        # self.model3 = Conv4layerBlock(64, 64, activation_type=activation_type)
        # self.model4 = Conv4layerBlock(64, 64, activation_type=activation_type)
        # self.model5 = Conv4layerBlock(64, 64, activation_type=activation_type)
        # self.model6 = Conv4layerBlock(64, 64, activation_type=activation_type)
        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m.bias is not None:
                    nn.init.xavier_normal_(m.weight.data)
                    m.bias.data.fill_(0.02)
                else:
                    m.weight.data.normal_(0, 0.01)
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def forward(self, x):
        x = self.model0(x)
        x = F.max_pool2d(x, 2)
        x = self.model1(x)
        x = self.model2(x)
        x = F.max_pool2d(x, 2)
        p1 = self.model3(x)
        x = F.max_pool2d(p1, 2)
        p2 = self.model4(x)
        x = F.max_pool2d(p2, 2)
        p3 = self.model5(x)
        x = F.max_pool2d(p3, 2)
        p4 = self.model6(x)
        return [p1, p2, p3, p4]

After this series of convolutions and poolings, the deepest feature map is 10×10 with 64 channels.
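A quick shape check using the stand-in blocks above (the cfg_layers values are my assumption, chosen to match the channel counts in the commented lines; a 640×640 input is assumed, consistent with the 23500 priors that appear later):

cfg_layers = [(3, 16, 16), (16, 64), (64, 64), (64, 64), (64, 64), (64, 64), (64, 64)]
net = Yunet(cfg_layers)
x = torch.randn(1, 3, 640, 640)
p1, p2, p3, p4 = net(x)
print(p4.shape)   # torch.Size([1, 64, 10, 10]) -- the 10x10, 64-channel map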
Head network
The original author uses Yuhead for the head network, but here I use Yuhead_PAN. PAN makes the features of small objects more distinct. The difference between PAN and FPN is that PAN appends a bottom-up downsampling path after the FPN top-down path, so information from the low-level (high-resolution) maps is passed on to the high-level maps, which makes small objects clearer; see the sketch after the figure below.
PAN structure:
Here we can see the four levels of semantic information flowing through the neck.
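To make the idea concrete, here is a generic PAN-style neck sketch (this is illustrative, not the repo's Yuhead_PAN; the 1x1 laterals, stride-2 convs, and additive fusion are my assumptions):

class SimplePAN(nn.Module):
    def __init__(self, channels=64, num_levels=4):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(channels, channels, 1)
                                     for _ in range(num_levels))
        self.downsample = nn.ModuleList(nn.Conv2d(channels, channels, 3, stride=2, padding=1)
                                        for _ in range(num_levels - 1))

    def forward(self, feats):  # feats: [p1, p2, p3, p4], highest resolution first
        feats = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down pass (FPN): upsample deeper maps and add into shallower ones
        for i in range(len(feats) - 2, -1, -1):
            feats[i] = feats[i] + F.interpolate(feats[i + 1], size=feats[i].shape[-2:], mode='nearest')
        # bottom-up pass (PAN): downsample shallower maps and add into deeper ones
        for i in range(len(feats) - 1):
            feats[i + 1] = feats[i + 1] + self.downsample[i](feats[i])
        return feats

With the backbone's [p1, p2, p3, p4] (64 channels each), SimplePAN(channels=64, num_levels=4) returns four fused maps with unchanged shapes.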
def forward(self, x):
    self.img_size = x.shape[-2:]
    feats = self.backbone(x)
    outs = self.head(feats)
    head_data = [o.permute(0, 2, 3, 1).contiguous() for o in outs]
    head_data = torch.cat([o.view(o.size(0), -1) for o in head_data], dim=1)
    head_data = head_data.view(head_data.size(0), -1, self.out_factor)
    loc_data = head_data[:, :, 0 : 4 + self.num_landmarks * 2]
    conf_data = head_data[:, :, -self.num_classes - 1 : -1]
    iou_data = head_data[:, :, -1:]
    output = (loc_data, conf_data, iou_data)
    return output

The head produces three tensors: loc_data (20, 23500, 14), conf_data (20, 23500, 2), and iou_data (20, 23500, 1).
conf shape: torch.Size(batch_size, num_priors, num_classes)
loc shape: torch.Size(batch_size, num_priors, 14)
The 14 in loc = face bbox (4 values) + facial landmarks (10 values).
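Putting the slices together: with num_landmarks = 5 and num_classes = 2, out_factor = 4 + 5*2 + 2 + 1 = 17, and the three slices partition the last dimension exactly. A quick standalone check:

head_data = torch.randn(20, 23500, 17)          # (batch, num_priors, out_factor)
loc_data  = head_data[:, :, 0 : 4 + 5 * 2]      # (20, 23500, 14) bbox + landmarks
conf_data = head_data[:, :, -2 - 1 : -1]        # (20, 23500, 2)  class scores
iou_data  = head_data[:, :, -1:]                # (20, 23500, 1)  IoU branch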
Anchor box computation
The main code is as follows:
from itertools import product

self.anchor_generator = PriorBox(
    min_sizes=cfg['model']['anchor']['min_sizes'],
    steps=cfg['model']['anchor']['steps'],
    clip=cfg['model']['anchor']['clip'],
    ratio=cfg['model']['anchor']['ratio']
)

class PriorBox(object):
    def __init__(self, min_sizes, steps, clip, ratio):
        super(PriorBox, self).__init__()
        self.min_sizes = min_sizes
        self.steps = steps
        self.clip = clip
        self.ratio = ratio

    def __call__(self, image_size):
        # feature map sizes after each /2 downsampling stage
        feature_map_2th = [int(int((image_size[0] + 1) / 2) / 2),
                           int(int((image_size[1] + 1) / 2) / 2)]
        feature_map_3th = [int(feature_map_2th[0] / 2), int(feature_map_2th[1] / 2)]
        feature_map_4th = [int(feature_map_3th[0] / 2), int(feature_map_3th[1] / 2)]
        feature_map_5th = [int(feature_map_4th[0] / 2), int(feature_map_4th[1] / 2)]
        feature_map_6th = [int(feature_map_5th[0] / 2), int(feature_map_5th[1] / 2)]
        feature_maps = [feature_map_3th, feature_map_4th,
                        feature_map_5th, feature_map_6th]

        anchors = []
        for k, f in enumerate(feature_maps):
            min_sizes = self.min_sizes[k]
            for i, j in product(range(f[0]), range(f[1])):
                for min_size in min_sizes:
                    # anchor center, normalized to [0, 1]
                    cx = (j + 0.5) * self.steps[k] / image_size[1]
                    cy = (i + 0.5) * self.steps[k] / image_size[0]
                    for r in self.ratio:
                        # anchor size, normalized to [0, 1]
                        s_ky = min_size / image_size[0]
                        s_kx = r * min_size / image_size[1]
                        anchors += [cx, cy, s_kx, s_ky]
        # back to torch land
        output = torch.Tensor(anchors).view(-1, 4)
        if self.clip:
            output.clamp_(max=1, min=0)
        return output
In the config file, the anchor parameter min_sizes is ordered from small to large: the earlier (higher-resolution) feature maps are responsible for detecting small faces, and the later feature maps detect large faces.
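As a sanity check on the 23500 priors seen in the head output, here is a hypothetical config (the min_sizes/steps values and the 640×640 input are my assumptions, chosen to reproduce the shapes above):

prior_box = PriorBox(
    min_sizes=[[10, 16, 24], [32, 48], [64, 96], [128, 192, 256]],
    steps=[8, 16, 32, 64],
    clip=False,
    ratio=[1.0],
)
priors = prior_box(image_size=(640, 640))
print(priors.shape)   # torch.Size([23500, 4])
# 80*80*3 + 40*40*2 + 20*20*2 + 10*10*3 = 23500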
Loss computation
This section covers the face bbox loss, the landmark loss, and the classification loss over positive and negative samples.
bbox loss: the face bbox loss is the eiou_loss below (an EIoU loss with a smooth_point option), computed between the matched prior boxes and the ground-truth boxes.
loss_bbox_eiou = eiou_loss(loc_p[:, 0:4], loc_t[:, 0:4], variance=self.variance, smooth_point=self.smooth_point, reduction='sum')
Landmark loss: the landmarks are matched the same way as the face bbox, but their loss is a smooth L1 over the 10 landmark coordinates.
loss_lm_smoothl1 = F.smooth_l1_loss(loc_p[:, 4:loc_dim], loc_t[:, 4:loc_dim], reduction='sum')
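For reference, F.smooth_l1_loss computes, per element (with its default beta = 1), smooth_l1(x) = 0.5 * x^2 when |x| < 1 and |x| - 0.5 otherwise, where x is the difference between the predicted and encoded target landmark values; this keeps the gradient bounded for large errors while staying smooth near zero.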
Classification cross-entropy loss: softmax cross-entropy is used, but the samples are filtered first (hard negative mining) so that positives and negatives are kept at a ratio of 1:3; a sketch of this filtering follows the code below.
loss_cls_ce = F.cross_entropy(conf_p, targets_weighted, reduction='sum')
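Here is a minimal sketch of SSD-style 1:3 hard negative mining (illustrative, not the repo's exact code; the helper name is hypothetical): keep all positives, then keep only the 3*num_pos negatives with the highest classification loss.

def mine_hard_negatives(conf, conf_t, neg_pos_ratio=3):
    """conf: (batch, num_priors, num_classes) logits;
    conf_t: (batch, num_priors) integer labels, 0 = background."""
    pos = conf_t > 0                                   # positives mask
    # per-prior loss, used only to rank negatives by difficulty
    loss_rank = F.cross_entropy(conf.view(-1, conf.size(-1)),
                                conf_t.view(-1), reduction='none')
    loss_rank = loss_rank.view(conf.size(0), -1)
    loss_rank[pos] = 0                                 # positives are always kept
    _, idx = loss_rank.sort(dim=1, descending=True)    # hardest negatives first
    _, rank = idx.sort(dim=1)
    num_pos = pos.long().sum(dim=1, keepdim=True)
    num_neg = torch.clamp(neg_pos_ratio * num_pos, max=pos.size(1) - 1)
    neg = rank < num_neg                               # top-k hardest negatives
    return pos, neg

conf_p and targets_weighted in the loss call above would then be gathered with the combined pos | neg mask before calling F.cross_entropy.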