Background
Progress
PointNet++
Key Problems
PointNet++ Structure
Baseline
Hierarchical point set feature learning (Encoder)
Decoder
Loss
Experiments
Self-Test Results
Reference
Background
PointNet was pioneering work in point-cloud processing, but practical use exposed some of its problems: because PointNet does not capture local spatial features, its fine-grained recognition ability is limited. PointNet++ was therefore proposed to remedy these shortcomings.
For details on PointNet, see: PointNet (Analysis & Coding)
Progress
- Uses PointNet recursively on local point-cloud partitions defined by metric-space distances, which helps learn local features at increasingly larger contextual scales;
- To handle non-uniform point density, proposes new set-learning layers that adaptively combine features from multiple scales, letting PointNet++ learn point features more robustly and efficiently;
- For the segmentation task, proposes inverse-distance interpolation and skip connections to learn per-point features.
PointNet++
Key Problems
- Addresses PointNet's inability to capture local structure induced by the metric space; PointNet++ captures point-cloud features at multiple scales;
- Because PointNet relies on a single max pooling to obtain the global feature, a lot of information is lost. Borrowing the hierarchical features of multi-layer CNNs, PointNet++ adopts multi-level feature learning;
- How to partition the point set so that finer-grained local features can be extracted;
- Introduces the encoder-decoder idea for the segmentation task.
PointNet++ Structure
Baseline
Hierarchical point set feature learning (Encoder)
Hierarchical point set feature learning consists of multiple stacked Set Abstraction layers. The goal is to apply PointNet recursively to achieve multi-level feature learning, while preserving rotation invariance and permutation invariance. The baseline is shown in the figure below:
A Set Abstraction layer consists of three parts: Sampling, Grouping, and PointNet; that is, Set Abstraction = Sampling + Grouping + PointNet. The three parts are introduced one by one below.
For Sampling, the authors considered two options: uniform sampling and farthest point sampling. Compared with uniform sampling, farthest point sampling covers the whole sampling space better. FPS (Farthest Point Sampling) is best explained with the following figure.
Consider the five-pointed star. Suppose point 1 is selected first. Scanning the whole star point cloud, point 2 is found to be the farthest from point 1, so point 2 becomes the second FPS sample. Then, treating points 1 and 2 as a set, the point farthest from both is found to be point 3, so point 3 is the third FPS sample, and so on. In the right figure, points 1-5 are finally obtained. These five points represent the star's shape well and cover the sampling space as much as possible.
Here is the FPS operation as implemented in code. (Note: all code examples here come from the PyTorch reimplementation of PointNet++. I have also read the TensorFlow version, but I find it more obscure and less clear than the PyTorch one, so the PyTorch code is used throughout.)
# FPS
def farthest_point_sample(xyz, npoint):
    """
    Input:
        xyz: pointcloud data, [B, N, 3]
        npoint: number of samples
    Return:
        centroids: sampled pointcloud index, [B, npoint]
    """
    device = xyz.device
    B, N, C = xyz.shape
    centroids = torch.zeros(B, npoint, dtype=torch.long).to(device)  # indices of the selected points, [B, npoint]
    distance = torch.ones(B, N).to(device) * 1e10  # running min distance to the selected set, [B, N]
    farthest = torch.randint(0, N, (B,), dtype=torch.long).to(device)  # random starting point per batch, [B]
    batch_indices = torch.arange(B, dtype=torch.long).to(device)  # batch index helper, [B]
    for i in range(npoint):
        centroids[:, i] = farthest  # record the current farthest point (random on the first pass)
        centroid = xyz[batch_indices, farthest, :].view(B, 1, 3)  # coordinates of that point
        dist = torch.sum((xyz - centroid) ** 2, -1)  # squared distance from every point to it
        mask = dist < distance  # True where the new distance is smaller than the stored one
        distance[mask] = dist[mask]  # update the running min distance to the selected set
        farthest = torch.max(distance, -1)[1]  # next point = argmax of the min distances
    return centroids  # indices of the sampled points
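As a quick sanity check, here is a hypothetical snippet (not from the repo; it assumes torch is imported as in the code above) calling FPS on a random cloud:

# Hypothetical usage: sample 4 well-spread points from a random cloud of 100.
import torch
xyz = torch.rand(2, 100, 3)                        # [B, N, 3]
idx = farthest_point_sample(xyz, 4)                # [B, 4] indices
sampled = xyz[torch.arange(2).unsqueeze(1), idx]   # gather the sampled coordinates, [B, 4, 3]
print(sampled.shape)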
In the Grouping step, the author considers two options: KNN (K Nearest Neighbors) and Ball Query. Compared with KNN, Ball Query guarantees a fixed region scale for each local neighborhood, which makes the local-region features more generalizable across the feature space (the two ideas are otherwise quite similar). The author uses Ball Query here, with the FPS-selected points as ball centers. The supplementary material compares KNN and Ball Query on classification.
In the code, point-to-point distances are needed first, so the pairwise (squared Euclidean) distances between the coordinates of the two input point sets are computed as follows:
# Pairwise squared Euclidean distance between two point sets
# src has N points, dst has M points; the output is an N x M distance matrix
def square_distance(src, dst):
"""
Calculate Euclid distance between each two points.
src^T * dst = xn * xm + yn * ym + zn * zm;
sum(src^2, dim=-1) = xn*xn + yn*yn + zn*zn;
sum(dst^2, dim=-1) = xm*xm + ym*ym + zm*zm;
dist = (xn - xm)^2 + (yn - ym)^2 + (zn - zm)^2
         = sum(src**2, dim=-1) + sum(dst**2, dim=-1) - 2 * src^T * dst
    (per coordinate: (x - y)^2 = x^2 + y^2 - 2xy, for two point clouds x, y)
Input:
src: source points, [B, N, C]
dst: target points, [B, M, C]
Output:
dist: per-point square distance, [B, N, M]
"""
B, N, _ = src.shape
_, M, _ = dst.shape
    dist = -2 * torch.matmul(src, dst.permute(0, 2, 1))  # -2 * src · dst^T (permute the last two dims for the matmul)
    dist += torch.sum(src ** 2, -1).view(B, N, 1)  # add the ||src||^2 term
    dist += torch.sum(dst ** 2, -1).view(B, 1, M)  # add the ||dst||^2 term
return dist
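A small hypothetical check (not from the repo) of square_distance against PyTorch's built-in torch.cdist, which returns unsquared distances:

# Hypothetical sanity check: square_distance should match torch.cdist squared.
src = torch.rand(1, 5, 3)
dst = torch.rand(1, 7, 3)
d1 = square_distance(src, dst)      # [1, 5, 7]
d2 = torch.cdist(src, dst) ** 2     # same values up to floating-point error
print(torch.allclose(d1, d2, atol=1e-5))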
With the pairwise distances available, Ball Query is run around the center points selected by FPS:
# Return the points within the given ball: radius = ball radius, nsample = number of points to keep,
# xyz = all input points, new_xyz = the query (center) points
def query_ball_point(radius, nsample, xyz, new_xyz):
"""
Input:
radius: local region radius
nsample: max sample number in local region
xyz: all points, [B, N, 3]
new_xyz: query points, [B, S, 3]
Return:
group_idx: grouped points index, [B, S, nsample]
"""
device = xyz.device
B, N, C = xyz.shape
_, S, _ = new_xyz.shape
    group_idx = torch.arange(N, dtype=torch.long).to(device).view(1, 1, N).repeat([B, S, 1])  # index grid 0..N-1 per query point, [B, S, N]
    sqrdists = square_distance(new_xyz, xyz)  # squared distance from each query point to every point
    group_idx[sqrdists > radius ** 2] = N  # points outside the ball get the sentinel index N (we want the ones inside)
    group_idx = group_idx.sort(dim=-1)[0][:, :, :nsample]  # sort ascending so real indices come first, keep the first nsample
    group_first = group_idx[:, :, 0].view(B, S, 1).repeat([1, 1, nsample])  # closest point of each ball, repeated, [B, S, nsample]
    mask = group_idx == N  # slots still holding the sentinel (the ball had fewer than nsample points)
    group_idx[mask] = group_first[mask]  # fill them with the first (closest) point
return group_idx
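The grouping code below also calls an index_points helper from the same repo, which simply gathers points by index. For completeness, here is a sketch consistent with the shapes used in this walkthrough:

def index_points(points, idx):
    """
    Gather points by index.
    Input:
        points: input points data, [B, N, C]
        idx: sample index data, [B, S] or [B, S, K]
    Return:
        new_points: indexed points data, [B, S, C] or [B, S, K, C]
    """
    device = points.device
    B = points.shape[0]
    view_shape = list(idx.shape)
    view_shape[1:] = [1] * (len(view_shape) - 1)  # [B, 1] or [B, 1, 1]
    repeat_shape = list(idx.shape)
    repeat_shape[0] = 1
    batch_indices = torch.arange(B, dtype=torch.long).to(device).view(view_shape).repeat(repeat_shape)  # batch index per gathered point
    new_points = points[batch_indices, idx, :]
    return new_points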
In the code, the author merges the Sampling and Grouping steps into one, i.e., FPS + Ball Query, as shown in the figure below:
The code provides two variants: sample_and_group(), which partitions the input points into multiple groups, and sample_and_group_all(), which treats all input points as a single group. The implementations are as follows:
# sample + group (multiple groups)
def sample_and_group(npoint, radius, nsample, xyz, points, returnfps=False):
"""
Input:
        npoint: number of centroids sampled by FPS
        radius: ball query radius
        nsample: max number of points in each local region
        xyz: input points position data, [B, N, 3]
        points: input points data, [B, N, D]
    Return:
        new_xyz: sampled points position data, [B, npoint, 3]
        new_points: sampled points data, [B, npoint, nsample, 3+D]
"""
B, N, C = xyz.shape
S = npoint
    fps_idx = farthest_point_sample(xyz, npoint)  # FPS sample indices, [B, npoint]
    new_xyz = index_points(xyz, fps_idx)  # coordinates of the sampled centroids
    idx = query_ball_point(radius, nsample, xyz, new_xyz)  # point indices within each ball
    grouped_xyz = index_points(xyz, idx)  # gather the grouped coordinates, [B, npoint, nsample, C]
    grouped_xyz_norm = grouped_xyz - new_xyz.view(B, S, 1, C)  # convert to coordinates relative to each centroid
    if points is not None:
        grouped_points = index_points(points, idx)  # gather the extra per-point features
        new_points = torch.cat([grouped_xyz_norm, grouped_points], dim=-1)  # concatenate coords and features, [B, npoint, nsample, C+D]
    else:
        new_points = grouped_xyz_norm  # no extra features: relative coordinates only, [B, npoint, nsample, C]
    if returnfps:  # optionally return the FPS details as well
return new_xyz, new_points, grouped_xyz, fps_idx
else:
return new_xyz, new_points
# sample + group, treating all points as a single group
def sample_and_group_all(xyz, points):
"""
Input:
xyz: input points position data, [B, N, 3]
points: input points data, [B, N, D]
Return:
new_xyz: sampled points position data, [B, 1, 3]
new_points: sampled points data, [B, 1, N, 3+D]
"""
device = xyz.device
B, N, C = xyz.shape
new_xyz = torch.zeros(B, 1, C).to(device)
grouped_xyz = xyz.view(B, 1, N, C)
if points is not None:
new_points = torch.cat([grouped_xyz, points.view(B, 1, N, -1)], dim=-1)
else:
new_points = grouped_xyz
return new_xyz, new_points
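A hypothetical example (not from the repo) showing the output shapes of the two grouping modes, matching the docstrings above:

# Hypothetical shape check for both grouping modes.
xyz = torch.rand(2, 1024, 3)  # [B, N, 3]
new_xyz, new_points = sample_and_group(npoint=128, radius=0.2, nsample=32, xyz=xyz, points=None)
print(new_xyz.shape, new_points.shape)   # [2, 128, 3] [2, 128, 32, 3]
all_xyz, all_points = sample_and_group_all(xyz, None)
print(all_xyz.shape, all_points.shape)   # [2, 1, 3] [2, 1, 1024, 3]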
Ideally, a point cloud would be uniformly distributed regardless of distance. In practice, point clouds captured by real sensors are mostly non-uniform: the sensor's resolution and scan rate are fixed, so point density drops as range (R) grows. In the scan shown below, the point density around points 1, 2, 3 is higher than around points 4, 5, 6 (same scanner, different R).
To address this, the author proposes combining features from different scales, with two concrete schemes: MSG (multi-scale grouping) and MRG (multi-resolution grouping). MSG samples the same data at several different scales and concatenates the resulting features. MRG instead combines features extracted from the previous Set Abstraction level with features extracted directly from the raw points of each local region, then concatenates them. See the figure below.
The paper compares the multi-scale methods (MSG and MRG) against single-scale grouping (SSG) and shows that multi-scale does perform better (though since DP, random input point dropout, is added at the same time, it is hard to say exactly which change contributes what). The results are shown below:
The author applies MSG-style feature concatenation inside PointNet++ to strengthen local feature extraction. The code contains both an MSG variant and a plain single-scale variant, shown below (the released code has no MRG implementation, for reasons unknown):
# Set Abstraction = sample + group + PointNet
class PointNetSetAbstraction(nn.Module):
    def __init__(self, npoint, radius, nsample, in_channel, mlp, group_all):  # npoint: number of centroids; radius: ball radius; nsample: points per region; in_channel: input channels; mlp: MLP layer widths; group_all: single-group mode
super(PointNetSetAbstraction, self).__init__()
self.npoint = npoint
self.radius = radius
self.nsample = nsample
self.mlp_convs = nn.ModuleList()
self.mlp_bns = nn.ModuleList()
last_channel = in_channel
        for out_channel in mlp:  # one 1x1 conv + BN per MLP layer width
self.mlp_convs.append(nn.Conv2d(last_channel, out_channel, 1))
self.mlp_bns.append(nn.BatchNorm2d(out_channel))
last_channel = out_channel
self.group_all = group_all
def forward(self, xyz, points):
"""
Input:
xyz: input points position data, [B, C, N]
points: input points data, [B, D, N]
Return:
new_xyz: sampled points position data, [B, C, S]
new_points_concat: sample points feature data, [B, D', S]
"""
xyz = xyz.permute(0, 2, 1)
        if points is not None:  # extra per-point features exist
            points = points.permute(0, 2, 1)
        if self.group_all:  # form a single group
            new_xyz, new_points = sample_and_group_all(xyz, points)
        else:  # form multiple groups
            new_xyz, new_points = sample_and_group(self.npoint, self.radius, self.nsample, xyz, points)
        # new_xyz: sampled points position data, [B, npoint, C]
        # new_points: sampled points data, [B, npoint, nsample, C+D]
        new_points = new_points.permute(0, 3, 2, 1)  # [B, C+D, nsample, npoint] reorder for Conv2d
        for i, conv in enumerate(self.mlp_convs):  # shared MLP (PointNet) over every point in each group
            bn = self.mlp_bns[i]
            new_points = F.relu(bn(conv(new_points)))
        new_points = torch.max(new_points, 2)[0]  # max pool within each group: the region's feature
new_xyz = new_xyz.permute(0, 2, 1)
return new_xyz, new_points
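A hypothetical smoke test (not from the repo) for a single set-abstraction layer; note that in_channel counts the extra feature channels plus the 3 relative coordinates:

# Hypothetical smoke test for one set-abstraction layer.
sa = PointNetSetAbstraction(npoint=512, radius=0.2, nsample=32,
                            in_channel=3 + 3, mlp=[64, 64, 128], group_all=False)
xyz = torch.rand(2, 3, 1024)     # coordinates, [B, C, N]
points = torch.rand(2, 3, 1024)  # extra per-point features (e.g. normals), [B, D, N]
new_xyz, new_points = sa(xyz, points)
print(new_xyz.shape, new_points.shape)   # [2, 3, 512] [2, 128, 512]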
# Set Abstraction with MSG (multi-scale grouping)
class PointNetSetAbstractionMsg(nn.Module):
def __init__(self, npoint, radius_list, nsample_list, in_channel, mlp_list):
super(PointNetSetAbstractionMsg, self).__init__()
self.npoint = npoint
        self.radius_list = radius_list  # several radii can be given to extract features at different scales
self.nsample_list = nsample_list
self.conv_blocks = nn.ModuleList()
self.bn_blocks = nn.ModuleList()
for i in range(len(mlp_list)):
convs = nn.ModuleList()
bns = nn.ModuleList()
last_channel = in_channel + 3
            for out_channel in mlp_list[i]:  # build the MLP for this scale
convs.append(nn.Conv2d(last_channel, out_channel, 1))
bns.append(nn.BatchNorm2d(out_channel))
last_channel = out_channel
self.conv_blocks.append(convs)
self.bn_blocks.append(bns)
def forward(self, xyz, points):
"""
Input:
xyz: input points position data, [B, C, N]
points: input points data, [B, D, N]
Return:
new_xyz: sampled points position data, [B, C, S]
new_points_concat: sample points feature data, [B, D', S]
"""
xyz = xyz.permute(0, 2, 1)
if points is not None:
points = points.permute(0, 2, 1)
B, N, C = xyz.shape
S = self.npoint
        new_xyz = index_points(xyz, farthest_point_sample(xyz, S))  # centroids selected by FPS
        new_points_list = []
        for i, radius in enumerate(self.radius_list):  # loop over the radii: one feature scale per radius
            K = self.nsample_list[i]
            group_idx = query_ball_point(radius, K, xyz, new_xyz)  # ball query around each centroid
            grouped_xyz = index_points(xyz, group_idx)  # gather the grouped coordinates
            grouped_xyz -= new_xyz.view(B, S, 1, C)
            if points is not None:  # extra per-point features exist
                grouped_points = index_points(points, group_idx)
                grouped_points = torch.cat([grouped_points, grouped_xyz], dim=-1)  # concatenate features and relative coords
            else:
                grouped_points = grouped_xyz
            grouped_points = grouped_points.permute(0, 3, 2, 1)  # [B, D, K, S]
            for j in range(len(self.conv_blocks[i])):  # shared MLP (PointNet) over each local region at this scale
                conv = self.conv_blocks[i][j]
                bn = self.bn_blocks[i][j]
                grouped_points = F.relu(bn(conv(grouped_points)))
            new_points = torch.max(grouped_points, 2)[0]  # [B, D', S] max pool within each region
            new_points_list.append(new_points)
        new_xyz = new_xyz.permute(0, 2, 1)  # sampled points position data, [B, C, S]
        new_points_concat = torch.cat(new_points_list, dim=1)  # concatenate the features from all radii, [B, D', S]
return new_xyz, new_points_concat
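Similarly, a hypothetical two-radius MSG layer (not from the repo); the output channels are the concatenation of the per-scale MLP outputs:

# Hypothetical smoke test for the MSG layer.
sa_msg = PointNetSetAbstractionMsg(512, [0.1, 0.2], [16, 32],
                                   in_channel=0, mlp_list=[[32, 64], [64, 128]])
xyz = torch.rand(2, 3, 1024)   # [B, 3, N]
new_xyz, new_points = sa_msg(xyz, None)
print(new_xyz.shape, new_points.shape)   # [2, 3, 512] [2, 192, 512]  (64 + 128 concatenated)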
Decoder
In the classification task, the encoder's output is fed into the decoder, which here is simply a stack of fully connected layers. Note that Dropout(0.4) is used in this part to reduce overfitting. The implementation is as follows:
class get_model(nn.Module):
def __init__(self,num_class,normal_channel=True):
super(get_model, self).__init__()
in_channel = 3 if normal_channel else 0
self.normal_channel = normal_channel
        self.sa1 = PointNetSetAbstractionMsg(512,
                                             [0.1, 0.2, 0.4],  # multiple scale radii
                                             [16, 32, 128], in_channel, [[32, 32, 64], [64, 64, 128], [64, 96, 128]])
self.sa2 = PointNetSetAbstractionMsg(128, [0.2, 0.4, 0.8], [32, 64, 128], 320,[[64, 64, 128], [128, 128, 256], [128, 128, 256]])
self.sa3 = PointNetSetAbstraction(None, None, None, 640 + 3, [256, 512, 1024], True)
        self.fc1 = nn.Linear(1024, 512)  # fully connected
        self.bn1 = nn.BatchNorm1d(512)  # batch normalization
        self.drop1 = nn.Dropout(0.4)  # drop 40% of the units
self.fc2 = nn.Linear(512, 256)
self.bn2 = nn.BatchNorm1d(256)
self.drop2 = nn.Dropout(0.5)
self.fc3 = nn.Linear(256, num_class)
def forward(self, xyz):
B, _, _ = xyz.shape
if self.normal_channel:
norm = xyz[:, 3:, :]
xyz = xyz[:, :3, :]
else:
norm = None
l1_xyz, l1_points = self.sa1(xyz, norm)
l2_xyz, l2_points = self.sa2(l1_xyz, l1_points)
l3_xyz, l3_points = self.sa3(l2_xyz, l2_points)
x = l3_points.view(B, 1024)
x = self.drop1(F.relu(self.bn1(self.fc1(x))))
x = self.drop2(F.relu(self.bn2(self.fc2(x))))
x = self.fc3(x)
x = F.log_softmax(x, -1)
return x,l3_points
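A hypothetical forward pass (not from the repo) through the classification network, with coordinates only, so normal_channel=False:

# Hypothetical smoke test for the classification model.
model = get_model(num_class=40, normal_channel=False)
xyz = torch.rand(2, 3, 1024)   # [B, 3, N], xyz coordinates only
logits, _ = model(xyz)
print(logits.shape)            # [2, 40] log-probabilities over the 40 ModelNet classes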
Segmentation
In the segmentation part, the encoder produces features on a heavily subsampled set of points (plus a global feature), but segmentation needs per-point features. PointNet++ therefore uses inverse-distance interpolation together with skip connections to recover per-point features.
Concretely, the interpolation is a weighted average over the k nearest neighbors with inverse-distance weights (the farther the neighbor, the smaller its weight); the interpolated features are then concatenated with the corresponding Set Abstraction features, somewhat like pointwise convolution in CNNs. The scheme is given by the formula below.
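For reference, the inverse-distance weighting from the paper (with defaults p = 2, k = 3):

f^{(j)}(x) = \frac{\sum_{i=1}^{k} w_i(x)\, f_i^{(j)}}{\sum_{i=1}^{k} w_i(x)},
\qquad w_i(x) = \frac{1}{d(x, x_i)^p}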
The interpolation in code:
# For segmentation: distance-weighted interpolation plus a shared MLP
class PointNetFeaturePropagation(nn.Module):
def __init__(self, in_channel, mlp):
super(PointNetFeaturePropagation, self).__init__()
self.mlp_convs = nn.ModuleList()
self.mlp_bns = nn.ModuleList()
last_channel = in_channel
for out_channel in mlp:
self.mlp_convs.append(nn.Conv1d(last_channel, out_channel, 1))
            self.mlp_bns.append(nn.BatchNorm1d(out_channel))  # 1-D batch normalization
last_channel = out_channel
def forward(self, xyz1, xyz2, points1, points2):
"""
Input:
xyz1: input points position data, [B, C, N]
xyz2: sampled input points position data, [B, C, S]
points1: input points data, [B, D, N]
points2: input points data, [B, D, S]
Return:
new_points: upsampled points data, [B, D', N]
"""
xyz1 = xyz1.permute(0, 2, 1)
xyz2 = xyz2.permute(0, 2, 1)
points2 = points2.permute(0, 2, 1)
B, N, C = xyz1.shape
_, S, _ = xyz2.shape
        if S == 1:  # only one source point
            interpolated_points = points2.repeat(1, N, 1)  # just copy it
        else:  # more than one: interpolate
            dists = square_distance(xyz1, xyz2)  # pairwise distances
            dists, idx = dists.sort(dim=-1)  # sort by distance
            dists, idx = dists[:, :, :3], idx[:, :, :3]  # [B, N, 3] keep the three nearest points
            dist_recip = 1.0 / (dists + 1e-8)  # inverse distance: farther points get smaller weights
            norm = torch.sum(dist_recip, dim=2, keepdim=True)  # normalizer
            weight = dist_recip / norm  # normalize the weights
            interpolated_points = torch.sum(index_points(points2, idx) * weight.view(B, N, 3, 1), dim=2)  # weighted sum of the neighbors' features
        if points1 is not None:  # skip-linked features exist
            points1 = points1.permute(0, 2, 1)
            new_points = torch.cat([points1, interpolated_points], dim=-1)  # concatenate skip features with the interpolation
        else:
            new_points = interpolated_points
        new_points = new_points.permute(0, 2, 1)
        for i, conv in enumerate(self.mlp_convs):  # shared MLP over the concatenated points
            bn = self.mlp_bns[i]
            new_points = F.relu(bn(conv(new_points)))
return new_points
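A hypothetical shape check (not from the repo) for one feature-propagation step, upsampling features from 128 back to 1024 points; in_channel is the skip-link channels plus the interpolated channels:

# Hypothetical smoke test for feature propagation.
fp = PointNetFeaturePropagation(in_channel=64 + 128, mlp=[128, 128])
xyz1 = torch.rand(2, 3, 1024)      # dense positions, [B, C, N]
xyz2 = torch.rand(2, 3, 128)       # sparse positions, [B, C, S]
points1 = torch.rand(2, 64, 1024)  # dense skip-link features, [B, D1, N]
points2 = torch.rand(2, 128, 128)  # sparse features to interpolate, [B, D2, S]
out = fp(xyz1, xyz2, points1, points2)
print(out.shape)                   # [2, 128, 1024]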
As for the skip connections: the C2 features in the figure come from the encoder, i.e., the features of the corresponding encoder level are concatenated over (see the Skip link connection in the baseline figure above). This appears in the code as well. In the segmentation code below, after the three Set Abstraction encoder steps, PointNetFeaturePropagation() performs the interpolation followed by the skip link connection. During segmentation the author again uses Dropout(0.5) against overfitting. The implementation is as follows:
class get_model(nn.Module):
def __init__(self, num_classes, normal_channel=False):
super(get_model, self).__init__()
if normal_channel:
additional_channel = 3
else:
additional_channel = 0
self.normal_channel = normal_channel
        self.sa1 = PointNetSetAbstractionMsg(512,  # number of sampled centroids
                                             [0.1, 0.2, 0.4],  # three scale radii
                                             [32, 64, 128],  # points per scale
                                             3 + additional_channel,  # input feature channels (additional_channel = 0 here)
                                             [[32, 32, 64],
                                              [64, 64, 128],
                                              [64, 96, 128]])
self.sa2 = PointNetSetAbstractionMsg(128, [0.4,0.8], [64, 128], 128+128+64, [[128, 128, 256], [128, 196, 256]])
self.sa3 = PointNetSetAbstraction(npoint=None, radius=None, nsample=None, in_channel=512 + 3, mlp=[256, 512, 1024], group_all=True)
self.fp3 = PointNetFeaturePropagation(in_channel=1536, mlp=[256, 256])
self.fp2 = PointNetFeaturePropagation(in_channel=576, mlp=[256, 128])
self.fp1 = PointNetFeaturePropagation(in_channel=150+additional_channel, mlp=[128, 128])
self.conv1 = nn.Conv1d(128, 128, 1)
self.bn1 = nn.BatchNorm1d(128)
self.drop1 = nn.Dropout(0.5)
self.conv2 = nn.Conv1d(128, num_classes, 1)
def forward(self, xyz, cls_label):
# Set Abstraction layers
B,C,N = xyz.shape
if self.normal_channel:
l0_points = xyz
l0_xyz = xyz[:,:3,:]
else:
l0_points = xyz
l0_xyz = xyz
#SA
l1_xyz, l1_points = self.sa1(l0_xyz, l0_points)
l2_xyz, l2_points = self.sa2(l1_xyz, l1_points)
l3_xyz, l3_points = self.sa3(l2_xyz, l2_points)
# Feature Propagation layers
l2_points = self.fp3(l2_xyz, l3_xyz, l2_points, l3_points)
l1_points = self.fp2(l1_xyz, l2_xyz, l1_points, l2_points)
        cls_label_one_hot = cls_label.view(B, 16, 1).repeat(1, 1, N)  # one-hot object category broadcast over all N points; concatenated below with the skip-linked raw coordinates and features
l0_points = self.fp1(l0_xyz, l1_xyz, torch.cat([cls_label_one_hot,l0_xyz,l0_points],1), l1_points)
# FC layers
feat = F.relu(self.bn1(self.conv1(l0_points)))
x = self.drop1(feat)
x = self.conv2(x)
x = F.log_softmax(x, dim=1)
x = x.permute(0, 2, 1)
return x, l3_points
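A hypothetical forward pass (not from the repo) through the part-segmentation network; cls_label is the one-hot object category that gets broadcast in forward():

# Hypothetical smoke test for the part-segmentation model.
model = get_model(num_classes=50, normal_channel=False)
xyz = torch.rand(2, 3, 2048)     # [B, 3, N]
cls_label = torch.zeros(2, 16)   # one-hot object category, [B, 16]
cls_label[:, 0] = 1
pred, _ = model(xyz, cls_label)
print(pred.shape)                # [2, 2048, 50] per-point log-probabilities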
Loss
In PointNet++, both classification and segmentation use the cross-entropy loss.
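Since both networks end with log_softmax, the cross-entropy is implemented as an NLL loss on the log-probabilities. A minimal sketch in the style of the PyTorch repo (details may differ from the actual file):

import torch.nn as nn
import torch.nn.functional as F

class get_loss(nn.Module):
    def __init__(self):
        super(get_loss, self).__init__()

    def forward(self, pred, target):
        # pred: log-probabilities from log_softmax; target: integer class labels
        return F.nll_loss(pred, target)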
Experiments
PointNet++ uses different network architectures for classification and segmentation, and both are evaluated thoroughly. Classification is evaluated on ModelNet40, with the results shown in the figure below. Part segmentation is evaluated on ShapeNet Part, with the results shown in the figure below.
Self-Test Results
I also ran PointNet++ classification on ModelNet40 myself: 25 epochs on a Tesla T4. The results are shown in the figure below.
Reference
PointNet++ TensorFlow implementation: https://github.com/charlesq34/pointnet2
PointNet++ PyTorch implementation: https://github.com/yanx27/Pointnet_Pointnet2_pytorch
Qi, C., Su, H., Mo, K., & Guibas, L. (2017, April 10). PointNet: Deep learning on point sets for 3D classification and segmentation. Retrieved May 24, 2022, from https://arxiv.org/abs/1612.00593
Qi, C., Yi, L., Su, H., & Guibas, L. (2017, June 07). PointNet++: Deep hierarchical feature learning on point sets in a metric space. Retrieved May 24, 2022, from https://arxiv.org/abs/1706.02413