A Roundup of 3D Point Cloud Object Tracking Papers (with Annotation Tools)

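Compared with 2D images, 3D point clouds add depth, which greatly improves the accuracy of locating objects in space. They have broad application prospects in robotics, autonomous driving, virtual reality, and remote sensing and mapping, and they are an inevitable direction for the future development of computer vision. Accurately recognizing the shape, position, and pose of each object in space, recovering and reconstructing triangulated surface meshes and texture maps, and detecting, recognizing, tracking, and interacting with objects in 3D space cannot be done from a single image alone. A common approach is to acquire 3D point cloud data either with a stereo vision system built from several cameras or by fusing multiple sensors such as cameras and radar.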

On this basis, researchers have proposed many effective methods and strategies.

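In the early processing stages of point cloud object tracking, the most widely used algorithms are point cloud registration algorithms. Registration is divided into two stages: **coarse registration** and **fine registration**.
The goal of fine registration is to minimize the spatial difference between the point clouds, starting from the coarse registration result. The most widely used fine registration algorithm is probably ICP together with its many variants (robust ICP, point-to-plane ICP, point-to-line ICP, MBICP, GICP, NICP). ICP, however, only works after a coarse registration has been performed.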
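As a rough illustration of the fine registration step, here is a minimal sketch of standard point-to-point ICP in NumPy. It is not taken from any of the referenced posts: the function names `best_rigid_transform` and `icp`, the brute-force nearest-neighbor search, and the stopping tolerance are all assumptions chosen for brevity. The toy data starts from a small misalignment, standing in for the output of a coarse registration.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~ src @ R.T + t for paired points (SVD method)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # avoid returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_mean - R @ src_mean
    return R, t


def icp(source, target, max_iters=50, tol=1e-10):
    """Point-to-point ICP (illustrative): returns R, t, and the final mean squared error."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_error, error = np.inf, np.inf
    for _ in range(max_iters):
        # 1. Closest point in target for every current point (brute force, O(N*M)).
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn_idx = d2.argmin(axis=1)
        # 2. Best rigid transform for these correspondences.
        R, t = best_rigid_transform(current, target[nn_idx])
        # 3. Apply it and accumulate the overall transform.
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        # 4. Objective value and threshold test.
        error = np.mean(np.sum((current - target[nn_idx]) ** 2, axis=1))
        if abs(prev_error - error) < tol:
            break
        prev_error = error
    return R_total, t_total, error


# Toy example: the target is the source under a small known rigid motion.
rng = np.random.default_rng(0)
source = rng.uniform(-1.0, 1.0, size=(60, 3))
angle = np.deg2rad(3.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.02, -0.01, 0.015])
target = source @ R_true.T + t_true

R_est, t_est, err = icp(source, target)
```

With a large initial misalignment the nearest-neighbor correspondences become unreliable and the loop can settle in a wrong local minimum, which is why a coarse registration is needed first.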
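For a detailed explanation of ICP, see https://blog.csdn.net/sinat_34165087/article/details/78567289. That post covers the principles of standard ICP and 2D ICP: how to compute the closest point set and the transformation matrix, and from them evaluate the objective function and test it against a threshold. The computation steps are laid out in great detail. Related code: https://blog.csdn.net/peach_blossom/article/details/78506184. A comparison of the algorithms: https://blog.csdn.net/weixin_43236944/article/details/88188532.
Another approach is based on point pair features (PPF). For objects that lack surface texture, have very little local curvature variation, or whose point clouds are very sparse, local feature descriptors struggle to extract reliable matching pairs. This motivated point pair features, which use some global information for matching. As described in http://campar.in.tum.de/pub/drost2010CVPR/drost2010CVPR.pdf, a global model description is built from oriented point pair features, the model is matched locally with a fast voting scheme, and the object pose is then computed from the voting results.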
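Most importantly, the final pose estimate does not get trapped in a local minimum. However, restricting the minimum distance discards a lot of model information, and hand-designed voting weights are more empirical and subjective. Both approaches have their strengths and weaknesses.
By following objects across consecutive frames, 3D object tracking can make full use of the rich geometric information in point clouds, and it copes quite well with the occlusion, illumination, and scale-change problems that 2D image object tracking faces.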
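To make the point pair feature discussed above more concrete, the sketch below computes the four-dimensional feature of an oriented point pair (the distance between the points, the angle of each normal to the connecting vector, and the angle between the two normals) and stores quantized features in a hash table that serves as a global model description. The function names, the quantization steps, and the two-point toy model are assumptions for illustration only. The voting and pose-recovery stages are not shown.

```python
import itertools
from collections import defaultdict

import numpy as np


def angle_between(a, b):
    """Angle in [0, pi] between two non-zero vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))


def point_pair_feature(p1, n1, p2, n2):
    """F = (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1.

    Assumes p1 != p2, otherwise d has zero length.
    """
    d = p2 - p1
    return np.array([
        np.linalg.norm(d),
        angle_between(n1, d),
        angle_between(n2, d),
        angle_between(n1, n2),
    ])


def quantize(feature, dist_step=0.25, angle_step=np.deg2rad(12.0)):
    """Discretize a feature into a hashable key (step sizes are illustrative)."""
    return (int(feature[0] // dist_step),) + tuple(
        int(a // angle_step) for a in feature[1:]
    )


def build_model_table(points, normals):
    """Map each quantized feature to the list of ordered model point pairs having it."""
    table = defaultdict(list)
    for i, j in itertools.permutations(range(len(points)), 2):
        key = quantize(point_pair_feature(points[i], normals[i], points[j], normals[j]))
        table[key].append((i, j))
    return table


# Toy model: two points on a plane, both normals pointing along +z.
points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])

feature = point_pair_feature(points[0], normals[0], points[1], normals[1])
key = quantize(feature)
table = build_model_table(points, normals)
```

Because the feature depends only on relative distances and angles, it is unchanged by a rigid motion of the object, which is what lets scene pairs be matched against the model table regardless of pose.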

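The main papers include:
Leveraging Shape Completion for 3D Siamese Tracking
Summary:
Point clouds are hard to process because they are sparse, so autonomous vehicles tend to rely on appearance rather than pure geometry, even though 3D LiDAR perception provides crucial information for urban navigation in difficult lighting or weather. The paper studies how shape completion helps 3D object tracking in LiDAR point clouds. It designs a Siamese tracker that encodes the model shape and candidate shapes into a compact latent representation, and regularizes the encoding by forcing the latent representation to decode back into the object model shape. The authors observe that 3D object tracking and 3D shape completion complement each other: a more meaningful latent representation is more discriminative and improves tracking performance. On the KITTI tracking dataset, using 3D bounding boxes of cars, the model reaches a success rate of 76.94% and a precision of 81.38%, with shape completion regularization improving both metrics by 3%.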
Abstract
Point clouds are challenging to process due to their sparsity, therefore autonomous vehicles rely more on appearance attributes than pure geometric features. However, 3D LIDAR perception can provide crucial information for urban navigation in challenging light or weather conditions. In this paper, we investigate the versatility of Shape Completion for 3D Object Tracking in LIDAR point clouds. We design a Siamese tracker that encodes model and candidate shapes into a compact latent representation. We regularize the encoding by enforcing the latent representation to decode into an object model shape. We observe that 3D object tracking and 3D shape completion complement each other. Learning a more meaningful latent representation shows better discriminatory capabilities, leading to improved tracking performance. We test our method on the KITTI Tracking set using car 3D bounding boxes. Our model reaches a 76.94% Success rate and 81.38% Precision for 3D Object Tracking, with the shape completion regularization leading to an improvement of 3% in both metrics.
Paper link: https://www.researchgate.net/publication/331544229_Leveraging_Shape_Completion_for_3D_Siamese_Tracking
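The abstract describes comparing a model shape with candidate shapes through a compact latent representation. The sketch below shows one way such a comparison could be done once latent vectors exist. It is only an illustration: the use of cosine similarity as the score, the function names, and the hand-written 3-dimensional latent vectors are all assumptions, and the encoder network that would produce real latent vectors is not shown.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine of the angle between two latent vectors (eps guards against zero norm)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def select_best_candidate(model_latent, candidate_latents):
    """Return the index of the candidate most similar to the model, plus all scores."""
    scores = np.array([cosine_similarity(model_latent, c) for c in candidate_latents])
    return int(np.argmax(scores)), scores


# Hypothetical latent vectors; in practice these would come from a learned encoder.
model_latent = np.array([1.0, 0.0, 0.0])
candidate_latents = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the model
    [0.9, 0.1, 0.0],   # close to the model
    [-1.0, 0.0, 0.0],  # opposite to the model
])

best_index, scores = select_best_candidate(model_latent, candidate_latents)
```

In a tracker, the candidates would be shapes cropped from the current frame around the previous object position, and the best-scoring candidate would give the new object location.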

Complexer-YOLO: Real-time 3D object detection and tracking on semantic point clouds
Paper link: https://arxiv.org/abs/1904.07537
Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of neural network based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference time up to 20% and halves training time. On top, we apply state-of-the-art online multi target feature tracking on the object measurements to further increase accuracy and robustness utilizing temporal information. Our experiments on KITTI show that we achieve same results as state-of-the-art in all related categories, while maintaining the performance and accuracy trade-off and still run in real-time. Furthermore, our model is the first one that fuses visual semantic with 3D object detection.
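Summary: Accurate 3D object detection is a fundamental computer vision problem with a major impact on autonomous vehicles, augmented/virtual reality, and robotics. The paper fuses a state-of-the-art neural-network-based 3D detector with visual semantic segmentation in the autonomous driving setting. It also introduces the Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparing object detections, which speeds up inference by up to 20% and halves training time. On top of that, online multi-target feature tracking is applied to the object measurements, using temporal information to further improve accuracy and robustness. Experiments on KITTI show results on par with the state of the art in all related categories while keeping the trade-off between performance and accuracy and still running in real time. The authors also state that their model is the first to fuse visual semantics with 3D object detection.

Besides the approaches above, there are also tracking algorithms based on the idea of optical flow. Analogous to optical flow estimation in 2D vision, a number of methods have begun to learn useful information from point cloud sequences, such as 3D scene flow and spatio-temporal information.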

Related datasets
KITTI
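For readers who want to inspect the data, here is a minimal sketch of loading one KITTI LiDAR scan. KITTI stores each Velodyne scan as a flat binary file of float32 values, four per point (x, y, z, reflectance). The function name `load_kitti_velodyne` is an assumption, and the example writes a tiny synthetic file to a temporary directory so that it runs without the dataset.

```python
import os
import tempfile

import numpy as np


def load_kitti_velodyne(path):
    """Read a KITTI Velodyne .bin scan into an (N, 4) array: x, y, z, reflectance."""
    scan = np.fromfile(path, dtype=np.float32)
    return scan.reshape(-1, 4)


# Synthetic stand-in for a real scan file, written in the same binary layout.
fake_scan = np.array([[1.0, 2.0, 3.0, 0.5],
                      [4.0, 5.0, 6.0, 0.1]], dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp_dir:
    scan_path = os.path.join(tmp_dir, "000000.bin")
    fake_scan.tofile(scan_path)
    points = load_kitti_velodyne(scan_path)

xyz = points[:, :3]          # 3D coordinates in the LiDAR frame
reflectance = points[:, 3]   # per-point intensity
```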

Related annotation tools
京东众智 (JD Zhongzhi) 3D point cloud annotation tool
Annotation screenshot: [image]

Workflow:
Detailed operating instructions