A Paper a Day 292/365: Vision-Aided Absolute Trajectory Estimation Using an Unsupervised Deep Network

Vision-Aided Absolute Trajectory Estimation Using an Unsupervised Deep Network with Online Error Correction

Original paper: IROS 2018

Unsupervised Deep Visual-Inertial Odometry with Online Error Correction for RGB-D Imagery

The paper comments on several related works:
[4] Uses optical flow with a mixture density network (MDN).
[5] SfMLearner: However, their approach was unable to recover the scale for the depth estimates or, most crucially, the scale of the changes in pose. SfMLearner also required a sequence of images to compute a trajectory. Their best results were on an input sequence of five images whereas our network only requires a source-target image pairing.
[6] UnDeepVO: UnDeepVO can only be trained on datasets where stereo image pairs are available. Additionally, the network architecture of UnDeepVO cannot be extended to include motion estimates derived from inertial measurements because the spatial transformation between paired images from stereo cameras is unobservable by an IMU (stereo images are recorded simultaneously).
[17] VINet: VINet was the first end-to-end trainable visual-inertial deep network. While VINet showed robustness to temporal and spatial misalignments of an IMU and camera, it still required extrinsic calibration parameters between camera and IMU. This is in contrast to our VIOLearner, which requires no IMU intrinsics or IMU-camera extrinsics. In addition, VINet was trained in a supervised manner and thus required the ground truth pose differences for each exemplar in the training set, which are not always readily available.

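Abstract

We present an unsupervised deep neural network approach to fusing RGB-D imagery with inertial measurements for absolute trajectory estimation. Our network, called the Visual-Inertial-Odometry Learner (VIOLearner), learns to perform visual-inertial odometry (VIO) without inertial measurement unit (IMU) intrinsic parameters (corresponding to gyroscope and accelerometer bias or white noise) and without the extrinsic calibration between the IMU and the camera. The network learns to integrate IMU measurements and generate hypothesis trajectories, which are then corrected online according to the Jacobians of scaled image projection errors with respect to a spatial grid of pixel coordinates. We evaluate our network against state-of-the-art (SOA) visual-inertial odometry, visual odometry, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI Odometry dataset and demonstrate competitive odometry performance.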

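Contributions

The main contribution of the paper is unsupervised learning of trajectory estimation with absolute scale recovery from RGB-D plus inertial measurements, with:
•a built-in online error correction module;
•unknown IMU-camera calibration parameters;
•loosely time-synchronized camera and IMU data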

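Problems

1. A monocular camera cannot directly estimate scale and depth;
2. The accuracy of pose estimates from GPS cannot be guaranteed;
3. IMU data is noisy and is not continuous with the image stream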

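The paper chooses to include depth in the input domain as a means of achieving absolute scale recovery: while absolute depth can be generated using onboard sensors, the same cannot be said of the change in pose between image pairs.
The proposed VIOLearner is an unsupervised VIO deep network that estimates the scaled egomotion of a moving camera between the time t_j at which the source image I_j is captured and the time t_{j+1} at which the target image I_{j+1} is captured. VIOLearner takes as input an RGB-D source image, a target RGB image, the IMU data from t_{j-1} to t_{j+1}, and a camera calibration matrix K holding the camera's intrinsic parameters. With access to K, VIOLearner can generate camera pose changes in the camera frame using a view-synthesis approach. The basis for training is the Euclidean loss between the target image and a reconstructed target image, which is built from pixels of the source image sampled at locations determined by a learned 3D affine transformation.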
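The geometry behind this kind of view synthesis can be illustrated with a small sketch. It is not the authors' implementation: it assumes a plain pinhole camera, a rigid transform (R, t) in place of the learned 3D affine transformation, and it skips the image sampling step. The names reproject_pixel and euclidean_loss are hypothetical helpers introduced only for this illustration.

```python
def reproject_pixel(u, v, depth, K, R, t):
    """Map pixel (u, v) with known depth from the source camera into the target image.

    K is a 3x3 pinhole intrinsics matrix as nested lists. R (3x3) and t (length 3)
    are an assumed rigid source-to-target transform, standing in for the
    learned 3D affine transformation described in the text.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    # Back-project the pixel into a 3D point in the source camera frame.
    p = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)
    # Express the point in the target camera frame.
    q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    if q[2] <= 0:
        raise ValueError("point is behind the target camera")
    # Project the point into target pixel coordinates.
    return fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy


def euclidean_loss(image_a, image_b):
    """Euclidean (L2) distance between two equally sized flat lists of intensities."""
    return sum((a - b) ** 2 for a, b in zip(image_a, image_b)) ** 0.5


# Example values (made up): a 640x480 camera with a 500-pixel focal length.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u_new, v_new = reproject_pixel(100.0, 50.0, 2.0, K, IDENTITY, [0.1, 0.0, 0.0])
```

Because depth enters the back-projection in metric units, a translation of 0.1 along x moves this pixel by fx * 0.1 / depth pixels, which is how depth in the input ties the image error to absolute scale.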


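The authors split the error-matching process into levels. At Level 0, the source-target image pair and the depth are combined with the pose estimated from the IMU to generate one image from the other, which yields the error E0. The partial derivatives of E0 are then computed, and these derivatives are fed as input to the next level to correct the current estimate, repeating until the output error is minimized.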
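The level-by-level correction can be pictured with a toy loop. This is only a sketch under strong assumptions: in the paper the correction is produced by the network from the Jacobian, whereas here a fixed gradient step with finite-difference derivatives stands in for it. The function correct_online and its parameters are hypothetical.

```python
def correct_online(error_fn, pose, levels=5, step=0.25, eps=1e-6):
    """Refine a pose hypothesis over several levels using the error's derivatives.

    error_fn maps a list of pose parameters to a scalar error. At each level the
    partial derivatives are estimated by central finite differences and a fixed
    gradient step is applied (a stand-in for the learned correction).
    Returns the refined pose and the error recorded at every level.
    """
    pose = list(pose)
    errors = [error_fn(pose)]  # E0 of the initial hypothesis
    for _ in range(levels):
        grad = []
        for k in range(len(pose)):
            hi = pose[:k] + [pose[k] + eps] + pose[k + 1:]
            lo = pose[:k] + [pose[k] - eps] + pose[k + 1:]
            grad.append((error_fn(hi) - error_fn(lo)) / (2 * eps))
        # Move the estimate against the derivative, then re-evaluate the error.
        pose = [p - step * g for p, g in zip(pose, grad)]
        errors.append(error_fn(pose))
    return pose, errors


# Toy error: squared distance to a made-up "true" pose change.
TRUE_POSE = [1.0, -2.0]
toy_error = lambda p: sum((a - b) ** 2 for a, b in zip(p, TRUE_POSE))
refined, errors = correct_online(toy_error, [0.0, 0.0])
```

With this toy error the distance to the true pose halves at every level, so the recorded errors shrink steadily; a real image projection error is not this well behaved.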

Error update: minimizing the error

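Loss is then computed only for this one winning hypothesis pathway, and the error is backpropagated only to the parameters in that pathway. Thus only the parameters that contributed to the winning hypothesis are updated, while the rest are left untouched. The final loss L used to train the network is the sum of the Euclidean loss terms of every level plus a weighted L1 penalty over the bias terms, which the authors found empirically to better facilitate training and gradient backpropagation.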
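As a rough illustration of how such a loss could be assembled, the sketch below sums the per-level Euclidean losses, adds only the winning hypothesis, and applies an L1 penalty to the bias terms. It assumes that the winner is the hypothesis with the lowest error and that the hypotheses belong to the last level; the function total_loss and the weight bias_weight are hypothetical, and the paper's exact formula is not reproduced here.

```python
def total_loss(level_errors, hypothesis_errors, bias_terms, bias_weight=0.01):
    """Combine per-level Euclidean losses, the winning hypothesis, and an L1 bias penalty.

    level_errors: Euclidean loss of each level before the hypothesis level.
    hypothesis_errors: Euclidean loss of each hypothesis pathway.
    bias_terms: flat list of bias parameters; bias_weight is a made-up weight.
    Returns (loss, index of the winning hypothesis).
    """
    # Assumption: the winner is the pathway with the lowest error. Only its loss
    # enters the sum, so gradients would reach only that pathway's parameters.
    winner = min(range(len(hypothesis_errors)), key=hypothesis_errors.__getitem__)
    l1_penalty = sum(abs(b) for b in bias_terms)
    loss = sum(level_errors) + hypothesis_errors[winner] + bias_weight * l1_penalty
    return loss, winner


# Made-up numbers: two earlier levels, three hypotheses, two bias terms.
loss, winner = total_loss([2.0, 1.0], [0.9, 0.5, 0.7], [0.5, -1.5], bias_weight=0.1)
```

Leaving the losing hypotheses out of the sum is what keeps their parameters unchanged during the update.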
Update of E

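Network architecture

IMU processing: The initial layers of VIOLearner use two parallel pathways of 7 convolutional layers each, one for the IMU angular velocity and one for the linear acceleration. Each pathway begins with 2 convolutional layers, each with 64 single-stride 3x5 filters, applied to the batchx20x3 IMU angular velocities or linear accelerations, followed by 2 convolutional layers of 128 filters each, with stride 2 and the same 3x5 kernel. Next come 3 convolutional layers of 256 filters with strides of 2, 1, and 1 and kernels of size 3x5, 3x3, and 3x1. The final convolutional layer of each pathway is flattened into a batchx1x3 tensor using a convolutional layer with three filters of kernel size 1 and stride 1, and the two results are then concatenated into the tensor pose_imu.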

Experimental results
