Paper Title
StereoGAN: Bridging Synthetic-to-Real Domain Gap by Joint Optimization of Domain Translation and Stereo Matching
Paper Authors
Paper Abstract
Large-scale synthetic datasets are beneficial to stereo matching but usually introduce known domain bias. Although unsupervised image-to-image translation networks represented by CycleGAN show great potential in dealing with the domain gap, it is non-trivial to generalize this method to stereo matching due to the problems of pixel distortion and stereo mismatch after translation. In this paper, we propose an end-to-end training framework with domain translation and stereo matching networks to tackle this challenge. First, joint optimization between the domain translation and stereo matching networks in our end-to-end framework makes the former facilitate the latter to the maximum extent. Second, this framework introduces two novel losses, i.e., a bidirectional multi-scale feature re-projection loss and a correlation consistency loss, to help translate all synthetic stereo images into realistic ones as well as maintain epipolar constraints. The effective combination of the above two contributions leads to impressive stereo-consistent translation and disparity estimation accuracy. In addition, a mode-seeking regularization term is added to endow the synthetic-to-real translation results with higher fine-grained diversity. Extensive experiments demonstrate the effectiveness of the proposed framework in bridging the synthetic-to-real domain gap in stereo matching.
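The correlation consistency loss named above can be illustrated with a minimal NumPy sketch: it compares the horizontal (scanline) correlation volumes of a synthetic stereo pair and its translated counterpart, so that matching costs, and hence epipolar geometry, survive translation. The function names, tensor shapes, and the L1 form of the penalty here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def correlation_volume(feat_l, feat_r, max_disp):
    """Horizontal correlation between left/right feature maps.

    feat_l, feat_r: arrays of shape (C, H, W).
    Returns a cost volume of shape (max_disp, H, W), where slice d is
    the channel-wise inner product of each left-image feature with the
    right-image feature d pixels to its left (disparity d).
    """
    C, H, W = feat_l.shape
    vol = np.zeros((max_disp, H, W))
    for d in range(max_disp):
        # correlate left pixel at column x with right pixel at column x - d
        vol[d, :, d:] = np.einsum(
            'chw,chw->hw', feat_l[:, :, d:], feat_r[:, :, :W - d]
        ) / C
    return vol

def correlation_consistency_loss(fs_l, fs_r, ft_l, ft_r, max_disp=4):
    """L1 distance between the correlation volume of a synthetic stereo
    pair (fs_*) and that of its translated counterpart (ft_*).

    Keeping the two volumes close encourages the translation network to
    preserve stereo correspondences along scanlines (a stand-in for the
    epipolar constraint discussed in the abstract).
    """
    cs = correlation_volume(fs_l, fs_r, max_disp)
    ct = correlation_volume(ft_l, ft_r, max_disp)
    return np.abs(cs - ct).mean()
```

In the actual framework such a term would be computed on intermediate network features and minimized jointly with the translation and stereo matching objectives; the sketch only shows the correlation bookkeeping.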