Paper Title

Leveraging Endo- and Exo-Temporal Regularization for Black-box Video Domain Adaptation

Paper Authors

Yuecong Xu, Jianfei Yang, Haozhi Cao, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen

Paper Abstract

To enable video models to be applied seamlessly across video tasks in different environments, various Video Unsupervised Domain Adaptation (VUDA) methods have been proposed to improve the robustness and transferability of video models. Despite the improvements in model robustness, these VUDA methods require access to both source data and source model parameters for adaptation, raising serious data privacy and model portability concerns. To cope with these concerns, this paper first formulates Black-box Video Domain Adaptation (BVDA) as a more realistic yet challenging scenario in which the source video model is provided only as a black-box predictor. While a few methods for Black-box Domain Adaptation (BDA) have been proposed in the image domain, they cannot be applied directly to the video domain, since the video modality has more complicated temporal features that are harder to align. To address BVDA, we propose a novel Endo and eXo-TEmporal Regularized Network (EXTERN) that applies mask-to-mix strategies and video-tailored regularizations, namely endo-temporal regularization and exo-temporal regularization, across both clip and temporal features, while distilling knowledge from the predictions obtained from the black-box predictor. Empirical results demonstrate the state-of-the-art performance of EXTERN across various cross-domain closed-set and partial-set action recognition benchmarks, where it even surpasses most existing video domain adaptation methods that have access to source data.
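The core idea in the abstract, training a target model on soft predictions returned by a black-box source predictor while mixing clip features for regularization, can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the function names (`distill_loss`, `exo_temporal_mix`), the plain cross-entropy distillation objective, and the simple mixup-style feature mixing are all illustrative stand-ins for EXTERN's actual losses.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(teacher_probs, student_logits):
    """Cross-entropy between the black-box predictor's soft outputs
    (the only signal available from the source model in BVDA) and
    the adapted student model's predictions."""
    q = softmax(student_logits)
    return -np.mean(np.sum(teacher_probs * np.log(q + 1e-12), axis=-1))

def exo_temporal_mix(feat_a, feat_b, lam=0.5):
    """Illustrative cross-video mixing of clip features (assumption:
    a plain convex combination, standing in for the mask-to-mix strategy)."""
    return lam * feat_a + (1.0 - lam) * feat_b

# Toy usage: 4 target videos, 5 action classes.
rng = np.random.default_rng(0)
teacher = softmax(rng.normal(size=(4, 5)))   # black-box predictor outputs
student = rng.normal(size=(4, 5))            # adapted model's logits
loss = distill_loss(teacher, student)
```

In the black-box setting only `teacher` is observable; the source model's parameters never appear, which is what distinguishes BVDA from standard VUDA.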
