Paper Title

Tree-based Text-Vision BERT for Video Search in Baidu Video Advertising

Authors

Tan Yu, Jie Liu, Yi Yang, Yi Li, Hongliang Fei, Ping Li

Abstract

Advances in communication technology and the popularity of smartphones have fueled a boom in video ads. Baidu, one of the world's leading search engine companies, receives billions of search queries per day. Pairing video ads with user searches is the core task of Baidu video advertising. Because of the modality gap, query-to-video retrieval is much more challenging than traditional query-to-document retrieval and image-to-image search. Traditionally, query-to-video retrieval is handled through query-to-title retrieval, which is unreliable when title quality is low. With the rapid progress in computer vision and natural language processing in recent years, content-based search methods have become promising for query-to-video retrieval. Benefiting from pretraining on large-scale datasets, some visionBERT methods based on cross-modal attention have achieved excellent performance on many vision-language tasks, in both academia and industry. Nevertheless, the high computational cost of cross-modal attention makes it impractical for large-scale search in industrial applications. In this work, we present a tree-based combo-attention network (TCAN), which was recently launched on Baidu's dynamic video advertising platform. It provides a practical solution for deploying heavy cross-modal attention in large-scale query-to-video search. After the launch of TCAN, click-through rate improved by 2.29% and conversion rate improved by 2.63%.
