Paper Title

S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation

Authors

Cheng, Yuan, Yang, Yuchao, Chen, Hai-Bao, Wong, Ngai, Yu, Hao

Abstract

Real-time understanding of video is crucial in various AI applications such as autonomous driving. This work presents a fast single-shot segmentation strategy for video scene understanding. The proposed net, called S3-Net, quickly locates and segments target sub-scenes while extracting structured time-series semantic features as inputs to an LSTM-based spatio-temporal model. Utilizing tensorization and quantization techniques, S3-Net is designed to be lightweight for edge computing. Experiments on the CityScapes, UCF11, HMDB51, and MOMENTS datasets demonstrate that S3-Net achieves an 8.1% accuracy improvement over the 3D-CNN-based approach on UCF11, a 6.9x storage reduction, and an inference speed of 22.8 FPS on CityScapes with a GTX1080Ti GPU.
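The temporal stage the abstract describes, per-frame semantic features fed into an LSTM-based spatio-temporal model, can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the feature and hidden dimensions, the random weights, and the single-cell LSTM are all assumptions for demonstration.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as [input, forget, cell, output]."""
    z = W @ x + U @ h + b
    H = h.size
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))   # forget gate
    g = np.tanh(z[2 * H:3 * H])             # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))    # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Illustrative sizes: 16-dim per-frame semantic feature vector
# (e.g. pooled from the single-shot segmentation branch), 8-dim state.
rng = np.random.default_rng(0)
feat_dim, hidden = 16, 8
W = rng.normal(0, 0.1, (4 * hidden, feat_dim))
U = rng.normal(0, 0.1, (4 * hidden, hidden))
b = np.zeros(4 * hidden)

h = np.zeros(hidden)
c = np.zeros(hidden)
for t in range(5):                          # 5 frames of features
    x = rng.normal(size=feat_dim)           # stand-in for real features
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape)  # prints (8,)
```

The final hidden state `h` summarizes the scene over the frame sequence and would feed a classification head in a full model; the paper's tensorization and quantization steps would further compress the weight matrices `W` and `U` for edge deployment.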
