Paper Title


Context-LSTM: a robust classifier for video detection on UCF101

Authors

Dengshan Li, Rujing Wang

Abstract


Video detection and human action recognition may be computationally expensive and require a long time to train models. In this paper, we aim to reduce the training time and GPU memory usage of video detection while achieving competitive detection accuracy. Other works such as Two-stream, C3D, and TSN have shown excellent performance on UCF101. Here, we use only an LSTM structure for video detection. With this simple structure, we achieve competitive top-1 accuracy on the entire UCF101 validation dataset. The LSTM structure is named Context-LSTM, since it processes deep temporal features, and it may be seen as simulating the human recognition system. We cascade LSTM blocks in PyTorch and connect the cell-state and hidden-output flows between them. At the connections between blocks, we apply ReLU, Batch Normalization, and MaxPooling. Context-LSTM reduces training time and GPU memory usage while maintaining state-of-the-art top-1 accuracy on the entire UCF101 validation dataset, showing robust performance on video action detection.
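The architecture sketched in the abstract (cascaded LSTM blocks whose cell-state and hidden-output flows are connected, with ReLU, Batch Normalization, and MaxPooling at the block connections) can be illustrated with a short PyTorch snippet. This is a minimal sketch under assumptions, not the authors' implementation: the ContextLSTM class name, the feature dimension, hidden size, number of blocks, the projection layer, and the choice of pooling axis are all hypothetical placeholders.

```python
import torch
import torch.nn as nn


class ContextLSTM(nn.Module):
    """Minimal sketch of cascaded LSTM blocks with connected state flows.

    The final (h, c) states of each block seed the next block, and
    ReLU -> BatchNorm -> temporal MaxPool are applied to the hidden-output
    sequence between blocks. All sizes are assumptions, not the paper's.
    """

    def __init__(self, feat_dim=2048, hidden_dim=512, num_blocks=3,
                 num_classes=101):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim)   # frame features -> LSTM input
        self.blocks = nn.ModuleList(
            nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
            for _ in range(num_blocks)
        )
        self.norms = nn.ModuleList(
            nn.BatchNorm1d(hidden_dim) for _ in range(num_blocks - 1)
        )
        self.pool = nn.MaxPool1d(kernel_size=2)       # halves the temporal length
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                             # x: (B, T, feat_dim)
        x = self.proj(x)
        state = None                                  # (h, c) carried across blocks
        for i, block in enumerate(self.blocks):
            x, state = block(x, state)                # cell/hidden flow is connected
            if i < len(self.blocks) - 1:
                x = torch.relu(x)
                x = x.transpose(1, 2)                 # (B, H, T) for 1-D ops
                x = self.norms[i](x)
                x = self.pool(x)                      # temporal max pooling
                x = x.transpose(1, 2)                 # back to (B, T', H)
        return self.head(x[:, -1])                    # classify from last time step


# Usage sketch: 16 clips of 32 frames, each frame a 2048-d CNN feature vector.
feats = torch.randn(16, 32, 2048)
logits = ContextLSTM()(feats)                         # -> (16, 101) for UCF101
```

Carrying the (h, c) state from one block into the next is one plausible reading of "connected the cell state flow and hidden output flow", and pooling over the temporal axis between blocks is likewise an assumption; it shortens the sequences seen by later blocks, which would be consistent with the stated goal of reducing training time and GPU memory usage.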
