Paper Title

A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution

Paper Authors

Jianqi Ma, Zhetong Liang, Lei Zhang

Paper Abstract

Scene text image super-resolution aims to increase the resolution and readability of the text in low-resolution images. Though significant improvement has been achieved by deep convolutional neural networks (CNNs), it remains difficult to reconstruct high-resolution images for spatially deformed texts, especially rotated and curve-shaped ones. This is because the current CNN-based methods adopt locality-based operations, which are not effective to deal with the variation caused by deformations. In this paper, we propose a CNN based Text ATTention network (TATT) to address this problem. The semantics of the text are firstly extracted by a text recognition module as text prior information. Then we design a novel transformer-based module, which leverages global attention mechanism, to exert the semantic guidance of text prior to the text reconstruction process. In addition, we propose a text structure consistency loss to refine the visual appearance by imposing structural consistency on the reconstructions of regular and deformed texts. Experiments on the benchmark TextZoom dataset show that the proposed TATT not only achieves state-of-the-art performance in terms of PSNR/SSIM metrics, but also significantly improves the recognition accuracy in the downstream text recognition task, particularly for text instances with multi-orientation and curved shapes. Code is available at https://github.com/mjq11302010044/TATT.
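The abstract describes two ideas: a transformer-based module in which a text prior (the semantics extracted by a recognition module) guides reconstruction through global attention, and a text structure consistency loss that ties the reconstructions of regular and deformed text together. The PyTorch sketch below only illustrates these two ideas as the abstract states them; it is not the authors' implementation (that is available at the GitHub link above), and the module names, feature dimensions, the choice of rotation as the deformation, and the L1 form of the consistency term are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextPriorGuidance(nn.Module):
    """Cross-attention block: every spatial position of the image feature map
    attends to the whole text-prior sequence (global, not local, guidance)."""

    def __init__(self, feat_dim=64, prior_dim=37, n_heads=4):
        super().__init__()
        # project the text prior (e.g. per-position character probabilities
        # produced by a recognition module) into the image feature space
        self.prior_proj = nn.Linear(prior_dim, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, feat, text_prior):
        # feat: (B, C, H, W) low-resolution image features
        # text_prior: (B, L, prior_dim) text prior sequence
        b, c, h, w = feat.shape
        q = feat.flatten(2).transpose(1, 2)   # (B, H*W, C) queries
        kv = self.prior_proj(text_prior)      # (B, L, C) keys / values
        out, _ = self.attn(q, kv, kv)         # global attention over the prior
        out = self.norm(out + q)              # residual connection + norm
        return out.transpose(1, 2).reshape(b, c, h, w)


def structure_consistency_loss(sr_model, lr, k=1):
    """One reading of 'structural consistency between regular and deformed
    text': super-resolve a rotated copy of the input, undo the rotation, and
    penalize the difference from the super-resolution of the original."""
    sr = sr_model(lr)                                       # SR of regular input
    lr_rot = torch.rot90(lr, k, dims=(2, 3))                # deformed (rotated) input
    sr_rot_back = torch.rot90(sr_model(lr_rot), -k, dims=(2, 3))
    return F.l1_loss(sr_rot_back, sr)
```

The point of the cross-attention block is that each spatial location can consult the entire text prior at once, so the guidance is not limited by the local receptive field of convolutions; for example, with features of shape (2, 64, 16, 64) and a prior of shape (2, 26, 37), TextPriorGuidance(64, 37)(feat, prior) returns guided features of the same spatial size.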
