Paper Title
Portmanteauing Features for Scene Text Recognition
Authors
Abstract
Scene text images come in different shapes and are subject to various distortions, e.g. perspective distortion. To handle these challenges, state-of-the-art methods rely on a rectification network connected to the text recognition network. Together they form a linear pipeline that applies text rectification to every input image, even to images that could be recognized without it. The rectification network undoubtedly improves overall text recognition performance. In some cases, however, it introduces unnecessary distortions, causing incorrect predictions on images that would have been recognized correctly without rectification. To alleviate these unnecessary distortions, the portmanteauing of features is proposed. The portmanteau feature, inspired by the portmanteau word, is a feature that contains information from both the original text image and the rectified image. To generate the portmanteau feature, a non-linear input pipeline with a block matrix initialization is presented. In this work, the transformer is chosen as the recognition network because its use of attention and its inherent parallelism allow it to handle the portmanteau feature effectively. The proposed method is evaluated on 6 benchmarks and compared with 13 state-of-the-art methods. The experimental results show that the proposed method outperforms the state-of-the-art methods on several of the benchmarks.
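To make the idea of a portmanteau feature more concrete, the following is a minimal sketch of one way features from the original and rectified images could be fused through a block-initialized linear map. The abstract does not specify the block structure or the fusion layer, so everything here is an assumption for illustration: the function names block_matrix_init and portmanteau, the weight layout [w_orig*I | w_rect*I], and the default weights of 0.5 are hypothetical and are not taken from the paper.

```python
def block_matrix_init(dim, w_orig=0.5, w_rect=0.5):
    """Build a dim x (2*dim) matrix [w_orig*I | w_rect*I] as nested lists.

    Assumed layout (not from the paper): the left block weights the
    original-image feature, the right block weights the rectified one.
    """
    return [
        [w_orig if j == i else 0.0 for j in range(dim)]
        + [w_rect if j == i else 0.0 for j in range(dim)]
        for i in range(dim)
    ]


def portmanteau(feat_orig, feat_rect, weight):
    """Fuse two (seq_len x dim) feature maps into one (seq_len x dim) map."""
    fused = []
    for row_o, row_r in zip(feat_orig, feat_rect):
        # Concatenate along the channel axis: length 2*dim.
        stacked = row_o + row_r
        # Apply the (dim x 2*dim) weight to the stacked vector.
        fused.append([sum(w * x for w, x in zip(w_row, stacked)) for w_row in weight])
    return fused


# Toy features: 2 positions, 2 channels each (hypothetical values).
feat_orig = [[1.0, 2.0], [3.0, 4.0]]
feat_rect = [[3.0, 0.0], [1.0, 8.0]]

W = block_matrix_init(2)
fused = portmanteau(feat_orig, feat_rect, W)
print(fused)  # [[2.0, 1.0], [2.0, 6.0]]
```

With this assumed initialization the fused feature starts out as the average of the two inputs, so neither branch dominates before training. Setting w_orig=0.0 and w_rect=1.0 reproduces a purely rectified input, which corresponds to the linear pipeline the abstract describes as the baseline.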