Paper Title

Arbitrary Shape Text Detection using Transformers

Authors

Zobeir Raisi, Georges Younes, John Zelek

Abstract

Recent text detection frameworks require several handcrafted components, such as anchor generation, non-maximum suppression (NMS), or multiple processing stages (e.g., label generation), to detect arbitrarily shaped text in images. In contrast, we propose an end-to-end trainable architecture based on Detection using Transformers (DETR) that outperforms previous state-of-the-art methods in arbitrary-shaped text detection. At its core, our proposed method leverages a bounding box loss function that accurately measures changes in the scale and aspect ratio of arbitrarily shaped detected text regions. This is made possible by a hybrid shape representation built from Bezier curves, which are further split into piece-wise polygons. The proposed loss function combines a generalized-split-intersection-over-union loss, defined over the piece-wise polygons, with a Smooth-$\ln$ regression regularizer over the Bezier curves' control points. We evaluate our proposed model on the Total-Text and CTW-1500 datasets for curved text, and on the MSRA-TD500 and ICDAR15 datasets for multi-oriented text, and show that the proposed method outperforms previous state-of-the-art methods in arbitrary-shape text detection tasks.
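
To make the hybrid shape representation more concrete, the following is a minimal sketch of how a text region bounded by two Bezier curves might be split into piece-wise polygons. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the curve degree, the number of pieces, or the sampling scheme, so the cubic curves, the uniform sampling in `t`, and the names `cubic_bezier`, `split_into_pieces`, and `polygon_area` are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(ctrl: List[Point], t: float) -> Point:
    """Evaluate a cubic Bezier curve (4 control points) at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    # Bernstein basis polynomials of degree 3.
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)


def split_into_pieces(top: List[Point], bottom: List[Point],
                      n_pieces: int) -> List[List[Point]]:
    """Split the region between two Bezier curves into quadrilaterals.

    Assumption: `top` and `bottom` are the control points of the upper and
    lower text boundaries, both running in the same direction.
    """
    ts = [i / n_pieces for i in range(n_pieces + 1)]
    top_pts = [cubic_bezier(top, t) for t in ts]
    bot_pts = [cubic_bezier(bottom, t) for t in ts]
    pieces = []
    for i in range(n_pieces):
        # Vertices ordered around the quadrilateral boundary.
        pieces.append([top_pts[i], top_pts[i + 1], bot_pts[i + 1], bot_pts[i]])
    return pieces


def polygon_area(poly: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(poly)):
        x_a, y_a = poly[i]
        x_b, y_b = poly[(i + 1) % len(poly)]
        total += x_a * y_b - x_b * y_a
    return abs(total) / 2.0


# Hypothetical example: a straight, 4-wide and 2-tall text region.
top_curve = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
bottom_curve = [(0.0, 2.0), (1.0, 2.0), (3.0, 2.0), (4.0, 2.0)]
pieces = split_into_pieces(top_curve, bottom_curve, n_pieces=4)
total_area = sum(polygon_area(p) for p in pieces)
```

A loss in the spirit of the one described above would then compare predicted and ground-truth pieces pairwise (for example, with an IoU-style overlap measure per piece) while separately regressing the control points; the exact formulation is given in the paper rather than in this abstract.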
