End-to-End Video Text Spotting with Transformer

Weijia Wu; Yuanqiang Cai; Chunhua Shen; Debing Zhang; Ying Fu; Hong Zhou; Ping Luo

Recent video text spotting methods usually require a three-stage pipeline: detecting text in individual images, recognizing the localized text, and tracking text streams with post-processing to generate the final results. These methods typically follow the tracking-by-match paradigm and develop sophisticated pipelines. In this paper, rooted in Transformer sequence modeling, we propose a simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR has two main advantages: 1) Unlike the explicit match paradigm between adjacent frames, TransDETR tracks and recognizes each text instance implicitly through a distinct query, termed a text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework that simultaneously addresses all three sub-tasks (text detection, tracking, and recognition). Extensive experiments on four video text datasets (ICDAR2013 Video, ICDAR2015 Video, Minetto, and YouTube Video Text) demonstrate that TransDETR achieves state-of-the-art performance, with improvements of up to around 8.0% on video text spotting tasks. The code of TransDETR can be found at https://github.com/weijiawu/TransDETR.
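For readers unfamiliar with the tracking-by-match paradigm that the abstract contrasts against, the following is a minimal sketch of explicit association between adjacent frames using greedy IoU matching. It is a generic illustration only, not code from TransDETR or from any method evaluated in the paper; the function names (`iou`, `match_adjacent`), the axis-aligned box format, and the 0.5 threshold are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical box format for this sketch: axis-aligned (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_adjacent(
    prev_tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    thresh: float = 0.5,  # assumed threshold, chosen for illustration
) -> Tuple[Dict[int, Box], int]:
    """Greedily link current detections to tracks from the previous frame.

    Returns the updated {track_id: box} mapping and the next unused track id.
    Tracks with no match in the current frame are simply dropped.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        (
            (iou(box, det), tid, idx)
            for tid, box in prev_tracks.items()
            for idx, det in enumerate(detections)
        ),
        reverse=True,
    )
    tracks: Dict[int, Box] = {}
    used_tracks, used_dets = set(), set()
    for score, tid, idx in pairs:
        if score < thresh:
            break  # remaining pairs overlap too little to be linked
        if tid in used_tracks or idx in used_dets:
            continue
        tracks[tid] = detections[idx]
        used_tracks.add(tid)
        used_dets.add(idx)
    # Any detection left unmatched starts a new track.
    for idx, det in enumerate(detections):
        if idx not in used_dets:
            tracks[next_id] = det
            next_id += 1
    return tracks, next_id


# Usage: two frames, where both texts shift slightly and a third one appears.
frame1 = [(0, 0, 10, 10), (20, 20, 30, 30)]
frame2 = [(21, 20, 31, 30), (1, 0, 11, 10), (50, 50, 60, 60)]
tracks1, nid = match_adjacent({}, frame1, next_id=0)
tracks2, nid = match_adjacent(tracks1, frame2, next_id=nid)
```

In a scheme like this, identity depends on a hand-written association step that only looks at the previous frame. The abstract describes TransDETR as avoiding such a step by letting each text query carry the identity of one text instance across the sequence.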

updated: Mon Aug 22 2022 05:34:32 GMT+0000 (UTC)

published: Sun Mar 20 2022 12:14:58 GMT+0000 (UTC)

arXiv
