Video Instance Segmentation using Inter-Frame Communication Transformers

Sukjun Hwang; Miran Heo; Seoung Wug Oh; Seon Joo Kim

We propose a novel end-to-end solution for video instance segmentation (VIS) based on transformers. Recently, per-clip pipelines have shown superior performance over per-frame methods by leveraging richer information from multiple frames. However, previous per-clip models require heavy computation and memory usage to achieve frame-to-frame communication, which limits their practicality. In this work, we propose Inter-frame Communication Transformers (IFC), which significantly reduce the overhead of passing information between frames by efficiently encoding the context within the input clip. Specifically, we propose to utilize concise memory tokens as a means of conveying information as well as summarizing each frame scene. The features of each frame are enriched and correlated with other frames through the exchange of information between the precisely encoded memory tokens. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (AP 44.6 on the YouTube-VIS 2019 val set using offline inference) while maintaining a considerably fast runtime (89.4 FPS). Our method can also be applied to near-online inference for processing a video in real time with only a small delay. The code will be made available.
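
The memory-token idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-step layout (per-frame attention over frame and memory tokens, then attention among memory tokens across frames), the function names `ifc_style_layer` and `attention_pairs`, the identity query/key/value projections, and the sizes `T`, `N`, `M` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens):
    # Single-head self-attention with identity projections (an assumption
    # that keeps the sketch short) and a residual connection.
    # tokens: (batch, n, d)
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return tokens + softmax(scores) @ tokens


def ifc_style_layer(frame_tokens, memory_tokens):
    # frame_tokens:  (T, N, d)  N feature tokens for each of T frames
    # memory_tokens: (T, M, d)  M memory tokens per frame, with M << N
    T, N, d = frame_tokens.shape
    M = memory_tokens.shape[1]

    # Step 1 (hypothetical): within each frame, frame tokens and memory
    # tokens attend to each other, so the memory tokens summarize the frame.
    joint = np.concatenate([frame_tokens, memory_tokens], axis=1)
    joint = self_attention(joint)
    frame_tokens, memory_tokens = joint[:, :N], joint[:, N:]

    # Step 2 (hypothetical): only the memory tokens attend across frames,
    # which is the sole inter-frame communication in this sketch.
    flat = self_attention(memory_tokens.reshape(1, T * M, d))
    return frame_tokens, flat.reshape(T, M, d)


def attention_pairs(T, N, M):
    # Number of query-key pairs: full spatio-temporal attention over all
    # T*N tokens versus the two-step scheme above.
    full = (T * N) ** 2
    two_step = T * (N + M) ** 2 + (T * M) ** 2
    return full, two_step


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(5, 100, 16))
    memory = rng.normal(size=(5, 8, 16))
    new_frames, new_memory = ifc_style_layer(frames, memory)
    print(new_frames.shape, new_memory.shape)
    print(attention_pairs(5, 100, 8))
```

Under these illustrative sizes (5 frames, 100 tokens per frame, 8 memory tokens), the two-step scheme evaluates 59,920 query-key pairs against 250,000 for attention over all tokens of the clip, which is one way to read the abstract's claim of reduced overhead. In this sketch, stacking several such layers would let information from other frames reach the frame tokens through the updated memory tokens.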

updated: Mon Jun 07 2021 02:08:39 GMT+0000 (UTC)

published: Mon Jun 07 2021 02:08:39 GMT+0000 (UTC)

arXiv
