Temporal Memory Attention for Video Semantic Segmentation

Hao Wang; Weining Wang; Jing Liu

Video semantic segmentation requires exploiting the complex temporal relations between the frames of a video sequence. Previous works usually rely on accurate optical flow to capture these temporal relations, which incurs a heavy computational cost. In this paper, we propose a Temporal Memory Attention Network (TMANet) that adaptively integrates long-range temporal relations over the video sequence using the self-attention mechanism, without exhaustive optical flow prediction. Specifically, we construct a memory from several past frames to store the temporal information of the current frame. We then propose a temporal memory attention module that captures the relation between the current frame and the memory to enhance the representation of the current frame. Our method achieves new state-of-the-art performance on two challenging video semantic segmentation datasets, namely 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50.
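The idea of attending from the current frame to a memory of past frames can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function name `temporal_memory_attention`, the use of raw features as both keys and values (no learned projections), the scaled dot-product scoring, and the fusion by concatenation are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_memory_attention(current, memory):
    """Enhance current-frame features with an attention readout from memory.

    current: list of N feature vectors, one per spatial position.
    memory:  list of T past frames, each a list of feature vectors
             with the same dimension as the current-frame features.
    Returns N vectors, each the original feature concatenated with
    its attention-weighted readout (concatenation is an assumption).
    """
    # Flatten the T past frames into one pool of memory positions.
    pool = [vec for frame in memory for vec in frame]
    dim = len(current[0])
    scale = 1.0 / math.sqrt(dim)

    enhanced = []
    for query in current:
        # Similarity of this position to every memory position.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in pool]
        weights = softmax(scores)
        # Weighted sum of memory features (keys double as values here).
        readout = [sum(w * vec[i] for w, vec in zip(weights, pool)) for i in range(dim)]
        enhanced.append(list(query) + readout)
    return enhanced


if __name__ == "__main__":
    # Hypothetical toy input: 2 positions, 2-dimensional features, 2 past frames.
    current_frame = [[1.0, 0.0], [0.0, 1.0]]
    past_frames = [
        [[0.9, 0.1], [0.2, 0.8]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    for vec in temporal_memory_attention(current_frame, past_frames):
        print([round(x, 3) for x in vec])
```

Because every current-frame position attends to every memory position, no explicit pixel correspondence (such as optical flow) is needed; the cost is instead quadratic in the number of positions, which a practical model would have to manage.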

updated: Mon Sep 13 2021 02:53:14 GMT+0000 (UTC)

published: Wed Feb 17 2021 09:18:57 GMT+0000 (UTC)

arXiv
