What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection

Sangmin Woo; Junhyug Noh; Kangil Kim

Identifying relations between objects is central to understanding a scene. While several works have been proposed for relation modeling in the image domain, progress in the video domain has been constrained by the challenging dynamics of spatio-temporal interactions (e.g., between which objects is there an interaction? When do relations begin and end?). To date, two representative approaches have been proposed to tackle Video Visual Relation Detection (VidVRD): segment-based and window-based. We first point out the limitations of these two approaches and then propose the Temporal Span Proposal Network (TSPN), a novel method with two advantages in terms of efficiency and effectiveness. 1) TSPN tells what to look at: it sparsifies the relation search space by scoring the relationness of each object pair (i.e., a confidence score for the existence of a relation between the pair). 2) TSPN tells when to look: it leverages the full video context to simultaneously predict the temporal spans and categories of entire relations. TSPN demonstrates its effectiveness by achieving a new state of the art by a significant margin on two VidVRD benchmarks (ImageNet-VidVRD and VidOR) while also showing lower time complexity than existing methods; in particular, it is twice as efficient as a popular segment-based approach.
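
To make the "what to look at" step concrete, the following is a minimal sketch of pruning object pairs by a relationness score. It is not the authors' implementation: the function name `sparsify_pairs`, the `top_k` cutoff, and the toy scores are all assumptions made for illustration, and the abstract does not say how the scores are computed or how pairs are selected.

```python
from itertools import permutations
from typing import Callable, List, Tuple

Pair = Tuple[int, int]  # (subject index, object index)


def sparsify_pairs(
    num_objects: int,
    relationness: Callable[[int, int], float],
    top_k: int,
) -> List[Tuple[Pair, float]]:
    """Keep the top_k ordered (subject, object) pairs by relationness score.

    `relationness` is a hypothetical stand-in for a learned scorer.
    """
    # All ordered pairs of distinct objects: N * (N - 1) candidates.
    candidates = permutations(range(num_objects), 2)
    scored = [((s, o), relationness(s, o)) for s, o in candidates]
    # Sort by descending score; the sort is stable, so ties keep enumeration order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up scores standing in for a learned relationness predictor.
toy_scores = {
    (0, 1): 0.9, (1, 0): 0.2,
    (0, 2): 0.1, (2, 0): 0.7,
    (1, 2): 0.4, (2, 1): 0.3,
}

kept = sparsify_pairs(3, lambda s, o: toy_scores[(s, o)], top_k=2)
print(kept)  # [((0, 1), 0.9), ((2, 0), 0.7)]
```

With three objects there are six ordered candidate pairs; only the two highest-scoring ones would be passed on to span and category prediction, which is the sense in which the search space is sparsified.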

updated: Thu Jul 15 2021 07:01:26 GMT+0000 (UTC)

published: Thu Jul 15 2021 07:01:26 GMT+0000 (UTC)

arXiv
