Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

Daizong Liu; Xiaoye Qu; Jianfeng Dong; Pan Zhou

We address the problem of temporal sentence localization in videos (TSLV). Traditional methods follow a top-down framework that localizes the target segment using pre-defined segment proposals. Although these methods achieve decent performance, their proposals are handcrafted and redundant. Recently, the bottom-up framework has attracted increasing attention due to its superior efficiency: it directly predicts, for each frame, the probability of being a segment boundary. However, bottom-up models perform worse than their top-down counterparts because they fail to exploit segment-level interaction. In this paper, we propose an Adaptive Proposal Generation Network (APGN) that maintains segment-level interaction while improving efficiency. Specifically, we first perform foreground-background classification on the video and regress on the foreground frames to adaptively generate proposals. In this way, the handcrafted proposal design is discarded and the number of redundant proposals is reduced. Then, a proposal consolidation module is developed to enhance the semantics of the generated proposals. Finally, we locate the target moments with these generated proposals following the top-down framework. Extensive experiments on three challenging benchmarks show that the proposed APGN significantly outperforms previous state-of-the-art methods.
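
The adaptive proposal generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `generate_proposals`, the per-frame inputs (`fg_scores`, `left_offsets`, `right_offsets`), and the fixed `threshold` are all assumptions made for illustration. The sketch assumes that each frame has a foreground score and a regressed distance to the segment start and end, and that only foreground frames emit a proposal.

```python
from typing import List, Sequence, Tuple


def generate_proposals(
    fg_scores: Sequence[float],
    left_offsets: Sequence[float],
    right_offsets: Sequence[float],
    threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Hypothetical sketch: build (start, end) frame proposals from foreground frames.

    fg_scores[t]     -- assumed foreground probability of frame t
    left_offsets[t]  -- assumed regressed distance from frame t back to the segment start
    right_offsets[t] -- assumed regressed distance from frame t forward to the segment end
    """
    num_frames = len(fg_scores)
    proposals = set()  # a set drops identical proposals emitted by different frames
    for t, score in enumerate(fg_scores):
        if score < threshold:
            continue  # background frames generate no proposal
        # Clamp the regressed boundaries to the valid frame range.
        start = max(0, round(t - left_offsets[t]))
        end = min(num_frames - 1, round(t + right_offsets[t]))
        if start <= end:
            proposals.add((start, end))
    return sorted(proposals)


# Toy input: frames 2-4 are foreground and all regress to the same segment.
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
left = [0, 0, 0, 1, 2, 0]
right = [0, 0, 2, 1, 0, 0]
print(generate_proposals(scores, left, right))  # prints [(2, 4)]
```

In this toy case three foreground frames yield a single proposal, whereas a pre-defined top-down scheme would enumerate many candidate segments regardless of content. How the actual model scores frames, regresses boundaries, and consolidates proposals is not specified in the abstract.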

updated: Tue Sep 14 2021 02:02:36 GMT+0000 (UTC)

published: Tue Sep 14 2021 02:02:36 GMT+0000 (UTC)

arXiv
