Global2Local: Efficient Structure Search for Video Action Segmentation

Shang-Hua Gao; Qi Han; Zhong-Yu Li; Pai Peng; Liang Wang; Ming-Ming Cheng

The temporal receptive fields of a model play an important role in action segmentation. Large receptive fields help capture long-term relations among video clips, while small receptive fields help capture local details. Existing methods construct models with hand-designed receptive fields in their layers. Can we effectively search for receptive field combinations to replace hand-designed patterns? To answer this question, we propose to find better receptive field combinations through a global-to-local search scheme. The scheme uses a global search to find coarse combinations and a local search to refine them into the final receptive field combination patterns. The global search finds possible coarse combinations other than human-designed patterns. On top of the global search, we propose an expectation-guided iterative local search scheme to refine combinations effectively. Our global-to-local search can be plugged into existing action segmentation methods to achieve state-of-the-art performance.
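The coarse-then-fine idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the search operators, so everything here is an assumption made for illustration. Receptive fields are represented as one integer dilation rate per layer, the global stage is plain random sampling over a sparse grid, the local stage moves each layer's value to a score-weighted expectation over nearby values, and `toy_score` is a hypothetical stand-in for validation accuracy.

```python
import math
import random


def toy_score(dilations, target=(1, 3, 6, 12)):
    """Hypothetical stand-in for validation accuracy (higher is better)."""
    return -sum((d - t) ** 2 for d, t in zip(dilations, target))


def global_search(score, num_layers, coarse_choices, samples, rng):
    """Coarse stage (assumed): sample combinations from a sparse grid, keep the best."""
    candidates = [
        [rng.choice(coarse_choices) for _ in range(num_layers)]
        for _ in range(samples)
    ]
    return max(candidates, key=score)


def local_search(score, start, radius=2, iterations=5, temperature=1.0):
    """Fine stage (assumed): per layer, move to the score-weighted expectation
    of nearby dilation values, and remember the best combination seen."""
    best = list(start)
    current = list(start)
    for _ in range(iterations):
        for layer in range(len(current)):
            low = max(1, current[layer] - radius)
            nearby = list(range(low, current[layer] + radius + 1))
            scores = [
                score(current[:layer] + [d] + current[layer + 1:])
                for d in nearby
            ]
            top = max(scores)
            # Softmax-style weights; subtracting the max keeps exp() stable.
            weights = [math.exp((s - top) / temperature) for s in scores]
            expected = sum(w * d for w, d in zip(weights, nearby)) / sum(weights)
            current[layer] = max(1, round(expected))
        if score(current) > score(best):
            best = list(current)
    return best


def run_demo(seed=0):
    """Run the coarse stage, then refine its result with the fine stage."""
    rng = random.Random(seed)
    coarse = global_search(toy_score, 4, [1, 2, 4, 8, 16], 20, rng)
    refined = local_search(toy_score, coarse)
    return coarse, refined


if __name__ == "__main__":
    coarse, refined = run_demo()
    print("coarse:", coarse, "score:", toy_score(coarse))
    print("refined:", refined, "score:", toy_score(refined))
```

Because `local_search` only replaces its best candidate on a strict improvement, the refined combination never scores worse than the coarse one it starts from. In a real setting, the score function would train or evaluate a segmentation model, which is far more expensive than this toy objective.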

updated: Mon Jan 04 2021 12:06:03 GMT+0000 (UTC)

published: Mon Jan 04 2021 12:06:03 GMT+0000 (UTC)

arXiv
