Disentangled Action Recognition with Knowledge Bases

Zhekun Luo; Shalini Ghosh; Devin Guillory; Keizo Kato; Trevor Darrell; Huijuan Xu

Actions in video usually involve interactions between humans and objects. Action labels are typically composed of various combinations of verbs and nouns, but training data may not be available for all possible combinations. In this paper, we aim to improve the generalization ability of compositional action recognition models to novel verbs or novel nouns that are unseen during training, by leveraging the power of knowledge graphs. Previous work utilizes verb-noun compositional action nodes in the knowledge graph, which scales poorly because the number of compositional action nodes grows quadratically with the number of verbs and nouns. To address this issue, we propose Disentangled Action Recognition with Knowledge-bases (DARK), which leverages the inherent compositionality of actions. DARK trains a factorized model by first extracting disentangled feature representations for verbs and nouns, and then predicting classification weights using relations in external knowledge graphs. The type constraints between verbs and nouns are extracted from external knowledge bases and applied as a final step when composing actions. DARK scales better with the number of objects and verbs, and achieves state-of-the-art performance on the Charades dataset. We further propose a new benchmark split based on the Epic-kitchen dataset, which is an order of magnitude larger in the number of classes and samples, and evaluate various models on it.
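
The factorized idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy vocabularies, the `ALLOWED` constraint set, the function names, and the choice to score a pair as the product of its verb and noun probabilities are all assumptions made for illustration. The sketch shows two things: how separate verb and noun predictions might be composed into actions under a type constraint, and why per-factor nodes grow linearly while compositional action nodes grow with the product of the two vocabulary sizes.

```python
# Illustrative sketch only. All names and numbers below are hypothetical
# and are not taken from the paper.

# Hypothetical type constraints: the verb-noun pairs considered plausible.
# In the abstract, such constraints come from an external knowledge base.
ALLOWED = {
    ("open", "door"),
    ("drink", "water"),
    ("wash", "cup"),
}


def compose_actions(verb_probs, noun_probs, allowed):
    """Score each allowed verb-noun pair.

    Assumption: a pair's score is the product of its verb probability and
    its noun probability. Pairs outside `allowed` are dropped.
    """
    scores = {}
    for verb, p_verb in verb_probs.items():
        for noun, p_noun in noun_probs.items():
            if (verb, noun) in allowed:
                scores[(verb, noun)] = p_verb * p_noun
    return scores


def best_action(scores):
    """Return the highest-scoring (verb, noun) pair."""
    return max(scores, key=scores.get)


def node_counts(num_verbs, num_nouns):
    """Return (factorized, compositional) graph node counts.

    Factorized: one node per verb plus one per noun.
    Compositional: one node per verb-noun pair.
    """
    return num_verbs + num_nouns, num_verbs * num_nouns


# Hypothetical classifier outputs for a single video clip.
verb_probs = {"open": 0.6, "drink": 0.3, "wash": 0.1}
noun_probs = {"door": 0.2, "water": 0.5, "cup": 0.3}

scores = compose_actions(verb_probs, noun_probs, ALLOWED)
print(best_action(scores))
print(node_counts(100, 300))
```

In this toy input, the unconstrained product would favor the implausible pair ("open", "water"); filtering by the constraint set removes it before the best action is chosen. Under these assumptions, a vocabulary of 100 verbs and 300 nouns needs 400 factor nodes, against 30,000 compositional action nodes.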

updated: Mon Jul 04 2022 20:19:13 GMT+0000 (UTC)

published: Mon Jul 04 2022 20:19:13 GMT+0000 (UTC)

arXiv
