Interaction Relational Network for Mutual Action Recognition

Mauricio Perez; Jun Liu; Alex C. Kot

Person-person mutual action recognition (also referred to as interaction recognition) is an important research branch of human activity analysis. Current solutions in the field, mainly dominated by CNNs, GCNs and LSTMs, often rely on complicated architectures and on mechanisms that embed the relationships between the two persons in the architecture itself, to ensure the interaction patterns can be properly learned. Our main contribution is a simpler yet very powerful architecture, named Interaction Relational Network (IRN), which uses minimal prior knowledge about the structure of the human body. We drive the network to identify by itself how to relate the body parts of the interacting individuals. To better represent the interaction, we define two different relationships, leading to a specialized architecture and model for each. These relationship models are then fused into a single architecture, in order to leverage both streams of information and further enhance the relational reasoning capability. Furthermore, we define structured pair-wise operations to extract meaningful extra information, namely distance and motion, from each pair of joints. Finally, coupled with an LSTM, our IRN is capable of sequential relational reasoning. These extensions can also be valuable for other problems that require sophisticated relational reasoning. Our solution achieves state-of-the-art performance on the traditional interaction recognition datasets SBU and UT, and also on the mutual actions from the large-scale dataset NTU RGB+D. It also obtains competitive performance on the interactions subset of the NTU RGB+D 120 dataset.
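
To make the pair-wise operations in the abstract more concrete, here is a minimal sketch of extracting distance and motion from every pair of joints across two persons. It is only an illustration under stated assumptions, not the authors' implementation: the function name `pairwise_features`, the 2D coordinates, and in particular the definition of motion as the frame-to-frame change in pair distance are all hypothetical, since the abstract does not specify them.

```python
import math
from itertools import product


def pairwise_features(person_a, person_b, prev_a, prev_b):
    """Build one feature record per inter-person joint pair.

    Each argument is a list of (x, y) joint coordinates for one person
    in one frame; prev_a and prev_b hold the same joints one frame earlier.
    """
    features = []
    for i, j in product(range(len(person_a)), range(len(person_b))):
        # Distance: Euclidean distance between the two joints in the current frame.
        dist = math.dist(person_a[i], person_b[j])
        # Motion (assumed definition): change in that distance since the previous frame.
        prev_dist = math.dist(prev_a[i], prev_b[j])
        features.append({"pair": (i, j), "distance": dist, "motion": dist - prev_dist})
    return features


# Toy example: two joints per person, person B stepping toward person A.
prev_a = [(0.0, 0.0), (0.0, 1.0)]
prev_b = [(4.0, 0.0), (4.0, 1.0)]
cur_a = [(0.0, 0.0), (1.0, 1.0)]
cur_b = [(3.0, 0.0), (3.0, 1.0)]
feats = pairwise_features(cur_a, cur_b, prev_a, prev_b)
```

Every joint of one person is paired with every joint of the other, so no prior knowledge about which body parts matter is built in. In a relational model, each record would be one input among many from which the network learns which pairs are informative.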

updated: Thu Jan 07 2021 08:04:54 GMT+0000 (UTC)

published: Fri Oct 11 2019 04:00:40 GMT+0000 (UTC)

arXiv
