High-order Tensor Pooling with Attention for Action Recognition

Piotr Koniusz; Lei Wang; Ke Sun

We aim to capture high-order statistics of feature vectors formed by a neural network, and propose end-to-end second- and higher-order pooling to form a tensor descriptor. Tensor descriptors require a robust similarity measure because of the low number of aggregated vectors and the burstiness phenomenon, in which a given feature appears more or less frequently than statistically expected. The Heat Diffusion Process (HDP) on a graph Laplacian is closely related to the Eigenvalue Power Normalization (EPN) of the covariance/auto-correlation matrix, whose inverse forms a loopy graph Laplacian. We show that the HDP and the EPN play the same role, i.e., they boost or dampen the magnitude of the eigenspectrum, thereby preventing burstiness. We equip higher-order tensors with EPN, which acts as a spectral detector of higher-order occurrences to prevent burstiness. We also prove that for a tensor of order r built from d-dimensional feature descriptors, such a detector gives the likelihood that at least one higher-order occurrence is 'projected' into one of the binom(d,r) subspaces represented by the tensor, thus forming a tensor power normalization metric endowed with binom(d,r) such 'detectors'. As experimental contributions, we apply several second- and higher-order pooling variants to action recognition, provide comparisons of such pooling variants that have not been presented previously, and show state-of-the-art results on HMDB-51, YUP++ and MPII Cooking Activities.
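
To make the idea of second-order pooling followed by EPN concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `second_order_pool` and `eigenvalue_power_normalize`, the plain power function with exponent `gamma = 0.5`, and the synthetic features are all hypothetical choices, and the paper may use other normalization functions and higher tensor orders.

```python
import numpy as np


def second_order_pool(features):
    """Auto-correlation matrix of N feature vectors (rows); result is (d, d)."""
    n = features.shape[0]
    return features.T @ features / n


def eigenvalue_power_normalize(matrix, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power gamma.

    Assumption: a plain power function stands in for the normalization;
    0 < gamma < 1 dampens large eigenvalues relative to small ones.
    """
    eigvals, eigvecs = np.linalg.eigh(matrix)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against negative round-off
    # Scale each eigenvector column by its normalized eigenvalue, then recompose.
    return (eigvecs * eigvals**gamma) @ eigvecs.T


# Synthetic stand-in for network features: 20 vectors of dimension 8.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 8))
features[:, 0] *= 10.0  # one artificially "bursty" feature dimension

M = second_order_pool(features)
M_epn = eigenvalue_power_normalize(M, gamma=0.5)

# Ratio of largest to smallest eigenvalue before and after normalization.
spread_before = np.linalg.eigvalsh(M)[-1] / np.linalg.eigvalsh(M)[0]
spread_after = np.linalg.eigvalsh(M_epn)[-1] / np.linalg.eigvalsh(M_epn)[0]
```

With `gamma = 0.5` the normalized matrix is the matrix square root of the pooled descriptor, so the dominant eigenvalue caused by the bursty dimension is compressed and `spread_after` is smaller than `spread_before`.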

updated: Thu Jul 20 2023 14:29:07 GMT+0000 (UTC)

published: Mon Oct 11 2021 12:32:56 GMT+0000 (UTC)

arXiv
