Implicit Equivariance in Convolutional Networks

Naman Khetan; Tushar Arora; Samee Ur Rehman; Deepak K. Gupta

Convolutional Neural Networks (CNNs) are inherently equivariant under translations; however, they have no equivalent built-in mechanism for handling other transformations such as rotations and changes in scale. Several approaches exist that make CNNs equivariant under other transformation groups by design. Among these, steerable CNNs have been especially effective. However, these approaches require redesigning standard networks with filters mapped from combinations of predefined bases involving complex analytical functions. We experimentally demonstrate that these restrictions on the choice of basis can lead to model weights that are sub-optimal for the primary deep learning task (e.g., classification). Moreover, such hard-baked explicit formulations make it difficult to design composite networks comprising heterogeneous feature groups. To circumvent these issues, we propose Implicitly Equivariant Networks (IEN), which induce equivariance in the different layers of a standard CNN model by optimizing a multi-objective loss function that combines the primary loss with an equivariance loss term. Through experiments with VGG and ResNet models on the Rot-MNIST, Rot-TinyImageNet, Scale-MNIST and STL-10 datasets, we show that IEN, even with its simple formulation, performs better than steerable networks. IEN also facilitates the construction of heterogeneous filter groups, allowing the number of channels in CNNs to be reduced by over 30% while maintaining performance on par with baselines. The efficacy of IEN is further validated on the hard problem of visual object tracking. We show that IEN outperforms the state-of-the-art rotation-equivariant tracking method while providing faster inference.
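The abstract does not spell out the exact form of the equivariance loss term, so the following is only a minimal sketch of the general idea under stated assumptions: a single zero-padded filter stands in for a CNN layer, the transformation is a 90-degree rotation, and the equivariance term is the mean squared gap between "rotate then filter" and "filter then rotate". All function names and the `weight` parameter are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: a toy equivariance penalty for one filter.
# Assumptions (not from the paper): square inputs, odd-sized kernel,
# rotation by 90 degrees, and a weighted sum of the two loss terms.

def rot90(grid):
    """Rotate a square 2D list by 90 degrees counterclockwise."""
    n = len(grid)
    return [[grid[j][n - 1 - i] for j in range(n)] for i in range(n)]

def correlate_same(image, kernel):
    """Zero-padded cross-correlation; output has the same size as the input."""
    n, k = len(image), len(kernel)
    half = k // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    r, c = i + a - half, j + b - half
                    if 0 <= r < n and 0 <= c < n:
                        total += image[r][c] * kernel[a][b]
            out[i][j] = total
    return out

def equivariance_loss(image, kernel):
    """Mean squared difference between f(rot(x)) and rot(f(x))."""
    rotate_then_filter = correlate_same(rot90(image), kernel)
    filter_then_rotate = rot90(correlate_same(image, kernel))
    n = len(image)
    squared_gap = sum(
        (rotate_then_filter[i][j] - filter_then_rotate[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return squared_gap / (n * n)

def combined_loss(primary_loss, image, kernel, weight=0.1):
    """Primary task loss plus a weighted equivariance penalty (hypothetical weighting)."""
    return primary_loss + weight * equivariance_loss(image, kernel)

# Toy inputs: a 5x5 ramp image, a rotation-symmetric kernel, and a horizontal edge kernel.
image = [[float(5 * i + j) for j in range(5)] for i in range(5)]
symmetric_kernel = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
edge_kernel = [[0.0, 0.0, 0.0], [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]

print(equivariance_loss(image, symmetric_kernel))
print(equivariance_loss(image, edge_kernel))
```

In this toy setting a kernel that is itself unchanged by a 90-degree rotation incurs no penalty, whereas the horizontal edge kernel does, so minimizing the combined objective would push an unconstrained filter toward equivariant behavior without fixing a filter basis in advance.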

updated: Sun Nov 28 2021 14:44:17 GMT+0000 (UTC)

published: Sun Nov 28 2021 14:44:17 GMT+0000 (UTC)

arXiv
