Warped Convolutions: Efficient Invariance to Spatial Transformations

João F. Henriques; Andrea Vedaldi

Convolutional Neural Networks (CNNs) are extremely efficient, since they exploit the inherent translation-invariance of natural images. However, translation is just one of a myriad of useful spatial transformations. Can the same efficiency be attained when considering other spatial invariances? Such generalized convolutions have been considered in the past, but at a high computational cost. We present a construction that is simple and exact, yet has the same computational complexity that standard convolutions enjoy. It consists of a constant image warp followed by a simple convolution, which are standard blocks in deep learning toolboxes. With a carefully crafted warp, the resulting architecture can be made equivariant to a wide range of two-parameter spatial transformations. We show encouraging results in realistic scenarios, including the estimation of vehicle poses in the Google Earth dataset (rotation and scale), and face poses in Annotated Facial Landmarks in the Wild (3D rotations under perspective).
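The "constant image warp followed by a simple convolution" can be illustrated with a minimal NumPy sketch. It assumes a log-polar warp, a common choice when the two transformation parameters are rotation and scale; the abstract does not specify the warp, so this is an illustration and not the authors' implementation. All function names (`log_polar_grid`, `bilinear_sample`, `conv_on_warped`, `warped_convolution`) and all sizes are hypothetical. The sampling grid is computed once and reused for every image, and the convolution wraps around along the angle axis so that a rotation of the input becomes a circular shift of the output.

```python
import numpy as np

def log_polar_grid(size, n_angles=64, n_radii=32, r_min=1.0):
    """Fixed sampling grid: rows index angle, columns index log-radius."""
    c = (size - 1) / 2.0                     # image center
    r_max = 0.9 * c                          # stay inside the image
    theta = 2.0 * np.pi * np.arange(n_angles) / n_angles
    radius = r_min * (r_max / r_min) ** (np.arange(n_radii) / (n_radii - 1))
    rows = c + radius[None, :] * np.sin(theta[:, None])
    cols = c + radius[None, :] * np.cos(theta[:, None])
    return rows, cols

def bilinear_sample(image, rows, cols):
    """Sample a 2D image at fractional (row, col) positions."""
    h, w = image.shape
    rows = np.clip(rows, 0.0, h - 1.0)
    cols = np.clip(cols, 0.0, w - 1.0)
    r0 = np.clip(np.floor(rows).astype(int), 0, h - 2)
    c0 = np.clip(np.floor(cols).astype(int), 0, w - 2)
    fr, fc = rows - r0, cols - c0
    top = (1 - fc) * image[r0, c0] + fc * image[r0, c0 + 1]
    bottom = (1 - fc) * image[r0 + 1, c0] + fc * image[r0 + 1, c0 + 1]
    return (1 - fr) * top + fr * bottom

def conv_on_warped(warped, kernel):
    """Plain cross-correlation with an odd-sized kernel.

    The angle axis (axis 0) is padded circularly, the log-radius
    axis (axis 1) is zero-padded.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(warped, ((ph, ph), (0, 0)), mode="wrap")
    padded = np.pad(padded, ((0, 0), (pw, pw)), mode="constant")
    out = np.zeros_like(warped, dtype=float)
    n_rows, n_cols = warped.shape
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + n_rows, j:j + n_cols]
    return out

def warped_convolution(image, kernel, rows, cols):
    """Constant warp followed by a standard convolution."""
    return conv_on_warped(bilinear_sample(image, rows, cols), kernel)

# Illustrative check: rotating the input by 90 degrees should shift
# the output by a quarter turn along the angle axis.
rng = np.random.default_rng(0)
image = rng.random((33, 33))
kernel = rng.standard_normal((3, 3))
rows, cols = log_polar_grid(33)

out = warped_convolution(image, kernel, rows, cols)
out_rot = warped_convolution(np.rot90(image), kernel, rows, cols)
quarter_turn = rows.shape[0] // 4
is_equivariant = np.allclose(out_rot, np.roll(out, -quarter_turn, axis=0))
```

Because the grid depends only on the image size, the warp adds a single resampling step per image and the rest of the cost is that of an ordinary convolution. A change of scale would show up in the same way as a shift along the log-radius axis, although there the zero padding makes the correspondence hold only away from the borders.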

updated: Tue Nov 30 2021 21:14:07 GMT+0000 (UTC)

published: Wed Sep 14 2016 19:10:14 GMT+0000 (UTC)

arXiv
