Position, Padding and Predictions: A Deeper Look at Position Information in CNNs

Md Amirul Islam; Matthew Kowal; Sen Jia; Konstantinos G. Derpanis; Neil D. B. Bruce

In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent. An implication of this is that a filter may know what it is looking at, but not where it is positioned in the image. In this paper, we first test this hypothesis and reveal that a surprising degree of absolute position information is encoded in commonly used CNNs. We show that zero padding drives CNNs to encode position information in their internal representations, while a lack of padding precludes position encoding. This gives rise to deeper questions about the role of position information in CNNs: (i) What boundary heuristics enable optimal position encoding for downstream tasks? (ii) Does position encoding affect the learning of semantic representations? (iii) Does position encoding always improve performance? To provide answers, we perform the largest case study to date on the role that padding and border heuristics play in CNNs. We design novel tasks which allow us to quantify boundary effects as a function of the distance to the border. Numerous semantic objectives reveal the effect of the border on semantic representations. Finally, we demonstrate the implications of these findings on multiple real-world tasks to show that position information can either help or hurt performance.
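The intuition behind the zero-padding claim can be illustrated with a toy example. The sketch below is not the authors' experimental setup; it is a minimal pure-Python illustration, with hypothetical helper names (`conv2d_zero_pad`, `conv2d_no_pad`), that applies a single 3x3 all-ones filter to a constant image. With zero padding, the response depends on how close a pixel is to the border, so the output carries a position cue even though the input has none. Without padding, every output value is identical.

```python
def conv2d_zero_pad(image, kernel):
    """Cross-correlate image with kernel, treating out-of-bounds pixels as zero.

    The output has the same height and width as the input ("same" padding).
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    # Pixels outside the image contribute zero (zero padding).
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def conv2d_no_pad(image, kernel):
    """Cross-correlate image with kernel using only fully overlapping windows.

    The output shrinks by (kernel size - 1) along each dimension.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# A constant 5x5 image contains no positional signal of its own.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1.0] * 3 for _ in range(3)]

padded = conv2d_zero_pad(image, kernel)
unpadded = conv2d_no_pad(image, kernel)

print(padded[0][0], padded[0][2], padded[2][2])  # corner, edge, interior: 4.0 6.0 9.0
print({v for row in unpadded for v in row})      # {9.0}
```

In this toy case the padded output takes three distinct values (corner, edge, interior), which later layers could in principle use to infer distance to the border, while the unpadded output is uniform.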

updated: Thu Jan 28 2021 23:40:32 GMT+0000 (UTC)

published: Thu Jan 28 2021 23:40:32 GMT+0000 (UTC)

arXiv
