Generalization to translation shifts: a study in architectures and augmentations

Suriya Gunasekar

We study how effectively data augmentation captures the inductive bias toward spatial translation invariance that carefully designed network architectures provide. We evaluate various image classification architectures (antialiased, convolutional, vision transformer, and fully connected MLP networks) and data augmentation techniques on generalization to large translation shifts. We observe that: (a) Without data augmentation, all architectures, including convolutional networks with antialiasing modifications, suffer some degradation in performance when evaluated on translated test distributions. Understandably, both the in-distribution accuracy and the degradation under shifts are significantly worse for non-convolutional models. (b) Robustness improves across all architectures with even a minimal augmentation of a 4-pixel random crop. In some instances, even a 1-2 pixel random crop is sufficient. This suggests that there is a form of meta-generalization from augmentation. For non-convolutional architectures, the absolute accuracy is still low with this basic augmentation, but robustness to translation shifts improves substantially. (c) With a sufficiently advanced augmentation pipeline (4-pixel crop + RandAugmentation + Erasing + MixUp), all architectures can be trained to have competitive performance in terms of both in-distribution accuracy and generalization to large translation shifts.
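
To make the two operations in the abstract concrete, here is a minimal pure-Python sketch of a translation shift and a pad-and-crop random crop. It is an illustration under stated assumptions, not the paper's implementation: the zero fill, the helper names (`translate`, `random_crop`, `accuracy_under_shift`), and the `model` callable are all hypothetical, and the paper's actual padding mode and shift protocol are not specified in this abstract. The sketch relies on the fact that padding by `pad` pixels and cropping a window of the original size is the same as translating by an offset in `[-pad, pad]` along each axis.

```python
import random


def translate(image, dx, dy, fill=0):
    """Shift a 2D image (list of rows) right by dx and down by dy pixels.

    Pixels shifted outside the frame are dropped; vacated pixels get `fill`
    (zero fill is an assumption, not taken from the paper).
    """
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out


def random_crop(image, pad=4, rng=random):
    """Random-crop augmentation: pad by `pad` pixels, crop the original size.

    Implemented as a random translation by up to `pad` pixels per axis,
    which is equivalent when the padding value equals `fill`.
    """
    dx = rng.randint(-pad, pad)
    dy = rng.randint(-pad, pad)
    return translate(image, dx, dy)


def accuracy_under_shift(model, images, labels, dx, dy):
    """Accuracy of a hypothetical `model` (image -> label) on shifted inputs."""
    correct = sum(
        model(translate(img, dx, dy)) == label
        for img, label in zip(images, labels)
    )
    return correct / len(images)
```

Sweeping `dx` and `dy` in `accuracy_under_shift` well beyond the `pad` used in training is one way to probe generalization to shifts larger than those seen during augmentation, which is the kind of comparison the abstract describes.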

updated: Sat Nov 12 2022 23:00:07 GMT+0000 (UTC)

published: Tue Jul 05 2022 22:52:20 GMT+0000 (UTC)

arXiv
