A Too-Good-to-be-True Prior to Reduce Shortcut Reliance

Nikolay Dagaev; Brett D. Roads; Xiaoliang Luo; Daniel N. Barry; Kaustubh R. Patil; Bradley C. Love

Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause of this shortcoming is that modern architectures tend to rely on "shortcuts": superficial features that correlate with categories without capturing deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, so the most intuitive and promising solution in one context may not generalize to others. One potential way to improve o.o.d. generalization is to assume that simple solutions are unlikely to be valid across contexts and to avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships, including shortcuts. We find that LCNs can serve as shortcut detectors. Furthermore, an LCN's predictions can be used in a two-stage approach to encourage a high-capacity network (HCN) to rely on deeper invariant features that should generalize broadly. In particular, items that the LCN can master are downweighted when training the HCN. Using a modified version of the CIFAR-10 dataset in which we introduced shortcuts, we find that the two-stage LCN-HCN approach reduces reliance on shortcuts and facilitates o.o.d. generalization.
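The downweighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting function, so the rule used here (weight = 1 minus the LCN's predicted probability for the true class) is an assumption chosen only for illustration, and the function names `lcn_downweights` and `weighted_cross_entropy` are hypothetical.

```python
import math
from typing import List, Sequence


def lcn_downweights(
    lcn_probs: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[float]:
    """Per-item weights derived from a trained low-capacity network (LCN).

    ASSUMPTION: weight = 1 - p_LCN(true class). Items the LCN predicts
    confidently and correctly (likely shortcut items) get weights near 0;
    items the LCN cannot master keep weights near 1.
    """
    return [1.0 - probs[y] for probs, y in zip(lcn_probs, labels)]


def weighted_cross_entropy(
    hcn_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Mean over items of weight * cross-entropy for the high-capacity network (HCN)."""
    total = sum(
        w * -math.log(max(probs[y], eps))
        for probs, y, w in zip(hcn_probs, labels, weights)
    )
    return total / len(labels)


# Stage 1: LCN class probabilities for two items (made-up numbers).
lcn_probs = [[0.95, 0.05], [0.40, 0.60]]
labels = [0, 0]
weights = lcn_downweights(lcn_probs, labels)

# Stage 2: the HCN loss counts the item the LCN mastered far less.
hcn_probs = [[0.5, 0.5], [0.5, 0.5]]
loss = weighted_cross_entropy(hcn_probs, labels, weights)
```

In this toy example the first item, which the LCN already classifies with high confidence, contributes much less to the HCN loss than the second item, which the LCN gets wrong.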

updated: Tue Jun 08 2021 10:49:25 GMT+0000 (UTC)

published: Fri Feb 12 2021 09:17:24 GMT+0000 (UTC)

arXiv
