Exploring Representational Alignment with Human Perception Using Identically Represented Inputs

Vedant Nanda; Ayan Majumdar; Camila Kolling; John P. Dickerson; Krishna P. Gummadi; Bradley C. Love; Adrian Weller

We contribute to the study of the quality of learned representations. In many domains, an important evaluation criterion for safe and trustworthy deep learning is how well the invariances captured by the representations of deep neural networks (DNNs) are shared with humans. We identify challenges in measuring these invariances. Prior works used gradient-based methods to generate identically represented inputs (IRIs), i.e., inputs that have similar representations (at a given layer) of a neural network. If these IRIs look "similar" to humans, then the neural network's learned invariances are said to align with human perception. However, we show that prior studies on the alignment of invariances between DNNs and humans are "biased" by the specific loss function used to generate IRIs. We show how different loss functions can lead to different takeaways about a model's shared invariances with humans. We show that under an adversarial IRI generation process, all models appear to have very little shared invariance with humans. We conduct an in-depth investigation of how different components of the deep learning pipeline contribute to learning models that align well with human invariances. We find that architectures with residual connections, trained using a self-supervised contrastive loss with ℓ_p-ball adversarial data augmentation, tend to learn the most human-like invariances.
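The gradient-based IRI generation mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the two-layer random network `represent`, the squared ℓ_2 representation-matching loss, the step size, and the names `generate_iri`, `W1`, and `W2`. Starting from a random seed input, gradient descent on the input pulls its representation toward that of a target input while the two inputs remain different.

```python
import numpy as np

def represent(x, W1, W2):
    """Toy stand-in for a DNN layer: ReLU hidden layer, then a linear map."""
    return W2 @ np.maximum(W1 @ x, 0.0)

def generate_iri(target, seed, W1, W2, steps=2000, lr=0.02):
    """Gradient descent on the input to minimise 0.5 * ||g(x) - g(target)||^2.

    The squared L2 loss is an illustrative choice; the abstract notes that
    conclusions about alignment depend on which loss is used here.
    """
    target_rep = represent(target, W1, W2)
    x = seed.copy()
    for _ in range(steps):
        pre = W1 @ x                       # hidden pre-activations
        diff = W2 @ np.maximum(pre, 0.0) - target_rep
        grad_hidden = W2.T @ diff          # back through the linear map
        grad_pre = grad_hidden * (pre > 0) # back through the ReLU
        x -= lr * (W1.T @ grad_pre)        # back to the input, then step
    return x

# Hypothetical toy setup: 16-d inputs, 32 hidden units, 8-d representation.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(32, 16)) / np.sqrt(16)
W2 = rng.normal(size=(8, 32)) / np.sqrt(32)
target = rng.normal(size=16)
seed = rng.normal(size=16)

iri = generate_iri(target, seed, W1, W2)

target_rep = represent(target, W1, W2)
dist_before = np.linalg.norm(represent(seed, W1, W2) - target_rep)
dist_after = np.linalg.norm(represent(iri, W1, W2) - target_rep)
print("representation distance before:", dist_before)
print("representation distance after: ", dist_after)
print("input distance to target:      ", np.linalg.norm(iri - target))
```

In a study like the one described, the resulting input and the target would then be shown to humans to judge whether they look similar; swapping the loss inside `generate_iri` is the point at which the loss-dependent "bias" described in the abstract would enter.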

updated: Mon May 30 2022 21:17:15 GMT+0000 (UTC)

published: Mon Nov 29 2021 17:26:50 GMT+0000 (UTC)

arXiv
