CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning

Aidan Boyd; Patrick Tinsley; Kevin Bowyer; Adam Czajka

Can deep learning models achieve greater generalization if their training is guided by reference to human perceptual abilities? And how can we implement this in a practical manner? This paper proposes a training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG). This new approach incorporates human-annotated saliency maps into a CYBORG loss function that guides the model's learning towards features from image regions that humans find salient for the task. The Class Activation Mapping (CAM) mechanism is used to probe the model's current saliency in each training batch, juxtapose this model saliency with human saliency, and penalize large differences. Results on the task of synthetic face detection, selected to illustrate the effectiveness of the approach, show that CYBORG leads to a significant improvement in accuracy on unseen samples consisting of face images generated by six Generative Adversarial Networks, across multiple classification network architectures. We also show that even scaling the training data sevenfold with the standard loss does not beat CYBORG's accuracy. As a side effect, we observe that adding explicit region annotations to the task of synthetic face detection increased human classification performance. This work opens a new area of research on how to incorporate human visual saliency into loss functions in practice. All data, code and pre-trained models used in this work are offered with this paper.
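The idea of penalizing differences between CAM-based model saliency and human saliency can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' released implementation: it assumes a network that ends in global average pooling followed by a single linear layer (the setting in which CAM is defined), computes the CAM for the ground-truth class, min-max normalizes it, and blends a mean-squared saliency difference with cross-entropy through a weight `alpha`. The function names, the choice of mean-squared error, and the default `alpha=0.5` are hypothetical; the human map is assumed to have already been resized to the CAM's spatial resolution and scaled to [0, 1].

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """CAM for one image: class-weighted sum of the last conv feature maps.

    features:   (K, H, W) activations of the final convolutional layer.
    fc_weights: (C, K) weights of the linear layer that follows global average pooling.
    """
    # Weighted sum over the K channels -> (H, W) saliency map for class_idx.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam_min, cam_max = cam.min(), cam.max()
    # Min-max normalize to [0, 1] so the map is comparable to a human saliency map.
    if cam_max > cam_min:
        return (cam - cam_min) / (cam_max - cam_min)
    return np.zeros_like(cam)


def cross_entropy(logits, label):
    # Numerically stable log-softmax, then negative log-likelihood of the label.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def cyborg_style_loss(features, fc_weights, label, human_map, alpha=0.5):
    """Hypothetical blend: alpha * cross-entropy + (1 - alpha) * saliency MSE."""
    pooled = features.mean(axis=(1, 2))   # global average pooling -> (K,)
    logits = fc_weights @ pooled          # class scores -> (C,)
    ce = cross_entropy(logits, label)
    cam = class_activation_map(features, fc_weights, label)
    saliency_term = np.mean((cam - human_map) ** 2)
    total = alpha * ce + (1 - alpha) * saliency_term
    return total, ce, saliency_term


if __name__ == "__main__":
    human = np.array([[0.0, 0.5], [1.0, 0.25]])   # toy 2x2 human saliency map
    features = human[None, :, :] * 3.0            # one channel that mirrors it
    weights = np.array([[2.0], [-1.0]])           # two classes, one channel
    total, ce, sal = cyborg_style_loss(features, weights, 0, human)
    print(total, ce, sal)
```

When the model's CAM already matches the human map, the saliency term vanishes and only the classification loss remains; a CAM that highlights other regions adds a penalty. In actual training these quantities would be computed per sample in each batch inside an autodiff framework, so that the saliency term produces gradients with respect to the network weights; NumPy is used here only to show the arithmetic.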

updated: Fri May 06 2022 17:24:57 GMT+0000 (UTC)

published: Wed Dec 01 2021 18:04:15 GMT+0000 (UTC)

arXiv
