CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning

Aidan Boyd; Patrick Tinsley; Kevin Bowyer; Adam Czajka

Can deep learning models generalize better if their training is guided by reference to human perceptual abilities? And how can this be implemented in a practical manner? This paper proposes the first training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG). This new training approach incorporates human-annotated saliency maps into a CYBORG loss function that guides the model toward learning features from the image regions that humans find salient when solving a given visual task. In each training batch, the Class Activation Mapping (CAM) mechanism is used to probe the model's current saliency, compare it with human saliency, and penalize the model for large differences. On the task of synthetic face detection, the CYBORG loss significantly improves performance, across multiple classification network architectures, on unseen samples consisting of face images generated by six Generative Adversarial Networks (GANs). We also show that even with seven times as much training data, a model trained with a standard loss does not match the accuracy obtained with the CYBORG loss. As a side effect, we observed that adding explicit region annotation to the task of synthetic face detection improved human classification performance. This work opens a new area of research on how to incorporate human visual saliency into loss functions. All data, code, and pre-trained models used in this work are released with this paper.
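
The loss described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' released implementation: the helper names (`class_activation_map`, `cross_entropy`, `cyborg_style_loss`), the weighting parameter `alpha`, mean squared error as the saliency distance, computing the CAM for the ground-truth class, and human maps already normalized to [0, 1] at the CAM's spatial resolution are all hypothetical choices, since the abstract does not specify them. The network is reduced to its last convolutional feature maps followed by global average pooling and a linear classifier, which is the architecture for which CAM is defined.

```python
import numpy as np

# Hypothetical sketch of a CYBORG-style loss (not the authors' code).
# Assumptions: CAM of the ground-truth class, MSE as the saliency distance,
# human maps in [0, 1] at the CAM resolution, and a convex weight `alpha`.


def class_activation_map(features, fc_weights, class_idx):
    """CAM: class-weighted sum of the final conv feature maps.

    features:   (K, H, W) activations of the last convolutional layer
    fc_weights: (C, K) weights of the linear layer after global average pooling
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=(0, 0))  # (H, W)
    cam = cam - cam.min()  # shift so the minimum is 0
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1]


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def cyborg_style_loss(batch_features, fc_weights, labels, human_maps, alpha=0.5):
    """Blend classification loss with a penalty on model/human saliency mismatch."""
    ce_terms, saliency_terms = [], []
    for feats, label, human in zip(batch_features, labels, human_maps):
        logits = fc_weights @ feats.mean(axis=(1, 2))  # global average pooling + linear head
        ce_terms.append(cross_entropy(logits, label))
        cam = class_activation_map(feats, fc_weights, label)
        saliency_terms.append(np.mean((cam - human) ** 2))  # large differences cost more
    return (1 - alpha) * np.mean(ce_terms) + alpha * np.mean(saliency_terms)


# Usage with random stand-in data: a batch of 4 images, 8 channels, 7x7 maps,
# and 2 classes (real vs. synthetic face).
rng = np.random.default_rng(0)
features = rng.random((4, 8, 7, 7))
weights = rng.normal(size=(2, 8))
labels = np.array([0, 1, 1, 0])
human_maps = rng.random((4, 7, 7))
print(cyborg_style_loss(features, weights, labels, human_maps, alpha=0.5))
```

Setting `alpha` to 0 recovers plain cross-entropy training, and setting it to 1 trains on saliency agreement alone. In practice the same computation would be written in an autodiff framework such as PyTorch so that gradients of the saliency term reach the convolutional layers; this NumPy version only evaluates the loss value.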

updated: Wed Dec 01 2021 18:04:15 GMT+0000 (UTC)

published: Wed Dec 01 2021 18:04:15 GMT+0000 (UTC)

arXiv