CYBORG: Blending Human Saliency Into the Loss Improves Deep Learning

Aidan Boyd; Patrick Tinsley; Kevin Bowyer; Adam Czajka

Can deep learning models achieve greater generalization if their training is guided by reference to human perceptual abilities? And how can we implement this in a practical manner? This paper proposes a training strategy to ConveY Brain Oversight to Raise Generalization (CYBORG). This new approach incorporates human-annotated saliency maps into a loss function that guides the model's learning to focus on image regions that humans deem salient for the task. The Class Activation Mapping (CAM) mechanism is used to probe the model's current saliency in each training batch, juxtapose this model saliency with human saliency, and penalize large differences. Results on the task of synthetic face detection, selected to illustrate the effectiveness of the approach, show that CYBORG leads to significant improvement in accuracy on unseen samples consisting of face images generated from six Generative Adversarial Networks across multiple classification network architectures. We also show that scaling to even seven times the training data, or using non-human-saliency auxiliary information, such as segmentation masks, and standard loss cannot beat the performance of CYBORG-trained models. As a side effect of this work, we observe that the addition of explicit region annotation to the task of synthetic face detection increased human classification accuracy. This work opens a new area of research on how to incorporate human visual saliency into loss functions in practice. All data, code and pre-trained models used in this work are offered with this paper.
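The loss described above, which probes model saliency with CAM, compares it with human saliency, and penalizes the difference, can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the mean-squared-error distance, the min-max normalization, the weight `alpha`, and every function name are assumptions not stated in the abstract. The human map is also assumed to be already resized to the CAM resolution.

```python
import math

def class_activation_map(feature_maps, class_weights):
    """CAM: channel-wise weighted sum of the last conv feature maps.
    feature_maps: C maps of size H x W (nested lists);
    class_weights: C classifier weights for the target class."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam

def min_max_normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]

def saliency_mse(model_map, human_map):
    """Mean squared difference between two maps of equal size (assumed distance)."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(model_map, human_map)
             for a, b in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def cross_entropy(probs, label):
    """Standard classification loss for a single sample."""
    return -math.log(max(probs[label], 1e-12))

def blended_loss(probs, label, feature_maps, class_weights, human_map, alpha=0.5):
    """Hypothetical blend: alpha * classification + (1 - alpha) * saliency penalty."""
    cam = min_max_normalize(class_activation_map(feature_maps, class_weights))
    human = min_max_normalize(human_map)
    return (alpha * cross_entropy(probs, label)
            + (1 - alpha) * saliency_mse(cam, human))

# Toy 2x2 example: the model attends to the top-left cell.
feature_maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
class_weights = [1.0, 0.0]
probs, label = [0.5, 0.5], 0
aligned = blended_loss(probs, label, feature_maps, class_weights,
                       [[1.0, 0.0], [0.0, 0.0]])
misaligned = blended_loss(probs, label, feature_maps, class_weights,
                          [[0.0, 0.0], [0.0, 1.0]])
```

In this toy case the classification term is identical for both calls, so `misaligned` exceeds `aligned` only because the model's CAM disagrees with the human map. In an actual training loop the same quantity would be computed per batch with differentiable tensor operations, so that the saliency penalty contributes to the gradients.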

updated: Wed Aug 17 2022 20:54:54 GMT+0000 (UTC)

published: Wed Dec 01 2021 18:04:15 GMT+0000 (UTC)

arXiv
