Constraining Representations Yields Models That Know What They Don't Know

Joao Monteiro; Pau Rodriguez; Pierre-Andre Noel; Issam Laradji; David Vazquez

A well-known failure mode of neural networks is high-confidence erroneous predictions, especially on data that differs from the training distribution. Such unsafe behaviour limits their applicability. To counter it, we show that models offering accurate confidence levels can be defined by adding constraints to their internal representations. Specifically, we encode class labels as fixed, unique binary vectors, or class codes, and use them to enforce class-dependent activation patterns throughout the model. The resulting predictors are dubbed Total Activation Classifiers (TAC); TAC is used as an additional component alongside a base classifier to indicate how reliable a prediction is. Given a data instance, TAC slices intermediate representations into disjoint sets and reduces each slice to a scalar, yielding an activation profile. During training, activation profiles are pushed towards the code assigned to each training instance. At test time, one can predict the class whose code best matches the activation profile of an example. Empirically, we observe that the resemblance between activation patterns and their corresponding codes yields an inexpensive unsupervised approach for inducing discriminative confidence scores. Namely, we show that TAC is at least as good as state-of-the-art confidence scores extracted from existing models, while strictly improving the model's value in the rejection setting. TAC was also observed to work well across multiple types of architectures and data modalities.
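The test-time procedure described in the abstract (slice, reduce, match against class codes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made for illustration only: the function names (make_class_codes, activation_profile, predict_with_confidence, accept) are hypothetical, codes are drawn at random, a single layer's features are used, each slice is reduced by its mean, features are assumed to lie in [0, 1], and the confidence score is the negative Euclidean distance to the nearest code. The abstract does not specify any of these choices.

```python
import numpy as np


def make_class_codes(num_classes, code_len, seed=0):
    """Draw fixed, unique binary codes, one row per class.

    Random sampling is an assumption; the abstract only requires the codes
    to be fixed, unique, and binary.
    """
    if 2 ** code_len < num_classes:
        raise ValueError("code_len is too small for num_classes unique codes")
    rng = np.random.default_rng(seed)
    codes = set()
    while len(codes) < num_classes:
        codes.add(tuple(int(b) for b in rng.integers(0, 2, size=code_len)))
    return np.array(sorted(codes), dtype=float)


def activation_profile(features, code_len):
    """Slice a 1-D feature vector into code_len disjoint chunks and reduce
    each chunk to a scalar (mean reduction is an assumption)."""
    slices = np.array_split(features, code_len)
    return np.array([chunk.mean() for chunk in slices])


def predict_with_confidence(features, codes):
    """Return the class whose code is nearest to the activation profile,
    plus a confidence score (negative distance, so higher is more confident)."""
    profile = activation_profile(features, codes.shape[1])
    dists = np.linalg.norm(codes - profile, axis=1)
    pred = int(np.argmin(dists))
    return pred, float(-dists[pred])


def accept(confidence, threshold):
    """Rejection setting: keep the prediction only if confidence is high enough.
    The threshold is a hypothetical tuning parameter."""
    return confidence >= threshold
```

For example, with two hand-written codes [1, 0, 1, 0] and [0, 1, 0, 1] and an 8-dimensional feature vector whose first and third pairs of entries are close to 1, predict_with_confidence returns class 0 with a confidence near zero, while a feature vector of all 0.5 sits equally far from both codes and receives a lower confidence, so a threshold between the two scores would reject it.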

updated: Tue Aug 30 2022 18:28:00 GMT+0000 (UTC)

published: Tue Aug 30 2022 18:28:00 GMT+0000 (UTC)

arXiv
