Language in a Bottle: Language Model Guided Concept Bottlenecks for Interpretable Image Classification

Yue Yang; Artemis Panagopoulou; Shenghao Zhou; Daniel Jin; Chris Callison-Burch; Mark Yatskar

Concept Bottleneck Models (CBMs) are inherently interpretable models that factor model decisions into human-readable concepts. They allow people to easily understand why a model is failing, a critical feature for high-stakes applications. However, CBMs require manually specified concepts and often underperform their black box counterparts, which has prevented their broad adoption. We address these shortcomings and are the first to show how to construct high-performance CBMs, without manual concept specification, that reach accuracy similar to black box models. Our approach, Language Guided Bottlenecks (LaBo), leverages a language model, GPT-3, to define a large space of possible bottlenecks. Given a problem domain, LaBo uses GPT-3 to produce factual sentences about categories to form candidate concepts. LaBo efficiently searches possible bottlenecks through a novel submodular utility that promotes the selection of discriminative and diverse information. Finally, GPT-3's sentential concepts are aligned to images using CLIP to form a bottleneck layer. Experiments demonstrate that LaBo is a highly effective prior for concepts important to visual recognition. In an evaluation on 11 diverse datasets, LaBo bottlenecks excel at few-shot classification: they are 11.7% more accurate than black box linear probes at 1 shot and comparable with more data. Overall, LaBo demonstrates that inherently interpretable models can be widely applied at performance similar to, or better than, black box approaches.
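
The abstract names two mechanisms: a submodular utility that selects discriminative and diverse concepts, and a bottleneck layer built from concept-to-image alignment scores. The sketch below is only an illustration of that general idea, not the paper's actual formulation. The utility used here (a per-concept score plus a facility-location coverage term), the function names `greedy_select` and `bottleneck_logits`, the weight `alpha`, and the toy vectors standing in for CLIP embeddings are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def greedy_select(embeddings, scores, k, alpha=1.0):
    """Greedily pick up to k concept indices.

    Assumed utility (hypothetical, for illustration only):
        F(S) = sum_{j in S} scores[j]
               + alpha * sum_i max_{j in S} sim(i, j)
    With non-negative scores and similarities this is monotone
    submodular, so greedy selection is a standard heuristic for it.
    """
    n = len(embeddings)
    # Clip negative similarities to zero so the coverage term stays monotone.
    sim = [[max(0.0, cosine(embeddings[i], embeddings[j])) for j in range(n)]
           for i in range(n)]
    selected = []
    best_cover = [0.0] * n  # best similarity of each candidate to the chosen set
    for _ in range(min(k, n)):
        best_gain, best_j = None, None
        for j in range(n):
            if j in selected:
                continue
            cover_gain = sum(max(sim[i][j] - best_cover[i], 0.0) for i in range(n))
            gain = scores[j] + alpha * cover_gain
            if best_gain is None or gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best_cover = [max(best_cover[i], sim[i][best_j]) for i in range(n)]
    return selected


def bottleneck_logits(image_emb, concept_embs, class_weights):
    """Score an image against each concept, then map concept scores to classes.

    class_weights[c][k] is the (assumed) weight of concept k for class c.
    """
    concept_scores = [dot(image_emb, c) for c in concept_embs]
    return [dot(row, concept_scores) for row in class_weights]


# Toy stand-ins for text embeddings: concepts 0 and 1 are near-duplicates.
concepts = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
discriminability = [0.9, 0.85, 0.5]

no_diversity = greedy_select(concepts, discriminability, k=2, alpha=0.0)
with_diversity = greedy_select(concepts, discriminability, k=2, alpha=1.0)
logits = bottleneck_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                           [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup, setting `alpha` to zero selects the two highest-scoring but nearly identical concepts, while a positive `alpha` swaps one of them for the dissimilar third concept. Because the class logits are a linear function of the concept scores, each prediction can be read back as a weighted sum over named concepts.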

updated: Mon Nov 21 2022 03:05:02 GMT+0000 (UTC)

published: Mon Nov 21 2022 03:05:02 GMT+0000 (UTC)

arXiv
