Automatic Concept Extraction for Concept Bottleneck-based Video Classification

Jeya Vikranth Jeyakumar; Luke Dickens; Luis Garcia; Yu-Hsi Cheng; Diego Ramirez Echavarria; Joseph Noor; Alessandra Russo; Lance Kaplan; Erik Blasch; Mani Srivastava

コンセプトボトルネックベースのビデオ分類のための自動コンセプト抽出

解釈可能な深層学習モデルにおける最近の取り組みは、概念ベースの説明方法が標準のエンドツーエンドモデルとの競争力のある精度を達成し、画像から抽出された高レベルの視覚的概念についての推論と介入を可能にすることを示しています。鳥種分類用。ただし、これらの概念のボトルネックモデルは、事前定義された概念の必要十分なセットに依存しています。これは、ビデオ分類などの複雑なタスクでは扱いにくいものです。複雑なタスクの場合、ラベルと視覚要素間の関係は多くのフレームにまたがります。たとえば、飛んでいる鳥を識別したり、獲物を捕まえたりする必要があり、さまざまなレベルの抽象化を備えた概念が必要です。この目的のために、概念ベースのビデオ分類に必要かつ十分な概念抽象化のセットを厳密に構成する自動概念発見および抽出モジュールであるCoDExを紹介します。 CoDExは、ビデオの自然言語の説明から複雑な概念の抽象化の豊富なセットを識別します。これにより、無定形の概念のセットを事前に定義する必要がなくなります。メソッドの実行可能性を示すために、既存の複雑なビデオ分類データセットと、クラウドソーシングによる短いラベルの自然言語の説明を組み合わせた2つの新しい公開データセットを構築します。私たちの方法は、自然言語で固有の複雑な概念の抽象化を引き出し、概念のボトルネックの方法を複雑なタスクに一般化します。

Recent efforts in interpretable deep learning models have shown that concept-based explanation methods achieve competitive accuracy with standard end-to-end models and enable reasoning and intervention about extracted high-level visual concepts from images, e.g., identifying the wing color and beak length for bird-species classification. However, these concept bottleneck models rely on a necessary and sufficient set of predefined concepts-which is intractable for complex tasks such as video classification. For complex tasks, the labels and the relationship between visual elements span many frames, e.g., identifying a bird flying or catching prey-necessitating concepts with various levels of abstraction. To this end, we present CoDEx, an automatic Concept Discovery and Extraction module that rigorously composes a necessary and sufficient set of concept abstractions for concept-based video classification. CoDEx identifies a rich set of complex concept abstractions from natural language explanations of videos-obviating the need to predefine the amorphous set of concepts. To demonstrate our method's viability, we construct two new public datasets that combine existing complex video classification datasets with short, crowd-sourced natural language explanations for their labels. Our method elicits inherent complex concept abstractions in natural language to generalize concept-bottleneck methods to complex tasks.

updated: Tue Jun 21 2022 06:22:35 GMT+0000 (UTC)

published: Tue Jun 21 2022 06:22:35 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト