SemSup: Semantic Supervision for Simple and Scalable Zero-shot Generalization

Austin W. Hanjie; Ameet Deshpande; Karthik Narasimhan

Zero-shot learning is the problem of predicting instances over classes not seen during training. One approach to zero-shot learning is to provide auxiliary class information to the model. Prior work in this vein has largely used expensive per-instance annotation or single class-level descriptions, but per-instance descriptions are hard to scale and single class descriptions may not be rich enough. Furthermore, these works have relied exclusively on natural-language descriptions, simple biencoder models, and modality- or task-specific methods. These approaches have several limitations: text supervision may not always be available or optimal, and biencoders may learn only coarse relations between inputs and class descriptions. In this work, we present SemSup, a novel approach that uses (1) a scalable multiple-description sampling method that improves performance over single descriptions, (2) alternative description formats such as JSON that are easy to generate and outperform text in certain settings, and (3) hybrid lexical-semantic similarity to leverage fine-grained information in class descriptions. We demonstrate the effectiveness of SemSup across four datasets, two modalities, and three generalization settings. For example, across text and image datasets, SemSup increases unseen-class generalization accuracy by 15 points on average compared to the closest baseline.
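
The idea of sampling among multiple class descriptions, some in text and some in JSON, can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the descriptions, and the helpers `sample_descriptions`, `embed`, `score`, and `predict` are all hypothetical, and the hashed bag-of-words `embed` is only a toy stand-in for a learned encoder. It captures lexical overlap alone, not the hybrid lexical-semantic similarity the abstract describes.

```python
import random
import zlib

# Hypothetical class descriptions: each class has several, in text or JSON form.
DESCRIPTIONS = {
    "cat": [
        "a small domesticated feline",
        '{"type": "animal", "legs": 4, "sound": "meow"}',
    ],
    "car": [
        "a road vehicle with four wheels",
        '{"type": "vehicle", "wheels": 4}',
    ],
}


def sample_descriptions(descriptions, rng):
    """Pick one description per class (for example, once per training batch)."""
    return {label: rng.choice(options) for label, options in descriptions.items()}


def embed(text, dim=64):
    """Toy encoder (assumption): hashed bag-of-words counts, not a learned model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def score(input_vec, desc_vec):
    """Dot-product similarity between an input and a class description."""
    return sum(a * b for a, b in zip(input_vec, desc_vec))


def predict(input_text, descriptions, rng):
    """Score the input against one sampled description per class; return the best label."""
    sampled = sample_descriptions(descriptions, rng)
    x = embed(input_text)
    scores = {label: score(x, embed(desc)) for label, desc in sampled.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(0)
    print(predict("a small feline", DESCRIPTIONS, rng))
```

Because classes are represented only through their descriptions, an unseen class can be added at inference time by adding an entry to the dictionary, with no new output head. Resampling descriptions on each call exposes the scorer to varied phrasings of the same class.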

updated: Wed Jan 11 2023 14:48:12 GMT+0000 (UTC)

published: Sat Feb 26 2022 09:55:54 GMT+0000 (UTC)

arXiv
