AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels

Nicholas Roberts; Xintong Li; Tzu-Heng Huang; Dyah Adila; Spencer Schoenberg; Cheng-Yu Liu; Lauren Pick; Haotian Ma; Aws Albarghouthi; Frederic Sala

Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground-truth labels. In this work, we introduce AutoWS-Bench-101, a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings: a diverse set of application domains to which it has previously been difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, should a practitioner use an AutoWS method to generate additional labels, or use a simpler baseline such as zero-shot predictions from a foundation model or supervised learning? We observe that in many settings, AutoWS methods must incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.
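The abstract describes WS as aggregating noisy votes from labeling functions, and AutoWS as generating those LFs automatically from a small labeled set. The following is a minimal Python sketch of that pipeline under stated assumptions, not the AutoWS-Bench-101 code or any specific method evaluated in the paper. The LF family (single-feature threshold stumps), the selection rule (accuracy on the 100 labeled points with a coverage floor), the use of plain majority vote instead of a learned label model, and all function names (`make_threshold_lfs`, `apply_lfs`, `majority_vote`) are hypothetical illustrations.

```python
import numpy as np

ABSTAIN = -1  # convention assumed here: an LF outputs -1 when it does not vote


def make_threshold_lfs(X_lab, y_lab, max_lfs=10, min_coverage=0.1):
    """Hypothetical AutoWS-style LF generator.

    Each candidate LF votes class c when feature j exceeds threshold t and
    abstains otherwise. Candidates are ranked by their accuracy on the small
    labeled set, restricted to those that fire on at least min_coverage of it.
    """
    candidates = []
    classes = np.unique(y_lab)
    for j in range(X_lab.shape[1]):
        for t in np.quantile(X_lab[:, j], [0.25, 0.5, 0.75]):
            fires = X_lab[:, j] > t
            if not fires.any() or fires.mean() < min_coverage:
                continue  # skip LFs that rarely fire on the labeled data
            for c in classes:
                acc = (y_lab[fires] == c).mean()
                candidates.append((acc, j, t, int(c)))
    # Keep the most accurate candidates; with only ~100 labels this can overfit.
    candidates.sort(key=lambda r: r[0], reverse=True)
    return [(j, t, c) for _, j, t, c in candidates[:max_lfs]]


def apply_lfs(X, lfs):
    """Build the label matrix L: one row per example, one column per LF."""
    L = np.full((X.shape[0], len(lfs)), ABSTAIN, dtype=int)
    for k, (j, t, c) in enumerate(lfs):
        L[X[:, j] > t, k] = c
    return L


def majority_vote(L, n_classes):
    """Simplest aggregator: most common non-abstain vote per row (ties go to
    the smallest class index); rows where every LF abstains stay ABSTAIN."""
    labels = np.full(L.shape[0], ABSTAIN, dtype=int)
    for i, row in enumerate(L):
        votes = row[row != ABSTAIN]
        if votes.size:
            labels[i] = np.bincount(votes, minlength=n_classes).argmax()
    return labels


# Usage on synthetic data: 100 labeled points, 1000 unlabeled points.
rng = np.random.default_rng(0)
n_lab, n_unlab, d = 100, 1000, 5
w = rng.normal(size=d)
X_all = rng.normal(size=(n_lab + n_unlab, d))
y_all = (X_all @ w > 0).astype(int)
X_lab, y_lab, X_unlab = X_all[:n_lab], y_all[:n_lab], X_all[n_lab:]

lfs = make_threshold_lfs(X_lab, y_lab)
L = apply_lfs(X_unlab, lfs)
weak = majority_vote(L, n_classes=2)
print(f"{len(lfs)} LFs, coverage on unlabeled data: {(weak != ABSTAIN).mean():.2f}")
```

The weak labels produced this way would then train a downstream model; the benchmark's central question is whether that beats simply training on the 100 labels directly or using zero-shot foundation-model predictions.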

updated: Tue Aug 30 2022 16:09:42 GMT+0000 (UTC)

published: Tue Aug 30 2022 16:09:42 GMT+0000 (UTC)

arXiv