Cross-Task Generalization via Natural Language Crowdsourcing Instructions

Swaroop Mishra; Daniel Khashabi; Chitta Baral; Hannaneh Hajishirzi

Humans (e.g., crowdworkers) have a remarkable ability to solve different tasks by simply reading textual instructions that define them and looking at a few examples. Despite the success of conventional supervised learning on individual datasets, such models often struggle with generalization across tasks (e.g., a question-answering system cannot solve classification tasks). A long-standing challenge in AI is to build a model that learns a new task by understanding the human-readable instructions that define it. To study this, we introduce NATURAL INSTRUCTIONS, a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances (input-output pairs). The instructions are obtained from the crowdsourcing instructions used to create existing NLP datasets and are mapped to a unified schema. Using this meta-dataset, we measure cross-task generalization by training models on seen tasks and measuring generalization to the remaining unseen ones. We adopt generative pre-trained language models to encode task-specific instructions along with the input and generate the task output. Our results indicate that models benefit from instructions when evaluated in terms of generalization to unseen tasks (19% better for models utilizing instructions). These models, however, are far behind an estimated performance upper bound, indicating significant room for further progress in this direction.
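As a rough illustration of the evaluation setup described in the abstract (training on seen tasks, evaluating on held-out unseen tasks, and encoding a task's instruction together with the input for a generative model), here is a minimal Python sketch. It is not the authors' code or the dataset's actual schema: the `Task` fields, the prompt template in `encode`, the helper names, and the choice of 12 held-out tasks are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical, simplified stand-in for one task in a unified instruction schema."""
    name: str
    definition: str  # human-authored instruction text (assumed field name)
    examples: list[tuple[str, str]] = field(default_factory=list)   # demonstration input-output pairs
    instances: list[tuple[str, str]] = field(default_factory=list)  # task instances to predict


def split_by_task(tasks: list[Task], num_unseen: int, seed: int = 0) -> tuple[list[Task], list[Task]]:
    """Split at the task level, so unseen tasks contribute no training instances."""
    rng = random.Random(seed)
    shuffled = tasks[:]
    rng.shuffle(shuffled)
    return shuffled[num_unseen:], shuffled[:num_unseen]


def encode(task: Task, input_text: str, use_instructions: bool = True) -> str:
    """Build the text fed to a generative LM; this template is an assumption."""
    parts = []
    if use_instructions:
        parts.append(f"Definition: {task.definition}")
        for x, y in task.examples:
            parts.append(f"Example input: {x}\nExample output: {y}")
    parts.append(f"Input: {input_text}\nOutput:")
    return "\n\n".join(parts)


# Illustrative usage with 61 placeholder tasks (contents are dummy strings).
tasks = [
    Task(f"task_{i}", f"Instruction for task {i}.", [("a", "b")], [(f"x{i}", f"y{i}")])
    for i in range(61)
]
seen, unseen = split_by_task(tasks, num_unseen=12)
prompt = encode(unseen[0], unseen[0].instances[0][0])
```

Splitting by task rather than by instance is what makes the held-out tasks genuinely unseen; the `use_instructions` flag mirrors the with/without-instructions comparison the abstract reports, though the real input encoding may differ.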

updated: Mon Mar 14 2022 09:15:08 GMT+0000 (UTC)

published: Sun Apr 18 2021 08:44:56 GMT+0000 (UTC)

arXiv