Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jiasen Lu; Christopher Clark; Rowan Zellers; Roozbeh Mottaghi; Aniruddha Kembhavi

We propose Unified-IO, a model that performs a large variety of AI tasks, ranging from classical computer vision tasks (including pose estimation, object detection, depth estimation, and image generation) through vision-and-language tasks (such as region captioning and referring expression comprehension) to natural language processing tasks (such as question answering and paraphrasing). Developing a single unified model for such a variety of tasks poses unique challenges because the inputs and outputs of each task are heterogeneous: RGB images, per-pixel maps, binary masks, bounding boxes, and language. We achieve this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. This common representation across all tasks allows us to train a single transformer-based architecture jointly on over 80 diverse datasets in the vision and language fields. Unified-IO is the first model capable of performing all 7 tasks on the GRIT benchmark, and it produces strong results across 16 diverse benchmarks, including NYUv2-Depth, ImageNet, VQA2.0, OK-VQA, Swig, VizWizGround, BoolQ, and SciTail, with no task- or benchmark-specific fine-tuning. Demos for Unified-IO are available at https://unified-io.allenai.org.
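
To make the idea of homogenizing heterogeneous outputs into discrete vocabulary tokens more concrete, the following is a minimal sketch for one modality, bounding boxes. It is not the authors' implementation: the bin count `NUM_BINS`, the `<loc_i>` token naming, the coordinate order, and the toy text vocabulary are all assumptions made for illustration. The sketch quantizes each box coordinate into one of a fixed number of bins and appends the resulting location tokens to a shared vocabulary, so that a box becomes a short token sequence like any piece of text.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical settings, chosen only for this illustration.
NUM_BINS = 1000
TEXT_VOCAB = ["<pad>", "<eos>", "what", "is", "in", "this", "region", "?", "a", "dog"]


def build_vocab(text_vocab: Sequence[str], num_bins: int) -> Dict[str, int]:
    """Build one shared vocabulary: text tokens followed by location tokens."""
    tokens = list(text_vocab) + [f"<loc_{i}>" for i in range(num_bins)]
    return {tok: idx for idx, tok in enumerate(tokens)}


def _quantize(value: float, size: float, num_bins: int) -> int:
    """Map a pixel coordinate in [0, size] to a bin index in [0, num_bins - 1]."""
    clamped = min(max(value, 0), size)
    return min(int(clamped * num_bins / size), num_bins - 1)


def box_to_tokens(box: Tuple[float, float, float, float],
                  width: float, height: float,
                  num_bins: int = NUM_BINS) -> List[str]:
    """Encode an (x0, y0, x1, y1) pixel box as four location tokens."""
    x0, y0, x1, y1 = box
    bins = [
        _quantize(x0, width, num_bins),
        _quantize(y0, height, num_bins),
        _quantize(x1, width, num_bins),
        _quantize(y1, height, num_bins),
    ]
    return [f"<loc_{b}>" for b in bins]


def tokens_to_box(tokens: Sequence[str],
                  width: float, height: float,
                  num_bins: int = NUM_BINS) -> Tuple[float, ...]:
    """Decode four location tokens back to pixel coordinates (bin centers)."""
    bins = [int(tok[len("<loc_"):-1]) for tok in tokens]
    sizes = [width, height, width, height]
    return tuple((b + 0.5) / num_bins * s for b, s in zip(bins, sizes))


if __name__ == "__main__":
    vocab = build_vocab(TEXT_VOCAB, NUM_BINS)
    loc_tokens = box_to_tokens((32, 48, 160, 240), width=320, height=480)
    # Text and box tokens share one id space, so they can sit in one sequence.
    sequence = ["a", "dog"] + loc_tokens + ["<eos>"]
    print(sequence)
    print([vocab[tok] for tok in sequence])
    print(tokens_to_box(loc_tokens, width=320, height=480))
```

Decoding returns bin centers, so a round trip recovers each coordinate only up to one bin width. Dense outputs such as per-pixel maps and binary masks would need a different discretization, which this sketch does not attempt.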

updated: Fri Jun 17 2022 17:53:47 GMT+0000 (UTC)

published: Fri Jun 17 2022 17:53:47 GMT+0000 (UTC)

arXiv
