Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

Jiasen Lu; Christopher Clark; Rowan Zellers; Roozbeh Mottaghi; Aniruddha Kembhavi

We propose Unified-IO, a model that performs a large variety of AI tasks, ranging from classical computer vision tasks (pose estimation, object detection, depth estimation, and image generation), through vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing. Developing a single unified model for such a large variety of tasks poses unique challenges because the inputs and outputs of each task are heterogeneous: RGB images, per-pixel maps, binary masks, bounding boxes, and language. We achieve this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. This common representation across all tasks allows us to train a single transformer-based architecture jointly on over 90 diverse datasets in the vision and language fields. Unified-IO is the first model capable of performing all 7 tasks on the GRIT benchmark, and it produces strong results across 16 diverse benchmarks such as NYUv2-Depth, ImageNet, VQA2.0, OK-VQA, Swig, VizWizGround, BoolQ, and SciTail, with no task-specific fine-tuning. Code and demos for Unified-IO are available at: https://unified-io.allenai.org.
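
To make the idea of homogenizing outputs into discrete vocabulary tokens concrete, the following is a minimal sketch of one way a bounding box could be turned into a short token sequence and back. It is an illustration only, not the authors' implementation: the bin count `NUM_BINS`, the `<loc_i>` token naming, the coordinate order, and both function names are assumptions introduced here.

```python
from typing import List, Tuple

# Assumed number of quantization bins per axis (hypothetical value for illustration).
NUM_BINS = 1000

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def box_to_tokens(box: Box, image_size: Tuple[int, int], num_bins: int = NUM_BINS) -> List[str]:
    """Quantize a pixel-space box into four discrete location tokens."""
    x0, y0, x1, y1 = box
    width, height = image_size
    tokens = []
    for value, extent in ((x0, width), (y0, height), (x1, width), (y1, height)):
        # Normalize to [0, 1], then map to an integer bin index.
        frac = min(max(value / extent, 0.0), 1.0)
        idx = min(int(frac * num_bins), num_bins - 1)
        tokens.append(f"<loc_{idx}>")  # hypothetical token naming
    return tokens


def tokens_to_box(tokens: List[str], image_size: Tuple[int, int], num_bins: int = NUM_BINS) -> Box:
    """Map four location tokens back to approximate pixel coordinates (bin centers)."""
    width, height = image_size
    ix0, iy0, ix1, iy1 = (int(t[len("<loc_"):-1]) for t in tokens)

    def to_px(idx: int, extent: int) -> float:
        return (idx + 0.5) / num_bins * extent

    return (to_px(ix0, width), to_px(iy0, height), to_px(ix1, width), to_px(iy1, height))


if __name__ == "__main__":
    toks = box_to_tokens((100, 50, 300, 200), (640, 480))
    print(toks)
    print(tokens_to_box(toks, (640, 480)))
```

Because such tokens can live in the same vocabulary as text tokens, a sequence-to-sequence transformer could emit a box exactly as it emits words; the round trip is lossy only up to the bin width.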

updated: Tue Oct 04 2022 22:37:32 GMT+0000 (UTC)

published: Fri Jun 17 2022 17:53:47 GMT+0000 (UTC)

arXiv
