Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

Jinguo Zhu; Xizhou Zhu; Wenhai Wang; Xiaohua Wang; Hongsheng Li; Xiaogang Wang; Jifeng Dai

To build artificial neural networks that resemble biological intelligence systems, recent works have unified numerous tasks into a generalist model, which processes various tasks with shared parameters and does not have any task-specific modules. While generalist models achieve promising results on various benchmarks, they suffer performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main factor behind this phenomenon. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) to generalist models. Routing strategies under different levels of conditions are proposed to take both the training/inference cost and the generalization ability into account. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate the interference across tasks and modalities, and achieve state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, the introduction of Conditional MoEs preserves the ability of generalist models to conduct zero-shot inference on new tasks, e.g., video-text retrieval and video captioning. Code and pre-trained generalist models shall be released.
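
The idea of routing on a condition rather than on the token itself can be illustrated with a small sketch. This is not the authors' implementation: the gate is a plain linear layer followed by top-k selection and a softmax, the condition is a hypothetical two-dimensional modality indicator, and the names `route`, `conditional_moe`, `experts`, and `gate_weights` are assumptions made for this example only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(condition, gate_weights, top_k=2):
    """Pick the top_k experts from a condition vector.

    The gate sees only `condition` (assumed here to encode the modality),
    not the token, so every token sharing a condition shares its experts.
    Returns a dict {expert_index: mixing_weight}.
    """
    logits = [sum(w * c for w, c in zip(row, condition)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] for i in top])
    return dict(zip(top, probs))


def conditional_moe(token, condition, experts, gate_weights, top_k=2):
    """Weighted sum of the selected experts' outputs for one token."""
    gates = route(condition, gate_weights, top_k)
    out = [0.0] * len(token)
    for idx, gate in gates.items():
        expert_out = experts[idx](token)
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out


# Toy experts standing in for feed-forward sub-networks (hypothetical).
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]
# One gate row per expert; columns match the condition dimensions.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

image_condition = [1.0, 0.0]  # assumed encoding of "image" inputs
text_condition = [0.0, 1.0]   # assumed encoding of "text" inputs

image_out = conditional_moe([1.0, 2.0], image_condition, experts, gate_weights, top_k=1)
text_out = conditional_moe([1.0, 2.0], text_condition, experts, gate_weights, top_k=1)
```

With `top_k=1`, the same token is sent to a different expert depending on its condition, which is one way to keep modalities from competing for the same parameters. Because the gate does not depend on the token, the routing decision can be computed once per condition and reused, which relates to the training/inference cost trade-off mentioned in the abstract.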

updated: Tue Jul 05 2022 07:56:01 GMT+0000 (UTC)

published: Thu Jun 09 2022 17:59:59 GMT+0000 (UTC)

arXiv
