Multimodal Prompting with Missing Modalities for Visual Recognition

Yi-Lun Lee; Yi-Hsuan Tsai; Wei-Chen Chiu; Chen-Yu Lee

In this paper, we tackle two challenges in multimodal learning for visual recognition: 1) when modalities are missing during either training or testing in real-world situations; and 2) when the computational resources needed to fine-tune heavy transformer models are not available. To this end, we propose to use prompt learning to mitigate both challenges together. Specifically, our modality-missing-aware prompts can be plugged into multimodal transformers to handle general missing-modality cases, while requiring fewer than 1% of the learnable parameters needed to train the entire model. We further explore the effect of different prompt configurations and analyze robustness to missing modalities. Extensive experiments show that our prompt learning framework improves performance under various missing-modality cases while alleviating the need for heavy model re-training. Code is available.
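The idea of plugging modality-missing-aware prompts into a frozen multimodal transformer can be illustrated with a minimal sketch. It is not the authors' released code. It assumes two modalities (text and image), one learnable prompt per missing-modality case, and that a missing modality is represented by an empty token list. All names and sizes (`EMBED_DIM`, `PROMPT_LEN`, `init_prompts`, `attach_prompt`, `trainable_ratio`) are hypothetical and chosen only for illustration.

```python
import random

EMBED_DIM = 8    # hypothetical token embedding width
PROMPT_LEN = 2   # hypothetical number of prompt tokens per case
CASES = ("complete", "missing_text", "missing_image")


def init_prompts(seed=0):
    """Create one small learnable prompt (a list of vectors) per case."""
    rng = random.Random(seed)
    return {
        case: [[rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)]
               for _ in range(PROMPT_LEN)]
        for case in CASES
    }


def missing_case(has_text, has_image):
    """Map the modalities that are present to a prompt key."""
    if has_text and has_image:
        return "complete"
    if has_image:
        return "missing_text"
    if has_text:
        return "missing_image"
    raise ValueError("at least one modality must be present")


def attach_prompt(prompts, text_tokens, image_tokens):
    """Prepend the prompt that matches the missing-modality case.

    Assumption: an empty list stands for a missing modality. The
    returned sequence is what a frozen transformer would consume.
    """
    case = missing_case(bool(text_tokens), bool(image_tokens))
    return case, prompts[case] + text_tokens + image_tokens


def trainable_ratio(prompts, backbone_params):
    """Prompt parameter count relative to the frozen backbone size."""
    n_prompt = sum(len(vec) for p in prompts.values() for vec in p)
    return n_prompt / backbone_params


if __name__ == "__main__":
    prompts = init_prompts()
    text = [[0.0] * EMBED_DIM for _ in range(3)]  # three dummy text tokens
    case, seq = attach_prompt(prompts, text, [])  # image is missing
    print(case, len(seq))
    # 100_000_000 is a made-up backbone size, not a figure from the paper.
    print(trainable_ratio(prompts, 100_000_000))
```

In this sketch only the entries of `prompts` would be updated during training, and the transformer weights stay fixed. That is the sense in which prompt learning keeps the number of learnable parameters small.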

updated: Mon Mar 06 2023 18:54:46 GMT+0000 (UTC)

published: Mon Mar 06 2023 18:54:46 GMT+0000 (UTC)

arXiv
