VLN-PETL: Parameter-Efficient Transfer Learning for Vision-and-Language Navigation

Yanyuan Qiao; Zheng Yu; Qi Wu

The performance of Vision-and-Language Navigation (VLN) tasks has progressed rapidly thanks to the use of large pre-trained vision-and-language models. However, fully fine-tuning the pre-trained model for every downstream VLN task is becoming costly due to the considerable model size. Parameter-Efficient Transfer Learning (PETL), a recent research hotspot, shows great potential for efficiently tuning large pre-trained models on common CV and NLP tasks: it exploits most of the representation knowledge in the pre-trained model while tuning only a minimal set of parameters. However, simply applying existing PETL methods to the more challenging VLN tasks may cause non-trivial performance degradation. Therefore, we present the first study to explore PETL methods for VLN tasks and propose a VLN-specific PETL method named VLN-PETL. Specifically, we design two PETL modules: Historical Interaction Booster (HIB) and Cross-modal Interaction Booster (CIB). We then combine these two modules with several existing PETL methods to form the integrated VLN-PETL. Extensive experimental results on four mainstream VLN tasks (R2R, REVERIE, NDH, RxR) demonstrate the effectiveness of the proposed VLN-PETL, which achieves performance comparable to or even better than full fine-tuning and outperforms other PETL methods by promising margins.
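
To make the "minimal set of parameters" idea concrete, the sketch below counts the trainable parameters when a generic bottleneck adapter is added to each layer of a frozen Transformer backbone. It is an illustration of PETL in general, not of the HIB or CIB modules, whose design the abstract does not specify. The layer sizes (hidden size 768, feed-forward size 3072, 12 layers) and the bottleneck width of 64 are assumptions chosen for illustration, and all function names are hypothetical.

```python
# Back-of-the-envelope parameter counting for a generic bottleneck adapter.
# All sizes and names are illustrative assumptions, not taken from VLN-PETL.

def linear_params(d_in, d_out, bias=True):
    """Parameters of a dense layer: weight matrix plus optional bias."""
    return d_in * d_out + (d_out if bias else 0)

def transformer_layer_params(d_model, d_ff):
    """Parameters of one standard Transformer encoder layer."""
    attention = 4 * linear_params(d_model, d_model)  # Q, K, V, output projections
    feed_forward = linear_params(d_model, d_ff) + linear_params(d_ff, d_model)
    layer_norms = 2 * 2 * d_model  # two LayerNorms, each with scale and shift
    return attention + feed_forward + layer_norms

def adapter_params(d_model, bottleneck):
    """Parameters of a down-projection followed by an up-projection."""
    return linear_params(d_model, bottleneck) + linear_params(bottleneck, d_model)

def trainable_fraction(num_layers, d_model, d_ff, bottleneck):
    """Share of all parameters that are trained when the backbone is frozen
    and only one adapter per layer is updated."""
    frozen = num_layers * transformer_layer_params(d_model, d_ff)
    trainable = num_layers * adapter_params(d_model, bottleneck)
    return trainable / (frozen + trainable)

if __name__ == "__main__":
    fraction = trainable_fraction(num_layers=12, d_model=768, d_ff=3072, bottleneck=64)
    print(f"trainable share of parameters: {fraction:.2%}")
```

Under these assumed sizes, the adapters account for roughly 1.4% of the layer parameters, which is the kind of budget that makes per-task tuning cheap compared with updating the whole backbone.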

updated: Sun Aug 20 2023 05:55:30 GMT+0000 (UTC)

published: Sun Aug 20 2023 05:55:30 GMT+0000 (UTC)

arXiv
