LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Renrui Zhang; Jiaming Han; Chris Liu; Peng Gao; Aojun Zhou; Xiangfei Hu; Shilin Yan; Pan Lu; Hongsheng Li; Yu Qiao

We present LLaMA-Adapter, a lightweight adaptation method that efficiently fine-tunes LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters on top of the frozen LLaMA 7B model and takes less than one hour to fine-tune on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts and prepend them to the word tokens at the higher transformer layers. We then propose a zero-initialized attention mechanism with zero gating, which adaptively injects the new instructional cues into LLaMA while effectively preserving its pre-trained knowledge. With this efficient training, LLaMA-Adapter generates high-quality responses comparable to those of Alpaca, which fully fine-tunes all 7B parameters. Beyond language commands, our approach extends readily to multi-modal instructions, yielding an image-conditioned LLaMA model that achieves superior reasoning performance on the ScienceQA and COCO Caption benchmarks. We also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
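The gating idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function name `gated_prompt_attention`, the single-head layout, the separate softmax over prompt scores, and the scalar `gate` are assumptions made for illustration. The prompt length, layer count, and hidden size in the parameter count are likewise assumed values, chosen only because they are consistent with the 1.2M figure.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_prompt_attention(q, k_tok, v_tok, k_prm, v_prm, gate):
    """Single-head causal attention plus a gated contribution from prompts.

    q, k_tok, v_tok: (T, d) projections of the word tokens.
    k_prm, v_prm:    (P, d) projections of the learnable adaption prompts.
    gate:            scalar gating factor, initialized to zero (assumption:
                     one scalar per layer, applied to the prompt attention).
    """
    T, d = q.shape
    # Causal attention among word tokens, as in the frozen model.
    s_tok = q @ k_tok.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    a_tok = softmax(np.where(future, -np.inf, s_tok))
    # Attention to the prompts, normalized separately and scaled by the gate.
    a_prm = gate * softmax(q @ k_prm.T / np.sqrt(d))
    return a_tok @ v_tok + a_prm @ v_prm

rng = np.random.default_rng(0)
T, P, d = 5, 3, 8  # toy sizes, hypothetical
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
k_prm, v_prm = rng.normal(size=(P, d)), rng.normal(size=(P, d))

# gate = 0 at initialization: the prompts contribute nothing to the output.
frozen = gated_prompt_attention(q, k, v, k_prm, v_prm, gate=0.0)
# A non-zero gate (learned during training) lets the prompts shift the output.
adapted = gated_prompt_attention(q, k, v, k_prm, v_prm, gate=0.5)

# Assumed configuration: 10 prompt tokens in 30 layers of width 4096.
prompt_params = 10 * 30 * 4096
```

With the gate at zero the layer reproduces the frozen model's attention output, so training starts from the pre-trained behavior and the prompts are blended in only as the gate moves away from zero.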

updated: Wed Jun 14 2023 17:31:32 GMT+0000 (UTC)

published: Tue Mar 28 2023 17:59:12 GMT+0000 (UTC)

arXiv
