GIFT: Generative Interpretable Fine-Tuning
We present Generative Interpretable Fine-Tuning (GIFT), a method for parameter-efficient fine-tuning of pretrained Transformer backbones. GIFT can be formulated as a simple factorized matrix multiplication in the parameter space or, equivalently, in the activation/representation space, and therefore has built-in interpretability. For a layer with pretrained weights ω∈R^{d_out×d_in}, GIFT learns the fine-tuned weights ω̂ directly from ω as ω̂=ω∙(I+ϕ_{d_in×r}∙ψ_{r×d_in}), where Θ=(ϕ, ψ) are the learnable parameters of two linear layers. Θ can be shared by all layers selected for fine-tuning (e.g., all the Query and Value layers), or can be layer-type specific (e.g., different Θ's for Query and Value), resulting in significantly fewer trainable parameters than layer-specific Low-Rank Adaptation (LoRA). We perform comprehensive evaluations on natural language tasks (commonsense and arithmetic reasoning, instruction tuning, and sequence classification) and on fine-grained visual classification tasks. GIFT obtains the best performance and parameter efficiency among the baselines on the commonsense reasoning, instruction tuning, and visual recognition benchmarks. Compared to LoRA, we obtain a 5.9% absolute increase in average accuracy with 53.8 times fewer parameters on Commonsense170k using Llama-3 (8B), and a 5.4% absolute increase in win rate with 4 times fewer parameters using Llama-2 (7B) for instruction tuning. GIFT also obtains a slightly higher win rate on instruction tuning than GPT 3.5 (Turbo 1106). We show that the output of the first linear layer (i.e., ω∙ϕ) is surprisingly interpretable: as a by-product, it can play the role of a token-clustering head that localizes meaningful objects/parts in images for computer vision tasks.
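The equivalence between the parameter-space and activation-space views can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names (gift_weights, gift_forward_activation, lora_params, gift_shared_params), the toy dimensions, the random initialization, and the equal-rank parameter comparison are all assumptions made for illustration and do not reproduce the configurations behind the reported numbers.

```python
import numpy as np

def gift_weights(w, phi, psi):
    """Parameter-space view: w_hat = w @ (I + phi @ psi)."""
    d_in = w.shape[1]
    return w @ (np.eye(d_in) + phi @ psi)

def gift_forward_activation(x, w, phi, psi):
    """Activation-space view: apply the frozen w to the adapted input x + phi @ (psi @ x)."""
    return w @ (x + phi @ (psi @ x))

def lora_params(d_in, d_out, r, n_layers):
    """Trainable parameters when every layer has its own rank-r pair (layer-specific LoRA)."""
    return n_layers * r * (d_in + d_out)

def gift_shared_params(d_in, r):
    """Trainable parameters when a single (phi, psi) pair is shared by all selected layers."""
    return 2 * d_in * r

# Toy sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, r = 6, 8, 2
w = rng.standard_normal((d_out, d_in))       # frozen pretrained weights
phi = 0.1 * rng.standard_normal((d_in, r))   # first learnable linear layer
psi = 0.1 * rng.standard_normal((r, d_in))   # second learnable linear layer
x = rng.standard_normal(d_in)                # one input token

w_hat = gift_weights(w, phi, psi)
y_param = w_hat @ x                                   # fine-tuned weights applied to x
y_act = gift_forward_activation(x, w, phi, psi)       # frozen weights applied to adapted x
print(np.allclose(y_param, y_act))

# w @ phi has shape (d_out, r): the output of the first linear layer applied to w.
print((w @ phi).shape)
```

Because Θ=(ϕ, ψ) depends only on d_in and r, sharing it across layers keeps the trainable parameter count independent of how many layers are fine-tuned, whereas the layer-specific count in lora_params grows linearly with n_layers.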
updated: Mon Jul 08 2024 01:59:10 GMT+0000 (UTC)
published: Fri Dec 01 2023 16:33:57 GMT+0000 (UTC)
