Δ-Networks for Efficient Model Patching

Chaitanya Devaguptapu; Samarth Sinha; K J Joseph; Vineeth N Balasubramanian; Animesh Garg

Models pre-trained on large-scale datasets are often finetuned to support newer tasks and datasets that arrive over time. This process requires storing a separate copy of the model for each task that the pre-trained model is finetuned on. Building on recent model patching work, we propose Δ-Patching for finetuning neural network models efficiently, without the need to store model copies. We propose a simple and lightweight method called Δ-Networks to achieve this objective. Our comprehensive experiments across settings and architecture variants show that Δ-Networks outperform earlier model patching work while requiring only a fraction of the parameters to be trained. We also show that this approach can be used for other problem settings such as transfer learning and zero-shot domain adaptation, as well as other tasks such as detection and segmentation.

updated: Sun Mar 26 2023 16:39:44 GMT+0000 (UTC)

published: Sun Mar 26 2023 16:39:44 GMT+0000 (UTC)

arXiv
