Joint inference and input optimization in equilibrium networks

Swaminathan Gurumurthy; Shaojie Bai; Zachary Manchester; J. Zico Kolter

Many tasks in deep learning involve optimizing over the inputs to a network to minimize or maximize some objective; examples include optimizing over the latent space of a generative model to match a target image, or adversarially perturbing an input to worsen classifier performance. Performing such optimization, however, is traditionally quite costly, as each gradient step requires a complete forward and backward pass through the network. Separately, a recent line of research has developed the deep equilibrium (DEQ) model, a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer. In this paper, we show that there is a natural synergy between these two settings. Although naively using DEQs for these optimization problems is expensive (owing to the time needed to compute a fixed point for each gradient step), we can leverage the fact that gradient-based optimization can itself be cast as a fixed-point iteration to substantially improve the overall speed. That is, we simultaneously solve for the DEQ fixed point and optimize over the network inputs, all within a single "augmented" DEQ model that jointly encodes both the original network and the optimization process. Indeed, the procedure is fast enough that it allows us to efficiently train DEQ models for tasks that traditionally rely on an "inner" optimization loop. We demonstrate this strategy on various tasks, such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training, and gradient-based meta-learning.
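The idea of folding input optimization into the fixed-point solve can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the abstract: the toy layer f(z, x) = tanh(W z + x), the squared-error objective, the adjoint variable `mu`, the step size `alpha`, and the use of plain (unaccelerated) simultaneous iteration. It updates the hidden state, the adjoint, and the input together in one loop, rather than solving the DEQ fixed point to convergence before every gradient step.

```python
import numpy as np

def layer(z, x, W):
    """Toy single nonlinear layer f(z, x) = tanh(W z + x) (assumed for this sketch)."""
    return np.tanh(W @ z + x)

def loss(z, y):
    """Squared-error objective on the equilibrium state (assumed for this sketch)."""
    return 0.5 * np.sum((z - y) ** 2)

def joint_solve(W, y, alpha=0.05, n_iters=5000):
    """Iterate on (z, mu, x) simultaneously instead of nesting two loops.

    z  : hidden state, driven toward the layer's fixed point
    mu : adjoint variable, driven toward the implicit-gradient solution
    x  : network input, updated by a gradient step that uses the current mu
    """
    n = y.shape[0]
    z, mu, x = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(n_iters):
        z_next = layer(z, x, W)
        d = 1.0 - z_next ** 2                # tanh' at the current pre-activation
        mu_next = W.T @ (d * mu) + (z - y)   # (df/dz)^T mu + dL/dz
        x_next = x - alpha * (d * mu)        # x - alpha * (df/dx)^T mu
        z, mu, x = z_next, mu_next, x_next
    return z, x

# Hypothetical toy problem: recover an input whose equilibrium matches a target.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
W = 0.5 * A / np.linalg.norm(A, 2)           # spectral norm 0.5, so the layer is contractive
x_true = np.array([0.5, -0.3, 0.2, -0.4])

y = np.zeros(4)
for _ in range(200):                         # target = equilibrium state for x_true
    y = layer(y, x_true, W)

initial_loss = loss(np.zeros(4), y)          # z = 0 is the equilibrium for x = 0
z_opt, x_opt = joint_solve(W, y)
final_loss = loss(z_opt, y)
residual = np.linalg.norm(z_opt - layer(z_opt, x_opt, W))
print(initial_loss, final_loss, residual)
```

In this sketch a nested approach would run an inner fixed-point solve (and an adjoint solve) for each of the input updates, whereas the joint loop performs a single layer evaluation per step. A practical implementation would likely replace the plain iteration with an accelerated root-finding method; that choice is left open here.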

updated: Thu Nov 25 2021 19:59:33 GMT+0000 (UTC)

published: Thu Nov 25 2021 19:59:33 GMT+0000 (UTC)

arXiv
