Visualizing high-dimensional loss landscapes with Hessian directions

Lucas Böttcher; Gregory Wheeler

Analyzing geometric properties of high-dimensional loss functions, such as local curvature and the existence of other optima around a certain point in loss space, can help provide a better understanding of the interplay between neural network structure, implementation attributes, and learning performance. In this work, we combine concepts from high-dimensional probability and differential geometry to study how curvature properties in lower-dimensional loss representations depend on those in the original loss space. We show that saddle points in the original space are rarely correctly identified as such in lower-dimensional representations if random projections are used. In such projections, the expected curvature in a lower-dimensional representation is proportional to the mean curvature in the original loss space. Hence, the mean curvature in the original loss space determines if saddle points appear, on average, as either minima, maxima, or almost flat regions. We use the connection between expected curvature and mean curvature (i.e., the normalized Hessian trace) to estimate the trace of Hessians without calculating the Hessian or Hessian-vector products as in Hutchinson's method. Because random projections are not able to correctly identify saddle information, we propose to study projections along Hessian directions that are associated with the largest and smallest principal curvatures. We connect our findings to the ongoing debate on loss landscape flatness and generalizability. Finally, we illustrate our method in numerical experiments on different image classifiers with up to about 7×10^6 parameters.
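The relation stated in the abstract between expected curvature along random directions and mean curvature (the normalized Hessian trace) can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes a hypothetical 10-dimensional quadratic loss with a known diagonal Hessian, draws random unit directions, and measures curvature along each with a central finite difference of the loss, so no Hessian-vector products are needed for the trace estimate. The names `directional_curvature` and `estimate_trace`, the step size `eps`, and the sample count are illustrative choices. The explicit eigendecomposition at the end is feasible only because the toy Hessian is tiny; how the extremal Hessian directions are obtained for large networks is not specified in the abstract.

```python
import numpy as np


def directional_curvature(loss, theta, direction, eps=1e-3):
    """Central finite-difference estimate of the second derivative of
    t -> loss(theta + t * direction) at t = 0."""
    return (
        loss(theta + eps * direction)
        - 2.0 * loss(theta)
        + loss(theta - eps * direction)
    ) / eps**2


def estimate_trace(loss, theta, num_samples=2000, eps=1e-3, seed=0):
    """Estimate tr(H) as n times the average curvature along random unit
    directions, using loss evaluations only."""
    rng = np.random.default_rng(seed)
    n = theta.size
    curvatures = []
    for _ in range(num_samples):
        u = rng.standard_normal(n)
        u /= np.linalg.norm(u)  # uniform direction on the unit sphere
        curvatures.append(directional_curvature(loss, theta, u, eps))
    return n * float(np.mean(curvatures))


# Hypothetical toy saddle: one negative and nine positive principal curvatures.
eigenvalues = np.array([-1.0] + [0.5] * 9)  # exact trace = 3.5
H = np.diag(eigenvalues)


def loss(theta):
    return 0.5 * theta @ H @ theta


theta0 = np.zeros(10)  # the saddle point itself

trace_estimate = estimate_trace(loss, theta0)
mean_curvature = trace_estimate / theta0.size

# Curvature along the Hessian directions with the smallest and largest
# eigenvalues (explicit eigendecomposition is an assumption of this toy setup).
eigvals, eigvecs = np.linalg.eigh(H)
curv_min = directional_curvature(loss, theta0, eigvecs[:, 0])
curv_max = directional_curvature(loss, theta0, eigvecs[:, -1])

print(f"trace estimate: {trace_estimate:.3f}")
print(f"mean curvature: {mean_curvature:.3f}")
print(f"curvature along extremal Hessian directions: {curv_min:.3f}, {curv_max:.3f}")
```

In this toy case the mean curvature is positive, so a projection onto random directions would, on average, make the saddle look like a minimum. The two extremal Hessian directions have curvatures of opposite sign and therefore expose the saddle.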

updated: Sun Aug 28 2022 13:18:47 GMT+0000 (UTC)

published: Sun Aug 28 2022 13:18:47 GMT+0000 (UTC)

arXiv
