FC2T2: The Fast Continuous Convolutional Taylor Transform with Applications in Vision and Graphics

Henning Lange; J. Nathan Kutz

Series expansions have been a cornerstone of applied mathematics and engineering for centuries. In this paper, we revisit the Taylor series expansion from a modern Machine Learning perspective. Specifically, we introduce the Fast Continuous Convolutional Taylor Transform (FC2T2), a variant of the Fast Multipole Method (FMM) that allows for the efficient approximation of low-dimensional convolutional operators in continuous space. We build upon the FMM, an approximate algorithm that reduces the computational complexity of N-body problems from O(NM) to O(N+M) and finds application in, e.g., particle simulations. As an intermediate step, the FMM produces a series expansion for every cell on a grid, and we introduce algorithms that act directly upon this representation. These algorithms analytically but approximately compute the quantities required for the forward and backward pass of the backpropagation algorithm and can therefore be employed as (implicit) layers in Neural Networks. Specifically, we introduce a root-implicit layer that outputs surface normals and object distances, as well as an integral-implicit layer that outputs a rendering of a radiance field given a 3D pose. In the context of Machine Learning, N and M can be understood as the number of model parameters and model evaluations, respectively. This entails that, for applications that require repeated function evaluations, which are prevalent in Computer Vision and Graphics, the techniques introduced in this paper scale gracefully with the number of parameters, unlike regular Neural Networks. For some applications, this results in a 200x reduction in FLOPs compared to state-of-the-art approaches at a reasonable or non-existent loss in accuracy.
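
The idea of storing a series expansion for every grid cell and then answering queries from that representation can be illustrated with a minimal one-dimensional sketch. It is not the FC2T2 or the FMM itself: the Gaussian kernel exp(-r^2), the unit interval, the function names, and the brute-force construction of the coefficients are all assumptions made for illustration. In particular, the expansions below are built at a cost proportional to N times the number of cells, whereas the FMM uses a hierarchy to avoid that cost. What the sketch does show is that, once the expansions exist, each of the M evaluations costs only a handful of operations, independent of N.

```python
import math
import random


def hermite(n_max, r):
    """Physicists' Hermite polynomials H_0..H_n_max at r via the three-term recurrence."""
    h = [1.0, 2.0 * r]
    for n in range(1, n_max):
        h.append(2.0 * r * h[n] - 2.0 * n * h[n - 1])
    return h[: n_max + 1]


def cell_expansions(sources, weights, n_cells, order):
    """Taylor coefficients of f(x) = sum_i w_i * exp(-(x - x_i)^2) around each cell center on [0, 1].

    Brute-force construction for illustration only (hypothetical helper, not the FMM).
    """
    width = 1.0 / n_cells
    coeffs = []
    for c in range(n_cells):
        center = (c + 0.5) * width
        a = [0.0] * (order + 1)
        for x_i, w_i in zip(sources, weights):
            r = center - x_i
            g = math.exp(-r * r)
            h = hermite(order, r)
            for n in range(order + 1):
                # The n-th derivative of exp(-r^2) is (-1)^n * H_n(r) * exp(-r^2).
                a[n] += w_i * (-1) ** n * h[n] * g / math.factorial(n)
        coeffs.append(a)
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the local Taylor polynomial of the cell containing x (Horner's scheme)."""
    n_cells = len(coeffs)
    c = min(int(x * n_cells), n_cells - 1)
    d = x - (c + 0.5) / n_cells
    result = 0.0
    for a_n in reversed(coeffs[c]):
        result = result * d + a_n
    return result


def direct(sources, weights, x):
    """Reference evaluation that touches every source: O(N) per query."""
    return sum(w * math.exp(-(x - s) ** 2) for s, w in zip(sources, weights))


rng = random.Random(0)
sources = [rng.random() for _ in range(50)]           # N = 50 "parameters"
weights = [rng.uniform(-1.0, 1.0) for _ in range(50)]
coeffs = cell_expansions(sources, weights, n_cells=10, order=6)

queries = [rng.random() for _ in range(200)]          # M = 200 evaluations
max_err = max(abs(evaluate(coeffs, x) - direct(sources, weights, x)) for x in queries)
```

Here `max_err` measures how far the cell-wise polynomials deviate from the direct sum on the query points. Increasing `order` or `n_cells` tightens the approximation at the price of more stored coefficients.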

updated: Fri Oct 29 2021 22:58:42 GMT+0000 (UTC)

published: Fri Oct 29 2021 22:58:42 GMT+0000 (UTC)

arXiv
