DepGraph: Towards Any Structural Pruning

Gongfan Fang; Xinyin Ma; Mingli Song; Michael Bi Mi; Xinchao Wang

Structural pruning enables model acceleration by removing structurally grouped parameters from neural networks. However, parameter-grouping patterns vary widely across models, so architecture-specific pruners, which rely on manually designed grouping schemes, do not generalize to new architectures. In this work, we study a highly challenging yet barely explored task, any structural pruning, which tackles general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs and Transformers. The most prominent obstacle towards this goal lies in structural coupling. Coupling not only forces different layers to be pruned simultaneously, which avoids structural issues, but also expects all removed parameters to be consistently unimportant, which avoids significant performance degradation after pruning. To address this problem, we propose a general and fully automatic method, Dependency Graph (DepGraph), that explicitly models the dependency between layers and comprehensively groups coupled parameters for pruning. We extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision Transformer for images, GAT for graphs, DGCNN for 3D point clouds, and LSTM for language, and demonstrate that, even with a simple norm-based criterion, the proposed method consistently yields gratifying performance.
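The coupling and grouping ideas in the abstract can be illustrated with a small, hypothetical sketch. This is not the authors' implementation. It assumes a hand-written toy network in which each node is a "pruning dimension" (a layer name plus an axis such as "in" or "out"), and an edge means the two dimensions must drop the same channel indices. Coupled groups are found by breadth-first search, and each group is scored with a simple L2-norm criterion summed over all coupled slices. The layer names, the `EDGES` list, the helper functions and the parameter values are all illustrative assumptions.

```python
import math
from collections import defaultdict, deque

# Hypothetical toy network. Each node is a "pruning dimension": a (layer, axis)
# pair. An edge means both dimensions must drop the SAME channel indices.
# Assumption: BN is simplified to a single channel node (its input and output
# channels coincide).
EDGES = [
    (("conv1", "out"), ("bn1", "ch")),     # bn1 normalizes conv1's output channels
    (("bn1", "ch"), ("conv2", "in")),      # conv2 consumes bn1's channels
    (("conv2", "out"), ("conv3", "out")),  # residual add: both outputs must match
    (("conv3", "out"), ("fc", "in")),      # fc consumes the summed feature map
]


def build_graph(edges):
    """Build undirected adjacency sets over pruning dimensions."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def group_of(graph, start):
    """BFS: every dimension transitively coupled to `start` forms one group."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def group_importance(group, slices, n_channels):
    """Per-channel L2 norm taken over ALL coupled parameter slices in the group."""
    sq = [0.0] * n_channels
    for dim in group:
        for c, vec in enumerate(slices.get(dim, [])):
            sq[c] += sum(v * v for v in vec)
    return [math.sqrt(s) for s in sq]


def prune_plan(group, scores, k):
    """Remove the k least important channels consistently from every member."""
    idxs = sorted(sorted(range(len(scores)), key=scores.__getitem__)[:k])
    return {dim: idxs for dim in group}


if __name__ == "__main__":
    graph = build_graph(EDGES)
    group = group_of(graph, ("conv1", "out"))
    # Per-channel parameter slices (made-up numbers for illustration only).
    slices = {
        ("conv1", "out"): [[1.0, 1.0], [0.1, 0.1], [2.0, 0.0], [0.5, 0.5]],
        ("bn1", "ch"): [[1.0], [0.1], [1.0], [1.0]],
        ("conv2", "in"): [[1.0, 0.0], [0.0, 0.1], [1.0, 1.0], [0.5, 0.5]],
    }
    scores = group_importance(group, slices, n_channels=4)
    print(sorted(group))
    print(prune_plan(group, scores, k=1))
```

In this toy setup, removing output channel 1 of `conv1` also removes channel 1 of `bn1` and input channel 1 of `conv2`. Starting from `fc`'s input instead yields the residual group `{conv2.out, conv3.out, fc.in}`. Scoring the whole group jointly, rather than `conv1` alone, reflects the abstract's point that every removed parameter in a coupled group should be consistently unimportant.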

updated: Thu Mar 23 2023 12:55:02 GMT+0000 (UTC)

published: Mon Jan 30 2023 14:02:33 GMT+0000 (UTC)

arXiv