Fire Together Wire Together: A Dynamic Pruning Approach with Self-Supervised Mask Prediction

Sara Elkerdawy; Mostafa Elhoushi; Hong Zhang; Nilanjan Ray


Dynamic model pruning is a recent direction that allows a different sub-network to be used at inference for each input sample during deployment. However, current dynamic methods rely on learning continuous channel gating through regularization, by adding a sparsity-inducing loss. This formulation makes it difficult to balance the different losses (e.g., task loss and regularization loss). In addition, regularization-based methods lack a transparent way to select the tradeoff hyperparameter needed to meet a computational budget. Our contribution is twofold: 1) decoupled task and pruning training, and 2) simple hyperparameter selection that enables FLOPs reduction to be estimated before training. We propose to predict a mask that selects k filters to process in a layer, based on the activation of the previous layer. We pose the problem as a self-supervised binary classification problem. Each mask predictor module is trained to predict, for each filter in the current layer, the log-likelihood that it belongs to the top-k activated filters. The value k is dynamically estimated for each input based on a novel criterion using the mass of heatmaps. We show experiments on several neural architectures, such as VGG, ResNet, and MobileNet, on the CIFAR and ImageNet datasets. On CIFAR, we reach similar accuracy to SOTA methods with 15% and 24% higher FLOPs reduction. Similarly, on ImageNet, we achieve a lower drop in accuracy with up to 13% improvement in FLOPs reduction.
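The abstract does not spell out the heatmap-mass criterion or the training loss, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes that a filter's "mass" is the L1 norm of its output heatmap, and that k is the smallest number of top filters whose cumulative mass reaches a fraction `mass_ratio` of the total. The names `topk_mask_from_mass`, `bce_with_logits`, and `mass_ratio` are hypothetical. The resulting binary mask serves as the self-supervised target for a mask predictor, scored here with a standard binary cross-entropy on logits.

```python
import numpy as np


def topk_mask_from_mass(activations, mass_ratio=0.8):
    """Build a binary top-k target mask for one input sample.

    activations: array of shape (C, H, W), one heatmap per filter.
    mass_ratio:  assumed hyperparameter; fraction of total heatmap
                 mass that the kept filters must cover.
    Returns (mask, k), where mask has shape (C,) and dtype bool.
    """
    num_filters = activations.shape[0]
    # Assumption: the mass of a filter is the L1 norm of its heatmap.
    mass = np.abs(activations).reshape(num_filters, -1).sum(axis=1)
    total = mass.sum()
    mask = np.zeros(num_filters, dtype=bool)
    if total == 0:
        return mask, 0  # nothing is active, keep no filters

    order = np.argsort(-mass, kind="stable")  # filters by descending mass
    cumulative = np.cumsum(mass[order]) / total
    # Smallest k whose top-k filters cover at least mass_ratio of the mass.
    k = int(np.searchsorted(cumulative, mass_ratio)) + 1
    k = min(k, num_filters)
    mask[order[:k]] = True
    return mask, k


def bce_with_logits(logits, targets):
    """Numerically stable mean binary cross-entropy on raw logits."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    per_filter = (
        np.maximum(logits, 0.0)
        - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(per_filter.mean())


# Four filters with heatmap masses 1, 6, 0 and 3 (total mass 10).
example = np.array([[[1.0]], [[6.0]], [[0.0]], [[3.0]]])
target_mask, k = topk_mask_from_mass(example, mass_ratio=0.8)
# A predictor fed the previous layer's activation would emit one logit
# per filter; zeros stand in for an untrained predictor here.
loss = bce_with_logits(np.zeros(4), target_mask)
```

Because the target mask is computed from the network's own activations, no extra labels are needed. In this sketch the ratio is the only knob that controls how many filters survive, which is one plausible reading of how the FLOPs reduction could be estimated before training.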

updated: Fri Oct 15 2021 17:39:53 GMT+0000 (UTC)

published: Fri Oct 15 2021 17:39:53 GMT+0000 (UTC)

arXiv
