Robust Mixture-of-Expert Training for Convolutional Neural Networks

Yihua Zhang; Ruisi Cai; Tianlong Chen; Guanhua Zhang; Huan Zhang; Pin-Yu Chen; Shiyu Chang; Zhangyang Wang; Sijia Liu

Sparsely-gated Mixture of Expert (MoE), an emerging deep model architecture, has shown great promise for high-accuracy and ultra-efficient model inference. Despite the growing popularity of MoE, little work has investigated its potential to advance convolutional neural networks (CNNs), especially in terms of adversarial robustness. Since the lack of robustness has become one of the main hurdles for CNNs, in this paper we ask: How can a CNN-based MoE model be made adversarially robust? Can we robustly train it like an ordinary CNN model? Our pilot study shows that the conventional adversarial training (AT) mechanism (developed for vanilla CNNs) is no longer effective at robustifying an MoE-CNN. To better understand this phenomenon, we dissect the robustness of an MoE-CNN into two dimensions: robustness of routers (i.e., the gating functions that select data-specific experts) and robustness of experts (i.e., the router-guided pathways defined by subnetworks of the backbone CNN). Our analyses show that routers and experts struggle to adapt to each other under vanilla AT. We therefore propose a new router-expert alternating adversarial training framework for MoE, termed AdvMoE. The effectiveness of our proposal is demonstrated across four commonly used CNN architectures on four benchmark datasets. We find that AdvMoE achieves a 1% to 4% adversarial robustness improvement over the original dense CNN while retaining the efficiency benefit of sparsely-gated MoE, leading to more than a 50% reduction in inference cost. Code is available at https://github.com/OPTML-Group/Robust-MoE-CNN.
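
The router-expert alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' AdvMoE implementation (see the repository linked above for that). It makes several simplifying assumptions: two linear "experts" instead of CNN subnetworks, a soft (softmax) gate instead of sparse top-k routing, a one-step FGSM-style attack, and finite-difference gradients so that only the Python standard library is needed. All function names (`fgsm`, `sgd_step`, `alternating_at`) and hyperparameters are hypothetical.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_output(router, experts, x):
    # Assumption: soft gating (the paper's MoE uses sparse gating).
    gates = softmax([dot(r, x) for r in router])
    return sum(g * dot(w, x) for g, w in zip(gates, experts))

def loss(router, experts, x, y):
    # Logistic loss for a label y in {-1, +1}.
    return math.log1p(math.exp(-y * moe_output(router, experts, x)))

def num_grad(f, params, h=1e-5):
    # Central finite differences; `params` is a list of lists mutated in place
    # and restored, and `f` must read from those same list objects.
    grad = []
    for row in params:
        grad_row = []
        for j in range(len(row)):
            old = row[j]
            row[j] = old + h
            up = f()
            row[j] = old - h
            down = f()
            row[j] = old
            grad_row.append((up - down) / (2 * h))
        grad.append(grad_row)
    return grad

def fgsm(router, experts, x, y, eps):
    # One-step L-infinity attack: move each input coordinate by eps in the
    # direction of the sign of the loss gradient.
    x_var = list(x)
    g = num_grad(lambda: loss(router, experts, x_var, y), [x_var])[0]
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]

def sgd_step(block, router, experts, adv_data, lr):
    # Gradient step on `block` only (either `router` or `experts`);
    # the other parameter block stays frozen.
    def mean_loss():
        return sum(loss(router, experts, x, y) for x, y in adv_data) / len(adv_data)
    grad = num_grad(mean_loss, block)
    for row, grad_row in zip(block, grad):
        for j in range(len(row)):
            row[j] -= lr * grad_row[j]

def alternating_at(data, router, experts, eps=0.1, lr=0.5, rounds=20):
    for _ in range(rounds):
        # Router step: attack the current model, then update the router only.
        adv = [(fgsm(router, experts, x, y, eps), y) for x, y in data]
        sgd_step(router, router, experts, adv, lr)
        # Expert step: re-attack, then update the experts only.
        adv = [(fgsm(router, experts, x, y, eps), y) for x, y in data]
        sgd_step(experts, router, experts, adv, lr)
    return router, experts

# Toy usage: four 2-D points with labels in {-1, +1}.
toy_data = [([1.0, 0.5], 1), ([0.8, 1.0], 1), ([-1.0, -0.4], -1), ([-0.7, -1.1], -1)]
toy_router = [[0.1, -0.1], [-0.1, 0.1]]
toy_experts = [[0.1, 0.0], [0.0, 0.1]]
alternating_at(toy_data, toy_router, toy_experts)
```

The point of the sketch is the structure of the loop: adversarial examples are regenerated before each half-step, and each half-step touches only one parameter block. In vanilla AT, by contrast, router and experts would be updated jointly from the same gradient.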

updated: Sat Aug 19 2023 20:58:21 GMT+0000 (UTC)

published: Sat Aug 19 2023 20:58:21 GMT+0000 (UTC)

arXiv
