ParaDiS: Parallelly Distributable Slimmable Neural Networks

Alexey Ozerov; Anne Lambert; Suresh Kirthi Kumaraswamy

When several limited-power devices are available, one of the most efficient ways to exploit these resources, while reducing processing latency and communication load, is to run several neural sub-networks in parallel and fuse their results at the end of processing. However, such a combination of sub-networks must be trained specifically for each particular configuration of devices (characterized by the number of devices and their capacities), which may vary across model deployments and even within the same deployment. In this work we introduce parallelly distributable slimmable (ParaDiS) neural networks, which can be split in parallel among various device configurations without retraining. While inspired by slimmable networks, which allow instant adaptation to the resources of a single device, ParaDiS networks consist of several multi-device distributable configurations, or switches, that strongly share parameters among themselves. We evaluate the ParaDiS framework on the MobileNet v1 and ResNet-50 architectures for the ImageNet classification task and on the WDSR architecture for the image super-resolution task. We show that ParaDiS switches achieve similar or better accuracy than the individual models, i.e., distributed models of the same structure trained individually. Moreover, we show that, compared to universally slimmable networks, which are not distributable, the accuracy of distributable ParaDiS switches either does not drop at all or drops by at most 1%, and only in the worst cases. Finally, once distributed over several devices, ParaDiS greatly outperforms slimmable models.
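The idea of switches that share parameters and run as parallel branches can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical two-layer perceptron whose hidden units are partitioned among devices, and all names (make_shared_params, split_hidden, run_branch, run_switch) are invented for illustration. Each device reads only its own slice of one shared parameter set, and the partial outputs are summed once at the end, so no communication is needed during the forward pass. Joint training of the switches, which the abstract describes, is not shown.

```python
import random


def make_shared_params(n_in, n_hidden, n_out, seed=0):
    """Create one parameter set that every switch reuses (random toy weights)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2


def split_hidden(n_hidden, widths):
    """Map a switch such as (0.25, 0.25) to contiguous hidden-unit ranges, one per device."""
    ranges, start = [], 0
    for width in widths:
        size = round(width * n_hidden)
        ranges.append((start, start + size))
        start += size
    if start > n_hidden:
        raise ValueError("switch asks for more hidden units than the model has")
    return ranges


def run_branch(x, w1, w2, lo, hi):
    """Work done on one device: uses only hidden units lo..hi-1 of the shared weights."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(w1[j], x))) for j in range(lo, hi)]
    return [sum(w2[k][lo + i] * h for i, h in enumerate(hidden)) for k in range(len(w2))]


def run_switch(x, w1, w2, widths):
    """Run every branch independently, then fuse by summing the partial outputs."""
    partials = [run_branch(x, w1, w2, lo, hi) for lo, hi in split_hidden(len(w1), widths)]
    return [sum(values) for values in zip(*partials)]


if __name__ == "__main__":
    w1, w2 = make_shared_params(n_in=3, n_hidden=8, n_out=2)
    x = [0.5, -1.0, 2.0]
    for switch in [(1.0,), (0.5, 0.5), (0.25, 0.25), (0.5, 0.25)]:
        print(switch, run_switch(x, w1, w2, switch))
```

In this toy model a switch that covers all hidden units, such as (0.5, 0.5), reproduces the single-device output exactly, because a single hidden layer is additive over its units. Deeper branches like those evaluated in the abstract would not have this property, which is why the switches there have to be trained to work with shared parameters.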

updated: Mon Nov 29 2021 09:50:25 GMT+0000 (UTC)

published: Wed Oct 06 2021 13:17:08 GMT+0000 (UTC)

arXiv
