DISCO: Distributed Inference with Sparse Communications

Minghai Qin; Chao Sun; Jaco Hofmann; Dejan Vucinic

Deep neural networks (DNNs) have great potential to solve many real-world problems, but they usually require large amounts of computation and memory. Deploying a large DNN model on a single resource-limited device with small memory capacity is very difficult. Distributed computing is a common approach to reducing single-node memory consumption and accelerating the inference of DNN models. In this paper, we explore "within-layer model parallelism", which distributes the inference of each layer across multiple nodes. In this way, the memory requirement is spread over many nodes, making it possible to use several edge devices to run inference on a large DNN model. Because of the dependencies within each layer, data communication between nodes during this parallel inference can become a bottleneck when communication bandwidth is limited. We propose a framework to train DNN models for Distributed Inference with Sparse Communications (DISCO). We convert the problem of selecting which subset of data to transmit between nodes into a model optimization problem, and derive models that reduce both computation and communication when each layer is inferred on multiple nodes. We show the benefit of the DISCO framework on a variety of computer vision (CV) tasks, including image classification, object detection, semantic segmentation, and image super-resolution. The corresponding models include important DNN building blocks such as convolutions and transformers. For example, each layer of a ResNet-50 model can be inferred in a distributed manner across two nodes with five times less data communication, almost half the overall computation, and half the memory requirement for a single node, while achieving accuracy comparable to the original ResNet-50 model. This also results in a 4.7 times overall inference speedup.
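The idea of within-layer model parallelism with sparse communication can be illustrated with a small sketch. This is not the authors' implementation or training procedure; it is a toy linear layer under stated assumptions. The layer is split into four weight blocks across two nodes, and the two cross-node blocks are assumed to have already been made column-sparse, so each node only needs a few of the other node's features. All names and sizes (`node_forward`, `n`, `idx_from_0`, `idx_from_1`) are hypothetical and chosen for illustration only.

```python
import random

def matvec(W, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def keep_columns(W, cols):
    """Zero every column of W except those in `cols` (assumed sparsity pattern)."""
    return [[w if c in cols else 0.0 for c, w in enumerate(row)] for row in W]

def node_forward(W_local, W_cross, x_local, x_received, sent_idx):
    """One node's share of the layer output.

    W_local acts on features the node already holds. W_cross acts on the
    other node's features, but only the columns in sent_idx are nonzero,
    so only those feature values have to be received.
    """
    y = matvec(W_local, x_local)
    for r, row in enumerate(W_cross):
        y[r] += sum(row[c] * v for c, v in zip(sent_idx, x_received))
    return y

random.seed(0)
n = 8                 # features held by each node (hypothetical size)
idx_from_1 = [0, 3]   # features node 1 sends to node 0 (hypothetical choice)
idx_from_0 = [1, 6]   # features node 0 sends to node 1 (hypothetical choice)

# Full layer [[W00, W01], [W10, W11]]; node 0 stores the top row of blocks,
# node 1 the bottom row, so each node holds half of the weights.
W00, W11 = rand_matrix(n, n), rand_matrix(n, n)
W01 = keep_columns(rand_matrix(n, n), idx_from_1)
W10 = keep_columns(rand_matrix(n, n), idx_from_0)

x0 = [random.uniform(-1.0, 1.0) for _ in range(n)]   # input half on node 0
x1 = [random.uniform(-1.0, 1.0) for _ in range(n)]   # input half on node 1

# Sparse communication: each node transmits only the selected entries.
msg_0_to_1 = [x0[c] for c in idx_from_0]
msg_1_to_0 = [x1[c] for c in idx_from_1]

y0 = node_forward(W00, W01, x0, msg_1_to_0, idx_from_1)
y1 = node_forward(W11, W10, x1, msg_0_to_1, idx_from_0)

# Reference: the same layer evaluated on a single node.
W_full = [a + b for a, b in zip(W00, W01)] + [a + b for a, b in zip(W10, W11)]
y_ref = matvec(W_full, x0 + x1)

max_err = max(abs(a - b) for a, b in zip(y0 + y1, y_ref))
values_sent = len(msg_0_to_1) + len(msg_1_to_0)
values_dense = 2 * n   # a dense split would exchange every feature
```

In this toy setting the two nodes reproduce the single-node output up to floating-point rounding while exchanging `values_sent` values instead of `values_dense`. The sketch simply fixes the set of transmitted features by hand; choosing that subset through model optimization is the problem the abstract describes.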

updated: Wed Feb 22 2023 07:20:34 GMT+0000 (UTC)

published: Wed Feb 22 2023 07:20:34 GMT+0000 (UTC)

arXiv
