Supervised Compression for Resource-Constrained Edge Computing Systems

Yoshitomo Matsubara; Ruihan Yang; Marco Levorato; Stephan Mandt



There has been much interest in deploying deep learning algorithms on low-powered devices, including smartphones, drones, and medical sensors. However, full-scale deep neural networks are often too resource-intensive in terms of energy and storage. As a result, the bulk of the machine learning operation is often carried out on an edge server, to which the data is transmitted after being compressed. However, compressing data (such as images) leads to transmitting information irrelevant to the supervised task. Another popular approach is to split the deep network between the device and the server while compressing intermediate features. To date, however, such split computing strategies have barely outperformed the aforementioned naive data compression baselines due to their inefficient approaches to feature compression. This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently. Our supervised compression approach uses a teacher model and a student model with a stochastic bottleneck and a learnable prior for entropy coding (Entropic Student). We compare our approach to various neural image and feature compression baselines in three vision tasks and find that it achieves better supervised rate-distortion performance while maintaining smaller end-to-end latency. We furthermore show that the learned feature representations can be tuned to serve multiple downstream tasks.
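The supervised rate-distortion trade-off described above can be illustrated with a small, self-contained sketch. It assumes a rate term estimated as the negative log-likelihood (in bits) of the quantized bottleneck latent under a factorized prior, and a distortion term taken as the mean squared error between the student's reconstructed features and the teacher's features (a distillation loss). The prior here is a fixed unit Gaussian standing in for the learnable prior, and all function names (`quantize`, `rate_bits`, `distortion`, `supervised_rd_loss`) are hypothetical; this is not the authors' implementation.

```python
import math
import random


def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def quantize(latent, training, rng):
    """Training: add uniform noise in [-0.5, 0.5] as a differentiable proxy
    for rounding. Inference: round to the nearest integer (Python rounds
    ties to even) so the values can be entropy coded."""
    if training:
        return [z + rng.uniform(-0.5, 0.5) for z in latent]
    return [float(round(z)) for z in latent]


def rate_bits(y_hat, mu=0.0, sigma=1.0):
    """Estimated code length: -sum(log2 P(y_i)), where P(y_i) is the prior's
    probability mass on [y_i - 0.5, y_i + 0.5]. A fixed Gaussian is an
    assumed stand-in for a learned factorized prior."""
    total = 0.0
    for y in y_hat:
        p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
        total -= math.log2(max(p, 1e-9))  # clamp to avoid log(0)
    return total


def distortion(student_feats, teacher_feats):
    """Distillation distortion: MSE between student and teacher features."""
    sq = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(sq) / len(sq)


def supervised_rd_loss(student_feats, teacher_feats, y_hat, beta):
    """Hypothetical training objective: distortion + beta * rate."""
    return distortion(student_feats, teacher_feats) + beta * rate_bits(y_hat)


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.2, -1.7, 3.1]                           # bottleneck output on the device
    print(quantize(latent, training=False, rng=rng))    # rounded symbols to transmit
    print(rate_bits(quantize(latent, False, rng)))      # estimated bits for those symbols
```

Raising `beta` pushes the encoder toward latents that are cheaper to transmit at the cost of a larger mismatch with the teacher. Values far from the prior's mode (such as `3.0` above) cost more bits than values near it.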

updated: Wed Oct 20 2021 22:47:10 GMT+0000 (UTC)

published: Sat Aug 21 2021 11:10:29 GMT+0000 (UTC)

arXiv

