Collage Inference: Using Coded Redundancy for Low Variance Distributed Image Classification

Krishna Giri Narra; Zhifeng Lin; Ganesh Ananthanarayanan; Salman Avestimehr; Murali Annavaram

MLaaS (ML-as-a-Service) offerings by cloud computing platforms are becoming increasingly popular. Hosting pre-trained machine learning models in the cloud enables elastic scalability as demand grows. However, providing low latency and reducing latency variance are key requirements. Variance is harder to control in a cloud deployment due to uncertainties in resource allocation across many virtual instances. We propose the collage inference technique, which uses a novel convolutional neural network model, collage-cnn, to provide low-cost redundancy. A collage-cnn model takes a collage image formed by combining multiple images and performs multi-image classification in one shot, albeit at slightly lower accuracy. We augment a collection of traditional single-image classifier models with a single collage-cnn classifier that acts as their low-cost redundant backup. Collage-cnn provides backup classification results if any single-image classification request experiences a slowdown. By deploying the collage-cnn models in the cloud, we demonstrate that the 99th-percentile tail latency of inference can be reduced by 1.2x to 2x compared to replication-based approaches while providing high accuracy. Variation in inference latency can be reduced by 1.8x to 15x.
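
The two mechanisms the abstract describes, packing several images into one collage and falling back to the collage-cnn output when a single-image request is slow, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are hypothetical grayscale pixel grids of equal size, tiled row-major into a square grid without any resizing, and the collage-cnn output is assumed to be one label per grid cell in the same row-major order. The names `make_collage` and `merge_predictions` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical image representation: a grayscale image as rows of pixel values.
Image = List[List[int]]


def make_collage(images: Sequence[Image], grid: int) -> Image:
    """Tile grid*grid equally sized images, row-major, into one collage image.

    Assumption: no resizing is done here; a real model input pipeline may differ.
    """
    if len(images) != grid * grid:
        raise ValueError("need exactly grid*grid images")
    h, w = len(images[0]), len(images[0][0])
    if any(len(img) != h or any(len(row) != w for row in img) for img in images):
        raise ValueError("all images must share the same height and width")
    collage: Image = []
    for grid_row in range(grid):
        # Each pixel row of the collage concatenates the same pixel row
        # from every image in this row of the grid.
        for y in range(h):
            row: List[int] = []
            for grid_col in range(grid):
                row.extend(images[grid_row * grid + grid_col][y])
            collage.append(row)
    return collage


def merge_predictions(
    single: Dict[int, str],
    collage: Optional[List[str]],
    n: int,
) -> List[Optional[str]]:
    """Combine results for n request slots.

    `single` maps slot index -> label for single-image classifiers that
    finished in time; slots missing from it (slowed-down requests) are
    filled from the collage-cnn labels, assumed to be one per grid cell.
    A slot stays None if neither result is available.
    """
    merged: List[Optional[str]] = []
    for i in range(n):
        if i in single:
            merged.append(single[i])  # prefer the (more accurate) single-image result
        elif collage is not None:
            merged.append(collage[i])  # backup result from the collage-cnn
        else:
            merged.append(None)
    return merged
```

For example, with four images and `grid=2`, if the single-image result for slot 2 misses its deadline, `merge_predictions({0: "cat", 1: "dog", 3: "car"}, ["cat", "dog", "ship", "cat"], 4)` returns `["cat", "dog", "ship", "car"]`: the collage supplies only the missing slot. The appeal of this design, as the abstract frames it, is that one collage classifier backs up a whole group of single-image classifiers (k models plus one backup), whereas simple replication provisions a replica for each model.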

updated: Tue Sep 10 2019 17:25:42 GMT+0000 (UTC)

published: Sat Apr 27 2019 22:56:10 GMT+0000 (UTC)

arXiv