Variational Saccading: Efficient Inference for Large Resolution Images

Jason Ramapuram; Maurits Diephuis; Frantzeska Lavda; Russ Webb; Alexandros Kalousis

変分サッケード：大解像度画像の効率的な推論

ディープニューラルネットワークによる画像分類は、通常、Resnetモデルの224 x 244などの小さな次元の画像に制限されます[24]。この制限には、最新のスマートフォンのカメラやスマートデバイスで撮影される4000 x 3000の次元の画像は含まれません。この作業では、このような大きな次元の空間での運用における予想外の推論およびメモリコストを軽減することを目指しています。高解像度の元の入力分布からサンプリングするために、より小さなプロキシ分布を使用して、高次元空間の関心領域に対応する座標を学習することを提案します。条件付き分類尤度を最大化する方法で、プロキシ分布の事後と元の画像の座標空間の関係をキャプチャする新しい原理的な変分下限を導入します。 1つの合成ベンチマークと1つの実世界の大解像度DSLRカメラ画像データセットで、元の入力分布全体を使用するモデルよりも約10倍高速な推論と低いメモリ消費で本方法が同等の結果を生成することを実証します。最後に、Starcraft II [56]のミニマップを使用して、より複雑な設定で実験し、複雑な3Dレンダリングシーンのキャラクター数を推測します。このような複雑なシーンでも、当社のモデルは強力なローカリゼーションを提供します。これは、従来の分類モデルにはない機能です。

Image classification with deep neural networks is typically restricted to images of small dimensionality such as 224 x 244 in Resnet models [24]. This limitation excludes the 4000 x 3000 dimensional images that are taken by modern smartphone cameras and smart devices. In this work, we aim to mitigate the prohibitive inferential and memory costs of operating in such large dimensional spaces. To sample from the high-resolution original input distribution, we propose using a smaller proxy distribution to learn the co-ordinates that correspond to regions of interest in the high-dimensional space. We introduce a new principled variational lower bound that captures the relationship of the proxy distribution's posterior and the original image's co-ordinate space in a way that maximizes the conditional classification likelihood. We empirically demonstrate on one synthetic benchmark and one real world large resolution DSLR camera image dataset that our method produces comparable results with ~10x faster inference and lower memory consumption than a model that utilizes the entire original input distribution. Finally, we experiment with a more complex setting using mini-maps from Starcraft II [56] to infer the number of characters in a complex 3d-rendered scene. Even in such complicated scenes our model provides strong localization: a feature missing from traditional classification models.

updated: Fri Sep 06 2019 11:41:23 GMT+0000 (UTC)

published: Sat Dec 08 2018 16:53:02 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト