Dynamic Network Quantization for Efficient Video Inference

Ximeng Sun; Rameswar Panda; Chun-Fu Chen; Aude Oliva; Rogerio Feris; Kate Saenko

Deep convolutional networks have recently achieved great success in video recognition, yet deploying them in practice remains challenging because robust recognition requires a large amount of computation. Motivated by the effectiveness of quantization for boosting efficiency, we propose a dynamic network quantization framework that selects the optimal precision for each frame, conditioned on the input, for efficient video recognition. Specifically, given a video clip, we train a very lightweight network in parallel with the recognition network to produce a dynamic policy indicating which numerical precision to use for each frame. We train both networks using standard backpropagation with a loss that accounts for both the competitive performance and the resource efficiency required for video recognition. Extensive experiments on four challenging and diverse benchmark datasets demonstrate that our approach provides significant savings in computation and memory usage while outperforming existing state-of-the-art methods.
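
The idea of choosing a numerical precision per frame can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes a learned lightweight policy network, whereas `toy_policy` below is a hand-written heuristic stand-in, and the names `quantize_uniform`, `toy_policy`, `run_clip`, the bit-width choices (2, 4, 8), and the variance thresholds are all assumptions made for illustration.

```python
from statistics import pvariance


def quantize_uniform(values, bits):
    """Uniformly quantize floats in [0, 1] onto 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return [round(min(max(v, 0.0), 1.0) * levels) / levels for v in values]


def toy_policy(frame, choices=(2, 4, 8)):
    """Hypothetical stand-in for the learned policy network.

    Picks a bit-width from the frame's variance: flatter frames get
    fewer bits. The thresholds are arbitrary illustration values.
    """
    spread = pvariance(frame)
    if spread < 0.01:
        return choices[0]
    if spread < 0.05:
        return choices[1]
    return choices[2]


def run_clip(frames):
    """Select a precision per frame, quantize each frame, report mean bits."""
    decisions = [toy_policy(f) for f in frames]
    quantized = [quantize_uniform(f, b) for f, b in zip(frames, decisions)]
    mean_bits = sum(decisions) / len(decisions)
    return decisions, quantized, mean_bits


# Three tiny "frames" (flattened pixel values in [0, 1]).
clip = [
    [0.50, 0.52, 0.49, 0.51],  # nearly flat
    [0.30, 0.55, 0.45, 0.70],  # moderate variation
    [0.05, 0.95, 0.10, 0.90],  # high contrast
]
decisions, quantized, mean_bits = run_clip(clip)
print(decisions, round(mean_bits, 2))
```

In this sketch the average bit-width across the clip plays the role of the resource cost; in the framework described above, that trade-off is instead handled by the training loss, and the policy is learned jointly with the recognition network rather than fixed by hand.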

updated: Mon Aug 23 2021 20:23:57 GMT+0000 (UTC)

published: Mon Aug 23 2021 20:23:57 GMT+0000 (UTC)

arXiv
