Confidence Adaptive Anytime Pixel-Level Recognition

Zhuang Liu; Trevor Darrell; Evan Shelhamer

Anytime inference requires a model to make a progression of predictions that can be halted at any time. Prior research on anytime visual recognition has mostly focused on image classification. We propose the first unified, end-to-end approach for anytime pixel-level recognition. A cascade of "exits" is attached to the model to make multiple predictions and direct further computation. We redesign the exits to account for the depth and spatial resolution of the features at each exit. To reduce total computation and make full use of prior predictions, we develop a novel spatially adaptive approach that avoids further computation on regions where early predictions are already sufficiently confident. Our full model, with redesigned exit architecture and spatial adaptivity, enables anytime inference, achieves the same level of final accuracy, and significantly reduces total computation. We evaluate our approach on semantic segmentation and human pose estimation. On Cityscapes semantic segmentation and MPII human pose estimation, our approach enables anytime inference while also reducing the total FLOPs of its base models by 44.4% and 59.1%, respectively, without sacrificing accuracy. As a new anytime baseline, we measure the anytime capability of deep equilibrium networks, a recent class of models that is intrinsically iterative, and we show that the accuracy-computation curve of our architecture strictly dominates it.
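The spatially adaptive idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the confidence measure or threshold, so this sketch assumes the maximum softmax probability as the per-pixel confidence and a hypothetical `threshold` parameter; the function names `confidence_mask` and `merge_predictions` are illustrative only and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_mask(logit_map, threshold):
    """Return (labels, mask) for an H x W grid of per-pixel class logits.

    mask[i][j] is True where the pixel is NOT yet confident and would
    still need computation in later stages (assumed criterion: max
    softmax probability below `threshold`).
    """
    labels, mask = [], []
    for row in logit_map:
        label_row, mask_row = [], []
        for logits in row:
            probs = softmax(logits)
            conf = max(probs)
            label_row.append(probs.index(conf))
            mask_row.append(conf < threshold)
        labels.append(label_row)
        mask.append(mask_row)
    return labels, mask


def merge_predictions(early_labels, later_labels, mask):
    """Keep early labels at confident pixels; take later labels elsewhere."""
    return [
        [later if m else early for early, later, m in zip(e_row, l_row, m_row)]
        for e_row, l_row, m_row in zip(early_labels, later_labels, mask)
    ]


# Toy 2 x 2 image with 3 classes at an early exit.
early_logits = [
    [[5.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[0.0, 0.2, 0.0], [0.0, 0.0, 6.0]],
]
early_labels, mask = confidence_mask(early_logits, threshold=0.9)

# Hypothetical labels from a later exit (only used where mask is True).
later_labels = [[1, 1], [2, 0]]
final_labels = merge_predictions(early_labels, later_labels, mask)

# Fraction of pixels whose further computation could be skipped.
skipped = sum(not m for row in mask for m in row) / 4
```

In this toy case two of the four pixels exceed the assumed threshold at the early exit, so their labels are carried forward unchanged and only the remaining two take the later exit's prediction. A real model would skip the corresponding feature computation rather than just the label merge.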

updated: Thu Apr 01 2021 20:01:57 GMT+0000 (UTC)

published: Thu Apr 01 2021 20:01:57 GMT+0000 (UTC)

arXiv
