Towards Efficient Scene Understanding via Squeeze Reasoning

Xiangtai Li; Xia Li; Ansheng You; Li Zhang; Guangliang Cheng; Kuiyuan Yang; Yunhai Tong; Zhouchen Lin

Graph-based convolutional models such as the non-local block have been shown to be effective for strengthening the context modeling ability of convolutional neural networks (CNNs). However, their pixel-wise computational overhead is prohibitive, which makes them unsuitable for high-resolution imagery. In this paper, we explore the efficiency of context graph reasoning and propose a novel framework called Squeeze Reasoning. Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector and then perform reasoning within that single vector, which significantly reduces the computation cost. Specifically, we build a node graph in the vector, where each node represents an abstract semantic concept. The refined features within the same semantic category become consistent, which benefits downstream tasks. We show that our approach can be modularized as an end-to-end trained block and easily plugged into existing networks. Despite its simplicity and light weight, the proposed strategy achieves considerable results on different semantic segmentation datasets and shows significant improvements over strong baselines on various other scene understanding tasks, including object detection, instance segmentation and panoptic segmentation. Code is available at https://github.com/lxtGH/SFSegNets.
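The squeeze-then-reason idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`squeeze`, `reason`, `squeeze_reasoning`, `pairwise_entries`), the softmax-weighted pooling, the dot-product affinity between nodes, and the residual broadcast are all assumptions chosen for illustration, and the learned projections that a trained block would use are omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def squeeze(feature, scores):
    """Squeeze a C x HW feature into a length-C vector.

    `feature` is a list of C channels, each a list of HW values.
    `scores` is a list of HW logits; in a trained block these would be
    predicted from the input (assumption), here they are passed in.
    """
    weights = softmax(scores)
    return [sum(w * v for w, v in zip(weights, channel)) for channel in feature]

def reason(vector, num_nodes):
    """Split the vector into `num_nodes` nodes and mix them by affinity.

    Assumes len(vector) is divisible by num_nodes. The dot-product
    affinity is an illustrative choice, not taken from the paper.
    """
    d = len(vector) // num_nodes
    nodes = [vector[i * d:(i + 1) * d] for i in range(num_nodes)]
    refined = []
    for a in nodes:
        affinity = softmax([sum(x * y for x, y in zip(a, b)) for b in nodes])
        refined.extend(
            sum(w * b[k] for w, b in zip(affinity, nodes)) for k in range(d)
        )
    return refined

def squeeze_reasoning(feature, scores, num_nodes):
    """Squeeze, reason in the vector, then add the result back per channel."""
    refined = reason(squeeze(feature, scores), num_nodes)
    return [[v + r for v in channel] for channel, r in zip(feature, refined)]

def pairwise_entries(n):
    """Number of entries in a dense n x n affinity matrix."""
    return n * n
```

To see why reasoning in the vector is cheaper, compare affinity sizes: a pixel-wise graph on a 128 x 128 feature map needs `pairwise_entries(128 * 128)` = 268,435,456 entries, whereas a graph over a hypothetical 16 nodes needs `pairwise_entries(16)` = 256, independent of image resolution.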

updated: Tue Jul 20 2021 06:29:08 GMT+0000 (UTC)

published: Fri Nov 06 2020 12:17:01 GMT+0000 (UTC)

arXiv
