CARAFE++: Unified Content-Aware ReAssembly of FEatures

Jiaqi Wang; Kai Chen; Rui Xu; Ziwei Liu; Chen Change Loy; Dahua Lin


Feature reassembly, i.e., feature downsampling and upsampling, is a key operation in many modern convolutional network architectures, e.g., residual networks and feature pyramids. Its design is critical for dense prediction tasks such as object detection and semantic/instance segmentation. In this work, we propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight, and highly effective operator for this purpose. CARAFE++ has several appealing properties: (1) Unlike conventional methods such as pooling and interpolation, which exploit only a sub-pixel neighborhood, CARAFE++ aggregates contextual information within a large receptive field. (2) Instead of using a fixed kernel for all samples (e.g., convolution and deconvolution), CARAFE++ generates adaptive kernels on the fly to enable instance-specific, content-aware handling. (3) CARAFE++ introduces little computational overhead and can be readily integrated into modern network architectures. We conduct comprehensive evaluations on standard benchmarks in object detection, instance/semantic segmentation, and image inpainting. CARAFE++ shows consistent and substantial gains across all these tasks (2.5% APbox, 2.1% APmask, 1.94% mIoU, and 1.35 dB, respectively) with negligible computational overhead, and shows great potential to serve as a strong building block for modern deep networks.
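To make properties (1) and (2) concrete, the following is a minimal pure-Python sketch of content-aware upsampling on a single-channel feature map. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe how kernels are generated, so `toy_kernel` is a hypothetical stand-in that scores each neighbor by its similarity to the center value and normalizes the scores with a softmax. The names `toy_kernel`, `reassemble_upsample`, and `temperature`, as well as the border clamping, are likewise assumptions made for this example.

```python
import math


def softmax(scores):
    """Normalize scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_kernel(neighbors, center, temperature=1.0):
    """Hypothetical kernel generator (an assumption, not the paper's module).

    Neighbors whose value is close to the center value get larger weights,
    so the kernel differs from location to location depending on content.
    """
    return softmax([-((v - center) ** 2) / temperature for v in neighbors])


def reassemble_upsample(feature, scale=2, k=3, temperature=1.0):
    """Upsample a 2D list `feature` by `scale` using per-location kernels.

    Each output value is a weighted sum over a k x k neighborhood of the
    corresponding source location, with weights produced on the fly.
    """
    h, w = len(feature), len(feature[0])
    r = k // 2
    offsets = [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)]
    out = [[0.0] * (w * scale) for _ in range(h * scale)]
    for oi in range(h * scale):
        for oj in range(w * scale):
            ci, cj = oi // scale, oj // scale  # source location
            # Gather the k x k neighborhood; indices are clamped at the
            # border (an assumption made for this sketch).
            neighbors = [
                feature[min(max(ci + di, 0), h - 1)][min(max(cj + dj, 0), w - 1)]
                for di, dj in offsets
            ]
            weights = toy_kernel(neighbors, feature[ci][cj], temperature)
            out[oi][oj] = sum(wt * v for wt, v in zip(weights, neighbors))
    return out


# Small demo: a 4 x 4 map with a vertical edge between columns 1 and 2.
edge_map = [[0.0, 0.0, 10.0, 10.0] for _ in range(4)]
upsampled = reassemble_upsample(edge_map, scale=2, k=3)
```

Because the weights at each location depend on the local feature values, the same function applies a different kernel on either side of the edge, in contrast to a fixed interpolation or deconvolution kernel. In this toy version every output position inside one source cell receives the same kernel; a learned kernel generator would not be restricted in that way.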

updated: Mon Dec 07 2020 07:34:57 GMT+0000 (UTC)

published: Mon Dec 07 2020 07:34:57 GMT+0000 (UTC)

arXiv
