OCNet: Object Context Network for Scene Parsing

Yuhui Yuan; Lang Huang; Jianyuan Guo; Chao Zhang; Xilin Chen; Jingdong Wang

In this paper, we address the semantic segmentation task with a new context aggregation scheme named object context, which focuses on enhancing the role of object information. Motivated by the fact that the category of each pixel is inherited from the object it belongs to, we define the object context of each pixel as the set of pixels in the image that belong to the same category as that pixel. We use a binary relation matrix to represent the relationships between all pixels, where an entry is one if the two selected pixels belong to the same category and zero otherwise. We propose to use a dense relation matrix as a surrogate for the binary relation matrix. The dense relation matrix can emphasize the contribution of object information because its relation scores tend to be larger on the object pixels than on the other pixels. Since estimating the dense relation matrix requires computation and memory that are quadratic in the input size, we propose an efficient interlaced sparse self-attention scheme that models the dense relations between any two pixels through the combination of two sparse relation matrices. To capture richer context information, we further combine our interlaced sparse self-attention scheme with conventional multi-scale context schemes, including pyramid pooling (zhao2017pyramid) and atrous spatial pyramid pooling (chen2018deeplab). We empirically show the advantages of our approach with competitive performance on five challenging benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
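The three relation matrices described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes pixels are flattened into a 1D sequence of N positions, uses plain scaled dot-product softmax attention with no learned projections, and factors the dense attention into one long-range and one short-range step. The function names and the group count `p` are hypothetical and chosen only for illustration.

```python
import numpy as np

def binary_relation(labels):
    """(N,) category ids -> (N, N) matrix, 1 where two pixels share a category."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(np.float64)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_relation(feats):
    """(N, C) features -> (N, N) dense relation matrix; each row sums to 1.

    Assumption: similarity is a scaled dot product of the raw features.
    """
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    return softmax(scores, axis=-1)

def _attend(groups):
    """Self-attention inside each group; groups has shape (G, M, C)."""
    scores = groups @ groups.transpose(0, 2, 1) / np.sqrt(groups.shape[-1])
    return softmax(scores, axis=-1) @ groups

def interlaced_sparse_attention(feats, p):
    """Two sparse attention steps over N = p * q positions (illustrative only).

    Position i is written as i = a * q + b with 0 <= a < p and 0 <= b < q.
    """
    n, c = feats.shape
    assert n % p == 0, "N must be divisible by the hypothetical group count p"
    q = n // p
    grid = feats.reshape(p, q, c)
    # Long-range step: q groups, each holding the p positions spaced q apart.
    long_out = _attend(grid.transpose(1, 0, 2)).transpose(1, 0, 2)
    # Short-range step: p groups, each holding q contiguous positions.
    short_out = _attend(long_out)
    return short_out.reshape(n, c)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((12, 4))
    print(binary_relation([0, 0, 1]))
    print(dense_relation(x).shape)
    print(interlaced_sparse_attention(x, p=3).shape)
```

In this sketch the dense matrix holds N^2 scores, while the two sparse steps together hold q·p^2 + p·q^2 = N(p + q) scores. Every output position still depends on every input position, because the short-range step mixes positions that have already exchanged information in the long-range step.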

updated: Mon Mar 15 2021 03:27:54 GMT+0000 (UTC)

published: Tue Sep 04 2018 12:22:10 GMT+0000 (UTC)

arXiv
