Salient Object Ranking with Position-Preserved Attention

Hao Fang; Daoxin Zhang; Yi Zhang; Minghao Chen; Jiawei Li; Yao Hu; Deng Cai; Xiaofei He

Instance segmentation can detect where the objects in an image are, but it struggles to capture the relationships between them. We focus on one typical relationship, relative saliency. A closely related task, salient object detection, predicts a binary map that highlights a visually salient region, but it has difficulty distinguishing multiple objects. Directly combining the two tasks by post-processing also leads to poor performance. Research on relative saliency is currently lacking, which limits practical applications such as content-aware image cropping, video summarization, and image labeling. In this paper, we study the Salient Object Ranking (SOR) task, which assigns a rank to each detected object according to its visual saliency. We propose the first end-to-end framework for the SOR task and solve it in a multi-task learning fashion. The framework handles instance segmentation and salient object ranking simultaneously. In this framework, the SOR branch is independent and can cooperate with different detection methods, so it is easy to use as a plugin. We also introduce a Position-Preserved Attention (PPA) module tailored for the SOR branch. It consists of a position embedding stage and a feature interaction stage. Considering the importance of position in saliency comparison, we preserve the absolute coordinates of objects in the ROI pooling operation and then fuse the positional information with semantic features in the first stage. In the feature interaction stage, we apply the attention mechanism to obtain contextualized representations of the proposals and predict their relative ranking orders. Extensive experiments have been conducted on the ASR dataset. Without bells and whistles, our proposed method significantly outperforms the previous state-of-the-art method. The code will be made publicly available.
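The two PPA stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each proposal is already reduced to a short feature vector, appends normalized box coordinates as a stand-in for position embedding, and uses single-head self-attention with identity projections in place of learned layers. The function names (`embed_position`, `self_attention`, `rank_proposals`) and the `score_weights` vector are hypothetical and chosen only for illustration.

```python
import math


def embed_position(feature, box, image_size):
    """Stage 1 (assumed form): append normalized absolute box
    coordinates (x1, y1, x2, y2) to a proposal's semantic feature."""
    width, height = image_size
    x1, y1, x2, y2 = box
    return list(feature) + [x1 / width, y1 / height, x2 / width, y2 / height]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Stage 2 (assumed form): single-head scaled dot-product
    self-attention with identity query/key/value projections."""
    dim = len(tokens[0])
    contextualized = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        contextualized.append(
            [sum(w * value[i] for w, value in zip(weights, tokens))
             for i in range(dim)]
        )
    return contextualized


def rank_proposals(features, boxes, image_size, score_weights):
    """Return a rank per proposal (1 = most salient) from a linear
    score over the contextualized, position-aware representations."""
    tokens = [embed_position(f, b, image_size) for f, b in zip(features, boxes)]
    context = self_attention(tokens)
    scores = [sum(w * c for w, c in zip(score_weights, ctx)) for ctx in context]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


if __name__ == "__main__":
    # Toy inputs: three proposals with 2-D features in a 100x100 image.
    features = [[0.9, 0.1], [0.2, 0.4], [0.5, 0.5]]
    boxes = [(10, 10, 50, 50), (60, 20, 100, 80), (0, 0, 30, 30)]
    weights = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0]  # hypothetical scoring vector
    print(rank_proposals(features, boxes, (100, 100), weights))
```

In a trained model the projections and the scoring vector would be learned, and the position information would come from the ROI pooling step rather than from four appended scalars; the sketch only shows how position-aware features and attention across proposals can combine to yield a relative ordering.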

updated: Thu Jun 10 2021 02:23:59 GMT+0000 (UTC)

published: Wed Jun 09 2021 13:00:05 GMT+0000 (UTC)

arXiv
