Weakly Supervised Localization Using Background Images

Ziyi Kou; Wentian Zhao; Guofeng Cui; Shaojie Wang

Weakly Supervised Object Localization (WSOL) methods usually rely on fully convolutional networks to obtain class activation maps (CAMs) of targeted labels. However, these networks always highlight the most discriminative parts to perform the task, so the located areas are much smaller than the entire targeted objects. In this work, we propose a novel end-to-end model to enlarge CAMs generated from classification models, which can localize targeted objects more precisely. In detail, we add a module to traditional classification networks to extract foreground object proposals from images without classifying them into specific categories. Then we set these normalized regions as unrestricted pixel-level mask supervision for the subsequent classification task. We collect a set of images, defined as the Background Image Set, from the Internet. It is much smaller than the targeted dataset but supports the method surprisingly well in extracting foreground regions from different pictures. The extracted region is independent of the classification task, and in each image it covers almost the entire object rather than just a significant part. Therefore, these regions can serve as masks to supervise the response map generated from classification models so that it becomes larger and more precise. The method achieves state-of-the-art results on CUB-200-2011 in terms of Top-1 and Top-5 localization error and competitive results on ILSVRC2016 compared with other approaches.
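
To make the terms in the abstract concrete, the following is a minimal NumPy sketch of how a class activation map is commonly computed (a weighted sum of the last convolutional feature maps, using the classifier weights of the target class) and how a foreground mask could supervise it. The function names, the min-max normalization, and the mean-squared-error loss are illustrative assumptions, not the formulation used in the paper, which the abstract does not specify.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weighted sum of feature maps for one class.

    features:   array of shape (K, H, W), last conv-layer feature maps
    fc_weights: array of shape (C, K), classifier weights applied after
                global average pooling
    class_idx:  index of the target class
    """
    # Contract the channel axis: sum_k w[c, k] * f[k, :, :] -> (H, W)
    return np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))


def min_max_normalize(x, eps=1e-8):
    """Rescale a map to roughly [0, 1]; eps avoids division by zero."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo + eps)


def mask_supervision_loss(cam, foreground_mask):
    """Hypothetical pixel-level loss (assumption, not from the paper):
    mean squared error between the normalized CAM and a foreground
    mask with values in [0, 1]."""
    return float(np.mean((min_max_normalize(cam) - foreground_mask) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.random((4, 7, 7))      # K=4 channels, 7x7 spatial grid
    fc_weights = rng.random((3, 4))       # C=3 classes
    cam = class_activation_map(features, fc_weights, class_idx=1)

    # Toy foreground mask covering the central region of the image.
    mask = np.zeros((7, 7))
    mask[2:5, 2:5] = 1.0
    print(cam.shape, mask_supervision_loss(cam, mask))
```

Minimizing a loss of this kind pushes the response map toward the full foreground region instead of only the most discriminative part, which is the effect the abstract describes.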

updated: Wed Sep 11 2019 00:33:11 GMT+0000 (UTC)

published: Mon Sep 09 2019 03:34:34 GMT+0000 (UTC)

arXiv
