Spatial Consistency Loss for Training Multi-Label Classifiers from Single-Label Annotations

Thomas Verelst; Paul K. Rubenstein; Marcin Eichner; Tinne Tuytelaars; Maxim Berman

As natural images usually contain multiple objects, multi-label image classification is more applicable "in the wild" than single-label classification. However, exhaustively annotating images with every object of interest is costly and time-consuming. We aim to train multi-label classifiers from single-label annotations only. We show that adding a consistency loss, which ensures that the predictions of the network are consistent over consecutive training epochs, is a simple yet effective method to train multi-label classifiers in a weakly supervised setting. We further extend this approach spatially by ensuring consistency of the spatial feature maps produced over consecutive training epochs, maintaining per-class running-average heatmaps for each training image. We show that this spatial consistency loss further improves the multi-label mAP of the classifiers. In addition, we show that this method overcomes shortcomings of the "crop" data augmentation by recovering the correct supervision signal even when most of the single ground-truth object is cropped out of the input image by the augmentation. We demonstrate gains of the consistency and spatial consistency losses over the binary cross-entropy baseline, and over competing methods, on MS-COCO and Pascal VOC. We also demonstrate improved multi-label classification mAP on ImageNet-1K using the ReaL multi-label validation set.
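The consistency idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `RunningTargets` and `single_label_consistency_loss`, the exponential-moving-average update, the `momentum` and `weight` parameters, and the use of binary cross-entropy against the running average are all assumptions made for illustration. The sketch works on per-class probabilities for one image; a spatial variant would presumably store an H x W heatmap per class instead of a single number.

```python
import math


def bce(p, t, eps=1e-7):
    """Binary cross-entropy between a probability p and a (soft) target t."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


class RunningTargets:
    """Hypothetical store of per-image, per-class running-average predictions."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum  # assumed smoothing factor, not from the paper
        self.store = {}

    def get(self, image_id):
        """Return the running average from earlier epochs, or None if unseen."""
        return self.store.get(image_id)

    def update(self, image_id, probs):
        """Blend the current predictions into the running average."""
        old = self.store.get(image_id)
        if old is None:
            new = list(probs)
        else:
            m = self.momentum
            new = [m * o + (1.0 - m) * p for o, p in zip(old, probs)]
        self.store[image_id] = new
        return new


def single_label_consistency_loss(probs, label, running, weight=1.0):
    """Supervised BCE on the one annotated class, plus an assumed consistency
    term pulling the remaining classes toward their running averages."""
    supervised = bce(probs[label], 1.0)
    if running is None:
        return supervised  # first epoch: no history to be consistent with
    others = [
        bce(p, r)
        for c, (p, r) in enumerate(zip(probs, running))
        if c != label
    ]
    consistency = sum(others) / len(others) if others else 0.0
    return supervised + weight * consistency
```

In a training loop under these assumptions, one would call `get` for the image, compute the loss against that history, and then call `update` with the current predictions so that the next epoch sees the refreshed average.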

updated: Fri Mar 11 2022 17:54:20 GMT+0000 (UTC)

published: Fri Mar 11 2022 17:54:20 GMT+0000 (UTC)

arXiv
