Sparse-Shot Learning for Extremely Many Localisations

Andreas Panteli; Jonas Teuwen; Hugo Horlings; Efstratios Gavves

Object localisation is typically considered in the context of regular images, for instance images depicting objects like people or cars. In these images there is typically a relatively small number of instances per image per class, which is usually manageable to annotate. However, outside the realm of regular images we are often confronted with a different situation. In computational pathology, digitised tissue sections are extremely large images whose dimensions quickly exceed 250'000x250'000 pixels, and relevant objects, such as tumour cells or lymphocytes, can quickly number in the millions. Annotating them all is practically impossible, and sparsely annotating a few out of many is the only possibility. Unfortunately, learning from sparse annotations, or sparse-shot learning, clashes with standard supervised learning because what is not annotated is treated as a negative. Assigning negative labels to what are true positives leads to confusion in the gradients and biased learning. To address this, we present exclusive cross entropy, which slows down the biased learning by examining the second-order loss derivatives in order to drop the loss terms that are likely biased. Experiments on nine datasets and two different localisation tasks, detection with YOLLO and segmentation with Unet, show that we obtain considerable improvements compared to cross entropy or focal loss, while often reaching the best possible performance for the model with only 10-40% of annotations.
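The abstract does not give the exact criterion used by exclusive cross entropy, so the following is only a minimal sketch of the general idea: dropping loss terms for unannotated locations based on a second-order loss derivative. It assumes a per-location binary cross entropy on predicted probabilities and uses a hypothetical rule: an unannotated term (label 0) is dropped when the second derivative of -log(1 - p) with respect to p, which equals 1 / (1 - p)^2, exceeds a threshold. The function name `exclusive_bce_sketch` and the parameter `curvature_limit` are illustrative assumptions, not names from the paper.

```python
import numpy as np

def exclusive_bce_sketch(probs, labels, curvature_limit=4.0):
    """Binary cross entropy that ignores suspicious negative terms.

    Hypothetical rule (an assumption, not the paper's definition):
    a term with label 0 is dropped when the second derivative of
    -log(1 - p) with respect to p, i.e. 1 / (1 - p)^2, is larger
    than `curvature_limit`.
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-7, 1 - 1e-7)
    labels = np.asarray(labels, dtype=float)

    # Standard per-location binary cross entropy.
    per_term = -(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

    # Second derivative of the negative-label loss with respect to p.
    neg_curvature = 1.0 / (1.0 - probs) ** 2

    # Unannotated locations that the model confidently predicts as positive.
    suspicious = (labels == 0) & (neg_curvature > curvature_limit)
    keep = ~suspicious

    # Average over the retained terms only.
    loss = per_term[keep].sum() / max(int(keep.sum()), 1)
    return loss, keep

# One annotated positive and three unannotated locations, two of which
# the model scores as likely positives.
probs = np.array([0.9, 0.8, 0.1, 0.95])
labels = np.array([1, 0, 0, 0])
loss, keep = exclusive_bce_sketch(probs, labels)
```

With `curvature_limit=4.0` the rule amounts to dropping unannotated terms whose predicted probability is above 0.5. In this toy input the second and fourth locations are excluded, so they no longer push the model towards predicting negatives where an object may simply have been left unannotated.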

updated: Wed Apr 21 2021 09:09:54 GMT+0000 (UTC)

published: Wed Apr 21 2021 09:09:54 GMT+0000 (UTC)

arXiv
