Data-Assemble: Leveraging Multiple Datasets with Partial Labels

Mintong Kang; Yongyi Lu; Alan L. Yuille; Zongwei Zhou

The success of deep learning relies heavily on large and diverse datasets with extensive labels, but we often only have access to several small datasets associated with partial labels. In this paper, we start a new initiative, "Data-Assemble", that aims to unleash the full potential of partially labeled data from an assembly of public datasets. Specifically, we introduce a new dynamic adapter to encode different visual tasks, which addresses the challenges of incomparable, heterogeneous, or even conflicting labeling protocols. We also employ pseudo-labeling and consistency constraints to harness data with missing labels and to mitigate the domain gap across datasets. From rigorous evaluations on three natural imaging and six medical imaging tasks, we discover that learning from "negative examples" facilitates both classification and segmentation of classes of interest. This sheds new light on the computer-aided diagnosis of rare diseases and emerging pandemics, wherein "positive examples" are hard to collect, yet "negative examples" are relatively easy to assemble. Besides exceeding prior art on the ChestXray benchmark, our model is particularly strong in identifying diseases of minority classes, yielding an improvement of over 3 points on average. Remarkably, when using existing partial labels, our model performance is on par with that obtained using full labels, eliminating the need for an additional 40% of annotation costs. Code will be made available at https://github.com/MrGiovanni/DataAssemble.
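To make the idea of training on partial labels concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `masked_bce` and `fill_pseudo_labels`, the use of `None` to mark a class that a dataset does not annotate, and the 0.9 confidence threshold are all hypothetical. It shows only a masked loss and a simple pseudo-labeling rule; the dynamic adapter and the consistency constraints mentioned in the abstract are not modeled.

```python
import math


def masked_bce(probs, labels):
    """Mean binary cross-entropy over the classes that are annotated.

    probs:  predicted probabilities, one per class, each strictly in (0, 1).
    labels: 1 (positive), 0 (negative), or None (class not annotated in
            the dataset this image came from; an assumed encoding).
    """
    terms = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y in zip(probs, labels)
        if y is not None  # missing labels contribute nothing to the loss
    ]
    return sum(terms) / len(terms) if terms else 0.0


def fill_pseudo_labels(probs, labels, threshold=0.9):
    """Fill missing labels with confident predictions; keep the rest.

    threshold is a hypothetical cutoff, not a value taken from the paper.
    """
    filled = []
    for p, y in zip(probs, labels):
        if y is not None:
            filled.append(y)     # keep the human annotation
        elif p >= threshold:
            filled.append(1)     # confident positive
        elif p <= 1 - threshold:
            filled.append(0)     # confident negative
        else:
            filled.append(None)  # still unknown, stays masked
    return filled


# One image, four classes; the source dataset annotates only classes 0 and 3.
probs = [0.95, 0.40, 0.03, 0.80]
labels = [1, None, None, 0]

loss = masked_bce(probs, labels)            # averages over classes 0 and 3 only
pseudo = fill_pseudo_labels(probs, labels)  # class 2 becomes a confident negative
```

In this toy case the loss ignores the two unannotated classes, and pseudo-labeling turns one of them into a usable "negative example" while leaving the uncertain one masked.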

updated: Thu Dec 09 2021 00:36:30 GMT+0000 (UTC)

published: Sat Sep 25 2021 02:48:17 GMT+0000 (UTC)

arXiv
