Supervised Knowledge May Hurt Novel Class Discovery Performance

Ziyun Li; Jona Otholt; Ben Dai; Di Hu; Christoph Meinel; Haojin Yang

Novel class discovery (NCD) aims to infer novel categories in an unlabeled dataset by leveraging prior knowledge from a labeled set comprising disjoint but related classes. Given that most existing literature focuses on exploiting supervised knowledge from the labeled set at the methodology level, this paper asks: is supervised knowledge always helpful at different levels of semantic relevance? To answer this, we first establish a novel metric, called transfer flow, to measure the semantic similarity between labeled and unlabeled datasets. To show the validity of the proposed metric, we build a large-scale benchmark on ImageNet with various degrees of semantic similarity between labeled and unlabeled datasets by leveraging ImageNet's hierarchical class structure. Results on this benchmark show that the proposed transfer flow is in line with the hierarchical class structure, and that NCD performance is consistent with semantic similarity as measured by the proposed metric. Next, using transfer flow, we conduct various empirical experiments at different levels of semantic similarity and find that supervised knowledge may hurt NCD performance. Specifically, using supervised information from a low-similarity labeled set may lead to a worse result than using purely self-supervised knowledge. These results reveal an inadequacy in the existing NCD literature, which usually assumes that supervised knowledge is beneficial. Finally, we develop a pseudo-version of transfer flow as a practical reference for deciding whether supervised knowledge should be used in NCD. Its effectiveness is supported by our empirical studies, which show that the pseudo transfer flow (with or without supervised knowledge) is consistent with the corresponding accuracy across various datasets. Code is released at https://github.com/J-L-O/SK-Hurt-NCD
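The idea of controlling labeled/unlabeled semantic similarity through a class hierarchy can be illustrated with a small sketch. This is not the authors' code (that is in the linked repository): the toy `HIERARCHY` and the helpers `make_split` and `superclass_overlap` are hypothetical, and the overlap score is a crude hierarchy-based proxy, not the transfer flow metric defined in the paper.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical toy hierarchy (superclass -> leaf classes). A real ImageNet-based
# benchmark would derive this mapping from ImageNet's hierarchical class structure.
HIERARCHY: Dict[str, List[str]] = {
    "dog": ["beagle", "poodle", "husky", "pug"],
    "bird": ["robin", "jay", "magpie", "owl"],
    "vehicle": ["sedan", "pickup", "bus", "tram"],
    "fruit": ["apple", "banana", "orange", "lemon"],
}


def make_split(
    hierarchy: Dict[str, List[str]],
    labeled_supers: Set[str],
    unlabeled_supers: Set[str],
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return disjoint (labeled, unlabeled) leaf-class lists.

    A superclass assigned to both sides has its leaves split in half, so the
    two sets never share a leaf class (as NCD requires) but can share parents.
    """
    rng = random.Random(seed)
    labeled: List[str] = []
    unlabeled: List[str] = []
    for sup, leaves in hierarchy.items():
        shuffled = list(leaves)
        rng.shuffle(shuffled)
        in_l, in_u = sup in labeled_supers, sup in unlabeled_supers
        if in_l and in_u:
            half = len(shuffled) // 2
            labeled += shuffled[:half]
            unlabeled += shuffled[half:]
        elif in_l:
            labeled += shuffled
        elif in_u:
            unlabeled += shuffled
    return sorted(labeled), sorted(unlabeled)


def superclass_overlap(
    hierarchy: Dict[str, List[str]], labeled: List[str], unlabeled: List[str]
) -> float:
    """Fraction of unlabeled classes whose superclass also occurs in the labeled set.

    A crude hierarchy-based proxy for semantic similarity; NOT the paper's transfer flow.
    """
    parent = {leaf: sup for sup, leaves in hierarchy.items() for leaf in leaves}
    labeled_supers = {parent[c] for c in labeled}
    return sum(parent[c] in labeled_supers for c in unlabeled) / len(unlabeled)


# High, medium, and low similarity settings built from the same hierarchy.
for name, l_sup, u_sup in [
    ("high", {"dog", "bird"}, {"dog", "bird"}),
    ("medium", {"dog", "bird"}, {"bird", "vehicle"}),
    ("low", {"dog", "bird"}, {"vehicle", "fruit"}),
]:
    lab, unl = make_split(HIERARCHY, l_sup, u_sup)
    print(name, round(superclass_overlap(HIERARCHY, lab, unl), 2))
```

Running it prints overlap scores of 1.0, 0.33, and 0.0 for the high, medium, and low settings. Splitting shared superclasses at the leaf level keeps the labeled and unlabeled classes disjoint, as NCD assumes, while the choice of shared versus separate superclasses controls how related the two sets are.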

updated: Tue Jun 06 2023 13:04:05 GMT+0000 (UTC)

published: Tue Jun 06 2023 13:04:05 GMT+0000 (UTC)

arXiv
