Towards Recognizing New Semantic Concepts in New Visual Domains

Massimiliano Mancini

新しい視覚領域における新しい意味概念の認識に向けて

深層学習モデルは、トレーニングのために大規模な注釈付きデータセットに大きく依存しています。残念ながら、データセットは実世界の無限の変動性をキャプチャできないため、ニューラルネットワークはトレーニングセットに含まれる制限された視覚的および意味的情報によって本質的に制限されます。この論文では、これまでに見られなかった視覚領域で動作し、新しい意味概念を認識することができる深いアーキテクチャを設計することが重要であると主張します。論文の最初の部分では、ラベル付きのソースドメインからラベル付きのデータが利用できないドメイン（ターゲット）に知識を転送することにより、ディープモデルを新しいビジュアルドメインに一般化できるようにするさまざまなソリューションについて説明します。ソースとターゲットが複数の潜在ドメインの混合である場合のドメイン適応から、ドメインの一般化、連続ドメイン適応、予測ドメイン適応まで、バッチ正規化（BN）のバリアントをさまざまなシナリオに適用する方法を示します。ターゲットドメインは、メタデータの形式でのみ使用できます。論文の第2部では、元のトレーニングセットにアクセスせずに、事前にトレーニングされたディープモデルの知識を新しいセマンティックコンセプトに拡張する方法を示します。変換されたタスク固有のバイナリマスク、オープンワールド認識、エンドツーエンドのトレーニングと強制クラスタリング、およびセマンティックセグメンテーションでの増分クラス学習を使用して、順次マルチタスク学習のシナリオに対処します。ここでは、問題を強調して対処します。バックグラウンドクラスのセマンティックシフトの。最後の部分では、より困難な問題に取り組みます。複数のドメインとセマンティックカテゴリ（属性を含む）の画像が与えられた場合、見えないドメインの見えない概念の画像を認識するモデルを構築するにはどうすればよいですか？また、入力と機能のドメインとセマンティックの混合に基づくアプローチを提案します。これは、この問題を解決するための最初の有望なステップです。

Deep learning models heavily rely on large scale annotated datasets for training. Unfortunately, datasets cannot capture the infinite variability of the real world, thus neural networks are inherently limited by the restricted visual and semantic information contained in their training set. In this thesis, we argue that it is crucial to design deep architectures that can operate in previously unseen visual domains and recognize novel semantic concepts. In the first part of the thesis, we describe different solutions to enable deep models to generalize to new visual domains, by transferring knowledge from a labeled source domain(s) to a domain (target) where no labeled data are available. We will show how variants of batch-normalization (BN) can be applied to different scenarios, from domain adaptation when source and target are mixtures of multiple latent domains, to domain generalization, continuous domain adaptation, and predictive domain adaptation, where information about the target domain is available only in the form of metadata. In the second part of the thesis, we show how to extend the knowledge of a pretrained deep model to new semantic concepts, without access to the original training set. We address the scenarios of sequential multi-task learning, using transformed task-specific binary masks, open-world recognition, with end-to-end training and enforced clustering, and incremental class learning in semantic segmentation, where we highlight and address the problem of the semantic shift of the background class. In the final part, we tackle a more challenging problem: given images of multiple domains and semantic categories (with their attributes), how to build a model that recognizes images of unseen concepts in unseen domains? We also propose an approach based on domain and semantic mixing of inputs and features, which is a first, promising step towards solving this problem.

updated: Wed Dec 16 2020 16:23:40 GMT+0000 (UTC)

published: Wed Dec 16 2020 16:23:40 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト