Learning Visual Models using a Knowledge Graph as a Trainer

Sebastian Monka; Lavdim Halilaj; Stefan Schmid; Achim Rettinger

Traditional computer vision approaches based on neural networks (NNs) are typically trained on large amounts of image data. By minimizing the cross-entropy loss between a prediction and a given class label, the NN and its visual embedding space learn to fulfill a given task. However, because they depend solely on the image data distribution of the training domain, these models tend to fail when applied to a target domain that differs from their source domain. To learn an NN that is more robust to domain shifts, we propose the knowledge graph neural network (KG-NN), a neuro-symbolic approach that supervises training with image-data-invariant auxiliary knowledge. The auxiliary knowledge is first encoded in a knowledge graph with the respective concepts and their relationships, which is then transformed into a dense vector representation via an embedding method. Using a contrastive loss function, KG-NN learns to adapt its visual embedding space, and thus its weights, according to the image-data-invariant knowledge graph embedding space. We evaluate KG-NN on visual transfer learning tasks for classification using the mini-ImageNet dataset and its derivatives, as well as road sign recognition datasets from Germany and China. The results show that a visual model trained with a knowledge graph as a trainer outperforms a model trained with cross-entropy in all experiments, in particular when the domain gap increases. Besides better performance and stronger robustness to domain shifts, KG-NN adapts to multiple datasets and classes without suffering heavily from catastrophic forgetting.
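
The abstract does not give the exact form of the contrastive loss. As a minimal sketch, assuming an InfoNCE-style loss over cosine similarities with a temperature, and assuming each class already has a precomputed knowledge graph embedding of the same dimensionality as the visual embedding, the training signal and a nearest-embedding prediction could look like the following. The loss form, the temperature value, all function names, and the toy road-sign classes are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical type alias: class name -> knowledge graph embedding of that class.
KGEmbeddings = Dict[str, Sequence[float]]


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (L2 norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def kg_contrastive_loss(visual_emb: Sequence[float], label: str,
                        kg_emb: KGEmbeddings, temperature: float = 0.1) -> float:
    """InfoNCE-style loss (an assumption, not the paper's exact formula):
    pull the visual embedding toward the KG embedding of its own class and
    push it away from the KG embeddings of all other classes."""
    logits = {c: cosine(visual_emb, e) / temperature for c, e in kg_emb.items()}
    # Numerically stable log-sum-exp over all class logits.
    m = max(logits.values())
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits.values()))
    return log_sum_exp - logits[label]


def predict(visual_emb: Sequence[float], kg_emb: KGEmbeddings) -> str:
    """Classify by the nearest class embedding in the KG embedding space."""
    return max(kg_emb, key=lambda c: cosine(visual_emb, kg_emb[c]))


if __name__ == "__main__":
    # Toy 3-d KG embeddings for three hypothetical road-sign classes.
    kg = {"stop": [1.0, 0.0, 0.0], "yield": [0.0, 1.0, 0.0],
          "speed_limit": [0.0, 0.0, 1.0]}
    v = [0.9, 0.1, 0.0]  # stand-in for a visual encoder's output
    print(predict(v, kg))  # stop
    print(kg_contrastive_loss(v, "stop", kg) < kg_contrastive_loss(v, "yield", kg))  # True
```

In this sketch the knowledge graph embeddings act as fixed class targets: during training, the visual encoder's weights would be updated to minimize this loss, moving images of a class toward that class's knowledge graph embedding rather than toward a one-hot label as with cross-entropy. The encoder and the gradient computation are left to a deep learning framework and omitted here.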

updated: Mon Jul 12 2021 13:21:17 GMT+0000 (UTC)

published: Wed Feb 17 2021 13:24:41 GMT+0000 (UTC)

arXiv