Learning the semantic structure of objects from Web supervision

David Novotny; Diane Larlus; Andrea Vedaldi

While recent research in image understanding has often focused on recognizing more types of objects, understanding more about the objects is just as important. Recognizing object parts and attributes has been extensively studied before, yet learning a large space of such concepts remains elusive due to the high cost of providing detailed object annotations for supervision. The main contribution of this paper is an algorithm that learns the nameable parts of objects automatically, from images obtained by querying Web search engines. The key challenge is the high level of noise in the annotations. To address it, we propose a new unified embedding space where the appearance and geometry of objects and their semantic parts are represented uniformly. Geometric relationships are induced in a soft manner by a rich set of non-semantic mid-level anchors, bridging the gap between semantic and non-semantic parts. We also show that the resulting embedding provides a visually intuitive mechanism to navigate the learned concepts and their corresponding images.
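The abstract does not spell out how the embedding is built. Purely as a hypothetical illustration of what "inducing geometric relationships in a soft manner through anchors" can mean, this sketch encodes a 2-D image location as normalized Gaussian affinities to a set of anchor points, so that nearby locations receive similar encodings. The function names, the Gaussian kernel, and the `sigma` parameter are assumptions made for exposition, not the authors' formulation.

```python
import math

def soft_anchor_encoding(point, anchors, sigma=1.0):
    """Encode a 2-D location as a soft assignment over anchor points.

    Hypothetical illustration only: the Gaussian kernel and `sigma`
    are assumptions, not the method described in the paper.
    """
    px, py = point
    # Log-affinity to each anchor (negative scaled squared distance)
    logits = [-((px - ax) ** 2 + (py - ay) ** 2) / (2 * sigma ** 2)
              for ax, ay in anchors]
    # Subtract the max before exponentiating to avoid underflow to all zeros
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    # Normalize so the weights sum to 1
    return [w / total for w in weights]

def embedding_similarity(u, v):
    """Cosine similarity between two soft encodings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
near_origin = soft_anchor_encoding((0.5, 0.5), anchors, sigma=2.0)
near_right = soft_anchor_encoding((9.5, 0.5), anchors, sigma=2.0)
```

Because each location is described relative to many anchors rather than by raw coordinates, two part detections are compared through their shared anchor context; `embedding_similarity(near_origin, near_right)` is low because the two points activate different anchors.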

updated: Thu Dec 02 2021 14:59:48 GMT+0000 (UTC)

published: Tue Jul 05 2016 11:56:31 GMT+0000 (UTC)

arXiv
