Large-Scale Attribute-Object Compositions

Filip Radenovic; Animesh Sinha; Albert Gordo; Tamara Berg; Dhruv Mahajan

We study the problem of learning to predict attribute-object compositions from images, and of generalizing to unseen compositions missing from the training data. To the best of our knowledge, this is the first large-scale study of this problem, involving hundreds of thousands of compositions. We train our framework on images from Instagram, using hashtags as noisy weak supervision. We make careful design choices for data collection and modeling in order to handle noisy annotations and unseen compositions. Finally, extensive evaluations show that learning to compose classifiers outperforms late fusion of individual attribute and object predictions, especially for unseen attribute-object pairs.
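To make the contrast in the last sentence concrete, here is a minimal sketch of the two scoring strategies. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy probabilities, and the elementwise-product composition are all hypothetical stand-ins, and the abstract does not say how the composition function is defined or learned.

```python
def late_fusion_scores(attr_probs, obj_probs):
    """Late fusion: score each (attribute, object) pair as the product
    of two independently predicted probabilities."""
    return {
        (attr, obj): p_attr * p_obj
        for attr, p_attr in attr_probs.items()
        for obj, p_obj in obj_probs.items()
    }


def compose_classifier(attr_vec, obj_vec):
    """Hypothetical composition function: builds classifier weights for a
    pair from its attribute and object vectors. An elementwise product is
    used here only as a placeholder for a learned function."""
    return [a * o for a, o in zip(attr_vec, obj_vec)]


def composed_score(image_feat, attr_vec, obj_vec):
    """Score an image feature against the composed pair classifier."""
    weights = compose_classifier(attr_vec, obj_vec)
    return sum(f * w for f, w in zip(image_feat, weights))


# Toy inputs (assumed values, for illustration only).
attr_probs = {"red": 0.8, "striped": 0.2}
obj_probs = {"car": 0.6, "shirt": 0.4}

fused = late_fusion_scores(attr_probs, obj_probs)
best_pair = max(fused, key=fused.get)

pair_score = composed_score([1.0, 2.0], [0.5, 1.0], [2.0, 3.0])
```

In late fusion the attribute and object predictors never interact, so a pair's score cannot reflect how an attribute looks on a particular object. A composed classifier can be built for any attribute-object pair, including pairs absent from training, which is the setting the abstract highlights.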

updated: Mon May 24 2021 16:05:41 GMT+0000 (UTC)

published: Mon May 24 2021 16:05:41 GMT+0000 (UTC)

arXiv
