Learn to Predict Sets Using Feed-Forward Neural Networks

Hamid Rezatofighi; Tianyu Zhu; Roman Kaskman; Farbod T. Motlagh; Qinfeng Shi; Anton Milan; Daniel Cremers; Laura Leal-Taixé; Ian Reid

フィードフォワードニューラルネットワークを使用してセットを予測する方法を学ぶ

このホワイトペーパーでは、ディープフィードフォワードニューラルネットワークを使用した集合予測のタスクについて説明します。セットは、順列の下で不変である要素のコレクションであり、セットのサイズは事前に固定されていません。画像のタグ付けやオブジェクトの検出など、実際の問題の多くには、エンティティのセットとして自然に表現される出力があります。これは、ベクトル、行列、テンソルなどの構造化された出力を自然に処理する従来のディープニューラルネットワークに課題をもたらします。深い神経ネットワークを使用して、未知の順列とカーディナリティを持つセットを予測することを学習するための新しいアプローチを提示します。私たちの定式化では、a）集合のカーディナリー変数と順列変数を定義する2つの離散分布、およびb）固定カーディナリティを持つ集合要素の同時分布によって表される集合分布の尤度を定義します。検討中の問題に応じて、ディープニューラルネットワークを使用したセット予測のためのさまざまなトレーニングモデルを定義します。 1）PASCALVOCおよびMSCOCOデータセットで他の競合する方法を上回っているマルチラベル画像分類、2）定式化が一般的な状態を上回っているオブジェクト検出など、関連する視覚問題に対するセット定式化の有効性を示します。 -最先端の検出器、および3）複雑なCAPTCHAテスト。驚くべきことに、セットベースのネットワークがルールをコーディングせずに算術を模倣する機能を獲得したことがわかります。

This paper addresses the task of set prediction using deep feed-forward neural networks. A set is a collection of elements which is invariant under permutation and the size of a set is not fixed in advance. Many real-world problems, such as image tagging and object detection, have outputs that are naturally expressed as sets of entities. This creates a challenge for traditional deep neural networks which naturally deal with structured outputs such as vectors, matrices or tensors. We present a novel approach for learning to predict sets with unknown permutation and cardinality using deep neural networks. In our formulation we define a likelihood for a set distribution represented by a) two discrete distributions defining the set cardinally and permutation variables, and b) a joint distribution over set elements with a fixed cardinality. Depending on the problem under consideration, we define different training models for set prediction using deep neural networks. We demonstrate the validity of our set formulations on relevant vision problems such as: 1) multi-label image classification where we outperform the other competing methods on the PASCAL VOC and MS COCO datasets, 2) object detection, for which our formulation outperforms popular state-of-the-art detectors, and 3) a complex CAPTCHA test, where we observe that, surprisingly, our set-based network acquired the ability of mimicking arithmetics without any rules being coded.

updated: Mon Oct 25 2021 06:33:27 GMT+0000 (UTC)

published: Thu Jan 30 2020 01:52:07 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト