Discriminative Triad Matching and Reconstruction for Weakly Referring Expression Grounding

Mingjie Sun; Jimin Xiao; Eng Gee Lim; Si Liu; John Y. Goulermas

In this paper, we tackle the weakly-supervised referring expression grounding task, that is, localizing a referent object in an image according to a query sentence, where the mapping between image regions and queries is not available during the training stage. In traditional methods, the object region that best matches the referring expression is picked out, and the query sentence is then reconstructed from the selected region, with the reconstruction difference serving as the loss for back-propagation. The existing methods, however, conduct both the matching and the reconstruction approximately, as they ignore the fact that the matching correctness is unknown. To overcome this limitation, a discriminative triad is designed here as the basis of the solution, through which a query can be converted into one or multiple discriminative triads in a very scalable way. Based on the discriminative triad, we further propose triad-level matching and reconstruction modules that are lightweight yet effective for weakly-supervised training, making the method three times lighter and faster than the previous state-of-the-art methods. One important merit of our work is its superior performance despite the simple and neat design. Specifically, the proposed method achieves a new state-of-the-art accuracy when evaluated on the RefCOCO (39.21%), RefCOCO+ (39.18%) and RefCOCOg (43.24%) datasets, which is 4.17%, 4.08% and 7.8% higher than the previous one, respectively.
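The traditional match-then-reconstruct pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' triad-level method. It only shows the general idea that a matching step weights candidate regions and a reconstruction error supplies the training signal without region labels. The function names, the dot-product matching score, and the squared-error "reconstruction" (a stand-in for a real language decoder) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_and_reconstruct(region_feats, query_vec):
    """Toy weakly-supervised grounding step (hypothetical helper).

    region_feats: list of region feature vectors (lists of floats)
    query_vec:    query embedding with the same dimensionality
    Returns (best_region_index, matching_weights, reconstruction_loss).
    """
    # Matching: score each region against the query (assumed dot product).
    scores = [sum(r * q for r, q in zip(feat, query_vec))
              for feat in region_feats]
    weights = softmax(scores)

    # Soft selection: weighted sum of region features.
    dim = len(query_vec)
    attended = [sum(w * feat[d] for w, feat in zip(weights, region_feats))
                for d in range(dim)]

    # Reconstruction stand-in: squared error between the attended feature
    # and the query embedding. A real system would decode the sentence.
    loss = sum((a - q) ** 2 for a, q in zip(attended, query_vec))

    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights, loss


# Three made-up region features and a made-up query embedding.
regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.0, 1.0]
best, weights, loss = match_and_reconstruct(regions, query)
print(best, [round(w, 3) for w in weights], round(loss, 3))
```

In this sketch no region label is ever used: the only supervision is how well the softly selected region explains the query, which is also why the correctness of the matching itself remains unknown during training.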

updated: Tue Jun 08 2021 02:15:11 GMT+0000 (UTC)

published: Tue Jun 08 2021 02:15:11 GMT+0000 (UTC)

arXiv
