Suspected Object Matters: Rethinking Model's Prediction for One-stage Visual Grounding

Yang Jiao; Zequn Jie; Jingjing Chen; Lin Ma; Yu-Gang Jiang

Recently, one-stage visual grounders have attracted considerable attention because they achieve accuracy comparable to two-stage grounders with significantly higher efficiency. However, inter-object relation modeling has not been well studied for one-stage grounders. Inter-object relation modeling, though important, need not be performed among all objects, since only some of them are related to the text query and may confuse the model. We call these objects suspected objects. Exploring their relationships in the one-stage paradigm is non-trivial for two reasons. First, no object proposals are available as a basis for selecting suspected objects and performing relationship modeling. Second, suspected objects are more confusing than others because they may share similar semantics, be entangled with certain relationships, etc., and are therefore more likely to mislead the model's prediction. To this end, we propose a Suspected Object Transformation mechanism (SOT), which can be seamlessly integrated into existing CNN-based and Transformer-based one-stage visual grounders to encourage selecting the target object from among the suspected ones. Suspected objects are dynamically discovered from a learned activation map that adapts to the model's current discrimination ability during training. On top of the suspected objects, a Keyword-Aware Discrimination module (KAD) and an Exploration by Random Connection strategy (ERC) are then proposed to help the model rethink its initial prediction. On the one hand, KAD leverages the keywords that contribute most to discriminating suspected objects. On the other hand, ERC allows the model to seek the correct object instead of being trapped in always exploiting its current false prediction. Extensive experiments demonstrate the effectiveness of the proposed method.
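The abstract gives no implementation details, so the following is only a rough illustration of two of the ideas it names: reading candidate "suspected" locations off an activation map, and occasionally exploring a candidate other than the current top prediction. It is not the authors' SOT, KAD, or ERC. The function names, the top-k selection rule, and the `explore_prob` parameter are all assumptions made for this sketch.

```python
import random

def suspected_locations(activation, k):
    """Return the (row, col) positions of the k highest activations, best first.

    Assumption: suspected objects are approximated by the top-k cells of a
    2D activation map; the paper may select them differently.
    """
    cells = [
        (score, r, c)
        for r, row in enumerate(activation)
        for c, score in enumerate(row)
    ]
    # Sort by descending score; ties are broken by position for determinism.
    cells.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(r, c) for _, r, c in cells[:k]]

def choose_with_exploration(candidates, explore_prob, rng):
    """Pick the top candidate, or a uniformly random one with probability explore_prob.

    Assumption: a simple stand-in for "exploration" that keeps the model from
    always exploiting its current best guess.
    """
    if rng.random() < explore_prob:
        return rng.choice(candidates)
    return candidates[0]

# Toy 3x3 activation map (hypothetical values).
activation = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.2],
    [0.1, 0.8, 0.7],
]
suspects = suspected_locations(activation, k=3)
greedy_pick = choose_with_exploration(suspects, 0.0, random.Random(0))
explored_pick = choose_with_exploration(suspects, 1.0, random.Random(0))
```

With `explore_prob` set to 0 the sketch always returns the highest-activation cell; raising it makes the choice among the suspected cells increasingly random, which is the trade-off between exploitation and exploration that the abstract alludes to.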

updated: Mon Aug 21 2023 10:31:12 GMT+0000 (UTC)

published: Thu Mar 10 2022 06:41:07 GMT+0000 (UTC)

arXiv
