Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations

Meng-Jiun Chiou; Roger Zimmermann; Jiashi Feng

Visual relationship detection, which has drawn increasing attention over the past few years, aims to reason about relationships among salient objects in images. By analogy with human reasoning, external visual commonsense knowledge is believed to be beneficial for reasoning about the visual relationships of objects in images, yet existing methods rarely take it into account. In this paper, we propose a novel approach named Relational Visual-Linguistic Bidirectional Encoder Representations from Transformers (RVL-BERT), which performs relational reasoning with both visual and language commonsense knowledge learned via self-supervised pre-training with multimodal representations. RVL-BERT also uses an effective spatial module and a novel mask attention module to explicitly capture spatial information among the objects. Moreover, our model decouples object detection from visual relationship recognition by taking in object names directly, enabling it to be used on top of any object detection system. We show through quantitative and qualitative experiments that, with the transferred knowledge and novel modules, RVL-BERT achieves competitive results on two challenging visual relationship detection datasets. The source code is available at https://github.com/coldmanck/RVL-BERT.
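The abstract does not say how the spatial module encodes the relative positions of two objects. Purely as an illustration of the kind of input such a module can consume, the sketch below computes a common hand-crafted descriptor for a (subject, object) box pair: both boxes normalized by image size, plus the object's offset relative to the subject and the log ratios of their sizes. This is an assumption, not RVL-BERT's actual spatial module; the function name `pairwise_spatial_features` and the 12-value layout are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def pairwise_spatial_features(subj: Box, obj: Box, img_w: float, img_h: float) -> List[float]:
    """Hypothetical 12-value spatial descriptor for a (subject, object) pair.

    A generic encoding for illustration only; NOT RVL-BERT's spatial module.
    """
    def norm(b: Box) -> List[float]:
        # Scale box corners into [0, 1] by image width and height.
        x1, y1, x2, y2 = b
        return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]

    sx1, sy1, sx2, sy2 = subj
    ox1, oy1, ox2, oy2 = obj
    sw, sh = sx2 - sx1, sy2 - sy1
    ow, oh = ox2 - ox1, oy2 - oy1
    if min(sw, sh, ow, oh) <= 0:
        raise ValueError("boxes must have positive width and height")

    # Object offset measured in units of subject size, plus log size ratios
    # (a ratio term is 0 when the two boxes share that dimension).
    relative = [
        (ox1 - sx1) / sw,
        (oy1 - sy1) / sh,
        math.log(ow / sw),
        math.log(oh / sh),
    ]
    return norm(subj) + norm(obj) + relative


if __name__ == "__main__":
    # Object sits directly to the right of the subject and is twice as tall.
    print(pairwise_spatial_features((0, 0, 10, 10), (10, 0, 20, 20), 100, 100))
```

Because a descriptor like this needs only box coordinates, it can be computed from the output of any detector, which fits the abstract's point that relationship recognition is decoupled from object detection.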

updated: Mon Apr 05 2021 07:48:10 GMT+0000 (UTC)

published: Thu Sep 10 2020 16:15:09 GMT+0000 (UTC)

arXiv