Transforming Visual Scene Graphs to Image Captions

Xu Yang; Jiawei Peng; Zihua Wang; Haiyang Xu; Qinghao Ye; Chenliang Li; Ming Yan; Fei Huang; Zhangzikang Li; Yu Zhang

We propose to Transform Scene Graphs (TSG) into more descriptive captions. In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) that embeds scene graphs. After embedding, different graph embeddings carry different knowledge for generating words of different parts of speech; e.g., object and attribute embeddings are suited to generating nouns and adjectives, respectively. Motivated by this, we design a Mixture-of-Expert (MOE)-based decoder, in which each expert is built on MHA, to discriminate among the graph embeddings when generating different kinds of words. Since both the encoder and the decoder are built on MHA, we obtain a homogeneous encoder-decoder, unlike previous heterogeneous ones, which usually combine a Fully-Connected-based GNN with an LSTM-based decoder. The homogeneous architecture lets us unify the training configuration of the whole model instead of specifying different training strategies for different sub-networks as in the heterogeneous pipeline, which eases the training difficulty. Extensive experiments on the MS-COCO captioning benchmark validate the effectiveness of TSG. The code is available at https://anonymous.4open.science/r/ACL23_TSG.
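To make the decoder idea in the abstract more concrete, here is a minimal sketch of one decoding step in which each expert attends over one type of graph embedding and a soft gate mixes the expert outputs. It is not the authors' implementation: it uses single-head attention in place of MHA for brevity, the gate logits are supplied by hand rather than learned, and the names `attend`, `moe_step`, `node_groups`, and `gate_logits` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention of one query over a node set."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def moe_step(query, node_groups, gate_logits):
    """Run one expert per node type, then mix the outputs with a softmax gate.

    Assumption: one expert per embedding type (object/attribute/relation);
    in a trained model the gate logits would be predicted from the decoder state.
    """
    names = sorted(node_groups)
    gate = softmax([gate_logits[n] for n in names])
    outputs = [attend(query, node_groups[n], node_groups[n]) for n in names]
    dim = len(outputs[0])
    mixed = [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]
    return mixed, dict(zip(names, gate))


# Toy 2-d embeddings for the three node types of a scene graph (made-up values).
node_groups = {
    "object": [[1.0, 0.0], [0.8, 0.2]],
    "attribute": [[0.0, 1.0]],
    "relation": [[0.5, 0.5]],
}
query = [1.0, 0.0]  # stand-in for the decoder state at the current word
gate_logits = {"object": 2.0, "attribute": 0.0, "relation": 0.0}  # favors a noun

mixed, gate = moe_step(query, node_groups, gate_logits)
```

With the gate favoring the object expert, the mixed context vector is dominated by object embeddings, which mirrors the abstract's claim that object embeddings suit noun generation. Because the encoder and every expert would share the same attention primitive, a single training configuration can cover the whole model.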

updated: Wed May 03 2023 15:18:37 GMT+0000 (UTC)

published: Wed May 03 2023 15:18:37 GMT+0000 (UTC)

arXiv
