Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

Yufeng Huang; Zhuo Chen; Wen Zhang; Jiaoyan Chen; Jeff Z. Pan; Zhen Yao; Yujie Xie; Huajun Chen

シーケンシャルクロスモーダルセマンティックグラフによるアスペクトベースの感情分類

マルチモーダルアスペクトベースのセンチメント分類 (MABSC) は、データ内の言及されたエンティティなどの特定のターゲットのセンチメントをさまざまなモダリティで分類することを目的とした新しい分類タスクです。テキストと画像を含む典型的なマルチモーダルデータでは、以前のアプローチは、特にテキストのセマンティクスと組み合わせて、画像のきめの細かいセマンティクスを十分に活用しておらず、きめの細かい画像間の関係のモデル化を十分に考慮していません。これは、画像の使用が不十分であり、詳細な側面や意見を特定するのに不十分です。これらの制限に取り組むために、シーケンシャルクロスモーダルセマンティックグラフとエンコーダーデコーダーモデルを構築する方法を含む新しいフレームワーク SeqCSG を提案します。具体的には、元の画像、画像キャプション、シーングラフからきめ細かい情報を抽出し、それらをクロスモーダルセマンティックグラフの要素およびテキストからのトークンと見なします。クロスモーダルセマンティックグラフは、要素間の関係を示すマルチモーダル可視マトリックスを含むシーケンスとして表されます。クロスモーダルセマンティックグラフを効果的に利用するために、ターゲットプロンプトテンプレートを使用したエンコーダーデコーダー法を提案します。実験結果は、私たちのアプローチが既存の方法よりも優れており、2 つの標準データセット MABSC で最先端を達成することを示しています。さらなる分析により、各コンポーネントの有効性が実証され、モデルはターゲットと画像の詳細な情報との相関関係を暗黙的に学習できます。

Multi-modal aspect-based sentiment classification (MABSC) is an emerging classification task that aims to classify the sentiment of a given target such as a mentioned entity in data with different modalities. In typical multi-modal data with text and image, previous approaches do not make full use of the fine-grained semantics of the image, especially in conjunction with the semantics of the text and do not fully consider modeling the relationship between fine-grained image information and target, which leads to insufficient use of image and inadequate to identify fine-grained aspects and opinions. To tackle these limitations, we propose a new framework SeqCSG including a method to construct sequential cross-modal semantic graphs and an encoder-decoder model. Specifically, we extract fine-grained information from the original image, image caption, and scene graph, and regard them as elements of the cross-modal semantic graph as well as tokens from texts. The cross-modal semantic graph is represented as a sequence with a multi-modal visible matrix indicating relationships between elements. In order to effectively utilize the cross-modal semantic graph, we propose an encoder-decoder method with a target prompt template. Experimental results show that our approach outperforms existing methods and achieves the state-of-the-art on two standard datasets MABSC. Further analysis demonstrates the effectiveness of each component and our model can implicitly learn the correlation between the target and fine-grained information of the image.

updated: Fri Aug 19 2022 16:04:29 GMT+0000 (UTC)

published: Fri Aug 19 2022 16:04:29 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト