Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings

Yue Wang; Jing Li; Michael R. Lyu; Irwin King

Social media produces large amounts of content every day. To help users quickly find what they need, keyphrase prediction is receiving growing attention. Nevertheless, most prior efforts focus on text modeling, largely ignoring the rich features embedded in the accompanying images. In this work, we explore the joint effects of texts and images in predicting the keyphrases for a multimedia post. To better align social-media-style texts and images, we propose: (1) a novel Multi-Modality Multi-Head Attention (M3H-Att) to capture the intricate cross-media interactions; (2) image wordings, in the form of optical characters and image attributes, to bridge the two modalities. Moreover, we design a unified framework that leverages the outputs of both keyphrase classification and keyphrase generation and couples their advantages. Extensive experiments on a large-scale dataset newly collected from Twitter show that our model significantly outperforms the previous state of the art based on traditional attention networks. Further analyses show that our multi-head attention is able to attend to information from various aspects and boost classification or generation in diverse scenarios.
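
The abstract does not give the equations behind M3H-Att, so the following is only a minimal sketch of generic multi-head cross-attention, in which text tokens attend over image region features. It is not the authors' implementation. The function name `cross_modal_multi_head_attention`, the feature shapes, and the random projection matrices (standing in for learned parameters) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_multi_head_attention(text, image, num_heads, rng):
    """Hypothetical helper: text tokens (queries) attend over image regions.

    text:  (n_tokens, d) array of text features
    image: (n_regions, d) array of image features
    Returns (context, weights) with shapes
    (n_tokens, d) and (num_heads, n_tokens, n_regions).
    """
    n_tokens, d = text.shape
    assert image.shape[1] == d and d % num_heads == 0
    d_head = d // num_heads

    # Random matrices stand in for learned projections (assumption).
    w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def split_heads(x):
        # (n, d) -> (num_heads, n, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(text @ w_q)
    k = split_heads(image @ w_k)
    v = split_heads(image @ w_v)

    # Scaled dot-product scores, computed independently for each head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)

    # Weighted sum of image values, then concatenate the heads.
    context = (weights @ v).transpose(1, 0, 2).reshape(n_tokens, d)
    return context, weights


rng = np.random.default_rng(0)
text_feats = rng.standard_normal((5, 16))   # 5 tokens, 16-dim features
image_feats = rng.standard_normal((7, 16))  # 7 image regions
context, weights = cross_modal_multi_head_attention(text_feats, image_feats, 4, rng)
print(context.shape, weights.shape)
```

Each head produces its own attention distribution over the image regions, which is the general mechanism that lets a multi-head design attend to different aspects of the image for the same text token.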

updated: Tue Nov 03 2020 08:44:18 GMT+0000 (UTC)

published: Tue Nov 03 2020 08:44:18 GMT+0000 (UTC)

arXiv
