Multimodal Deep Learning Framework for Image Popularity Prediction on Social Media

Fatma S. Abousaleh; Wen-Huang Cheng; Neng-Hao Yu; Yu Tsao

Billions of photos are uploaded to the web daily through various types of social networks. Some of these images receive millions of views and become popular, whereas others remain completely unnoticed. This raises the problem of predicting image popularity on social media. The popularity of an image can be affected by several factors, such as visual content, aesthetic quality, user, post metadata, and time. Thus, considering all these factors is essential for accurately predicting image popularity. The efficiency of the predictive model also plays a crucial role. In this study, motivated by multimodal learning, which uses information from various modalities, and the current success of convolutional neural networks (CNNs) in various fields, we propose a deep learning model, called visual-social convolutional neural network (VSCNN), which predicts the popularity of a posted image by incorporating various types of visual and social features into a unified network model. VSCNN first learns to extract high-level representations from the input visual and social features by utilizing two individual CNNs. The outputs of these two networks are then fused into a joint network to estimate the popularity score in the output layer. We assess the performance of the proposed method by conducting extensive experiments on a dataset of approximately 432K images posted on Flickr. The simulation results demonstrate that the proposed VSCNN model significantly outperforms state-of-the-art models, with relative improvements of more than 2.33%, 7.59%, and 14.16% in terms of Spearman's Rho, mean absolute error, and mean squared error, respectively.
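The abstract reports results using three metrics (Spearman's Rho, mean absolute error, and mean squared error) plus a relative improvement over baselines. The following is a minimal sketch of how such metrics can be computed, using only the Python standard library. It is not the authors' evaluation code: all function names are hypothetical, and the `relative_improvement` formula is an assumption, since the abstract does not state how the percentages were derived.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(y_true, y_pred):
    """Spearman's Rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def relative_improvement(baseline, proposed, higher_is_better):
    """Percentage gain of `proposed` over `baseline` (assumed definition)."""
    delta = proposed - baseline if higher_is_better else baseline - proposed
    return 100.0 * delta / abs(baseline)


if __name__ == "__main__":
    # Made-up popularity scores for illustration only, not data from the paper.
    truth = [1.0, 2.0, 3.0, 4.0, 5.0]
    preds = [1.5, 1.8, 3.2, 3.9, 5.6]
    print(spearman_rho(truth, preds), mae(truth, preds), mse(truth, preds))
```

Spearman's Rho rewards getting the ordering of images right, so higher is better, whereas MAE and MSE measure the size of the score errors, so lower is better. The `higher_is_better` flag in the sketch keeps the sign of the improvement consistent across both kinds of metric.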

updated: Tue May 18 2021 19:58:58 GMT+0000 (UTC)

published: Tue May 18 2021 19:58:58 GMT+0000 (UTC)

arXiv
