Understanding of Emotion Perception from Art

Digbalay Bose; Krishna Somandepalli; Souvik Kundu; Rimita Lahiri; Jonathan Gratch; Shrikanth Narayanan

Computational modeling of the emotions that art evokes in humans is challenging because both art and affective signals are subjective and nuanced. In this paper, we study the emotions that artwork evokes in viewers using both text and visual modalities. Specifically, we treat the analysis of images and the accompanying text captions, in which viewers express their emotions, as a multimodal classification task. Our results show that single-stream multimodal transformer-based models such as MMBT and VisualBERT outperform both image-only models and dual-stream multimodal models that have separate pathways for the text and image modalities. We also observe improved performance on the extreme positive and negative emotion classes when a single-stream model such as MMBT is compared with a text-only transformer model such as BERT.
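The single-stream versus dual-stream distinction drawn in the abstract can be illustrated with a toy sketch. This is not the authors' code and it does not reproduce MMBT or VisualBERT: `toy_encoder`, `mean_pool`, `single_stream`, and `dual_stream` are hypothetical names, and the "encoder" is a crude stand-in for self-attention that simply mixes each token with the sequence mean. The only point is where the two modalities meet: inside one shared encoder (single-stream) or only after separate encoders (dual-stream).

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average a sequence of token vectors into one vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def toy_encoder(tokens: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a transformer encoder.

    Each token is mixed with the mean of the whole sequence, so every
    output token depends on every input token it was encoded with.
    """
    ctx = mean_pool(tokens)
    return [[0.5 * t + 0.5 * c for t, c in zip(tok, ctx)] for tok in tokens]


def single_stream(text_tokens: List[Vector], image_tokens: List[Vector]) -> Vector:
    """One shared encoder over the concatenated text + image sequence."""
    joint = toy_encoder(text_tokens + image_tokens)
    return mean_pool(joint)


def dual_stream(text_tokens: List[Vector], image_tokens: List[Vector]) -> Vector:
    """Separate encoders per modality; pooled outputs are concatenated."""
    text_vec = mean_pool(toy_encoder(text_tokens))
    image_vec = mean_pool(toy_encoder(image_tokens))
    return text_vec + image_vec  # list concatenation (late fusion)


# Toy inputs: two 2-d "caption" tokens and one 2-d "image region" token.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[2.0, 2.0]]

joint_features = single_stream(text_tokens, image_tokens)
late_features = dual_stream(text_tokens, image_tokens)
```

In either design, the resulting feature vector would be passed to a classification head over the emotion classes. In the single-stream sketch the caption tokens are transformed in the context of the image token, whereas in the dual-stream sketch the modalities interact only at the final concatenation.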

updated: Wed Oct 13 2021 04:14:49 GMT+0000 (UTC)

published: Wed Oct 13 2021 04:14:49 GMT+0000 (UTC)

arXiv
