Feature representations useful for predicting image memorability

Takumi Harada; Hiroyuki Sakai

Prediction of image memorability has attracted interest in various fields, and the prediction accuracy of convolutional neural network (CNN) models has been approaching the empirical upper bound estimated from human consistency. However, which feature representations embedded in CNN models are responsible for this high memorability prediction accuracy remains an open question. To address it, we sought to identify memorability-related feature representations in CNN models using brain similarity. Specifically, we examined memorability prediction accuracy and brain similarity across 16,860 layers in 64 CNN models pretrained for object recognition. This comprehensive analysis revealed a clear tendency: layers with high memorability prediction accuracy had higher brain similarity with the inferior temporal (IT) cortex, the highest stage in the ventral visual pathway. Furthermore, fine-tuning the 64 CNN models for memorability prediction revealed that brain similarity with the IT cortex at the penultimate layer correlated positively with the models' memorability prediction accuracy. The best fine-tuned model achieved accuracy comparable to state-of-the-art CNN models developed for memorability prediction. Overall, these results indicate that the success of CNN models in predicting memorability relies on acquiring feature representations similar to those of the IT cortex. This study advances our understanding of feature representations and their use in predicting image memorability.
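
The abstract reports that brain similarity with the IT cortex correlates positively with memorability prediction accuracy across models, but it does not say which correlation statistic was used. As a minimal sketch, assuming a Spearman rank correlation (an assumption, not something stated above), such a relationship could be checked as follows. The function names and the per-model scores are hypothetical and invented purely for illustration; they are not results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder scores, one entry per model (NOT data from the paper).
it_similarity = [0.31, 0.42, 0.38, 0.55, 0.47]
memorability_accuracy = [0.58, 0.61, 0.63, 0.67, 0.64]

rho = spearman(it_similarity, memorability_accuracy)
print(f"rank correlation across models: {rho:.2f}")
```

A rank-based statistic is used here because it only assumes a monotonic relationship between the two scores; a Pearson correlation would be an equally plausible choice if the relationship is expected to be linear.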

updated: Mon Jul 17 2023 03:41:26 GMT+0000 (UTC)

published: Tue Mar 14 2023 07:42:02 GMT+0000 (UTC)

arXiv
