Seeing the Intangible: Surveying Automatic High-Level Visual Understanding from Still Images

Delfina Sol Martinez Pandiani; Valentina Presutti

The field of Computer Vision (CV) was born with the single grand goal of complete image understanding: providing a complete semantic interpretation of an input image. What exactly this goal entails is not immediately straightforward, but theoretical hierarchies of visual understanding point towards a top level of full semantics, within which sits the most complex and subjective information humans can detect from visual data. In particular, non-concrete concepts including emotions, social values and ideologies seem to be protagonists of this "high-level" visual semantic understanding. While such "abstract concepts" (ACs) are critical tools for image management and retrieval, their automatic recognition is still a challenge, precisely because they rest at the top of the "semantic pyramid": the well-known semantic gap problem is worsened by their lack of unique perceptual referents and their reliance on less specific features than concrete concepts. Given that there seems to be very little explicit work within CV on the task of abstract social concept (ASC) detection, and that many recent works seem to discuss similar non-concrete entities using different terminology, in this survey we provide a systematic review of CV work that explicitly or implicitly approaches the problem of abstract (specifically social) concept detection from still images. Specifically, this survey performs and provides: (1) a study and clustering of high-level visual understanding semantic elements from a multidisciplinary perspective (computer science, visual studies, and cognitive perspectives); (2) a study and clustering of high-level visual understanding computer vision tasks dealing with the identified semantic elements, so as to identify current CV work that implicitly deals with AC detection.

updated: Mon Aug 21 2023 08:37:04 GMT+0000 (UTC)

published: Mon Aug 21 2023 08:37:04 GMT+0000 (UTC)

arXiv
