Learning in Audio-visual Context: A Review, Analysis, and New Perspective

Yake Wei; Di Hu; Yapeng Tian; Xuelong Li

Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perceptual ability, audio-visual learning, which aims to develop computational approaches that learn from both the audio and visual modalities, has flourished in recent years. A comprehensive survey that systematically organizes and analyzes studies in the audio-visual field is needed. Starting from an analysis of the foundations of audio-visual cognition, we introduce several key findings that have inspired our computational studies. We then systematically review recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception, and audio-visual collaboration. Through our analysis, we find that the consistency of audio-visual data across the semantic, spatial, and temporal dimensions supports the above studies. To revisit the current development of the audio-visual learning field from a more macro view, we further propose a new perspective on audio-visual scene understanding, then discuss and analyze feasible future directions for the field. Overall, this survey reviews the current audio-visual learning field from different aspects and looks ahead to its future. We hope it provides researchers with a better understanding of this area. A website with a continually updated version of the survey is available: https://gewu-lab.github.io/audio-visual-learning/.

updated: Sat Aug 20 2022 02:15:44 GMT+0000 (UTC)

published: Sat Aug 20 2022 02:15:44 GMT+0000 (UTC)

arXiv
