Audio-Visual Class-Incremental Learning

Weiguo Pian; Shentong Mo; Yunhui Guo; Yapeng Tian

In this paper, we introduce audio-visual class-incremental learning, a class-incremental learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual modeling can improve class-incremental learning, but current methods fail to preserve the semantic similarity between audio and visual features as the number of incremental steps grows. Furthermore, we observe that audio-visual correlations learned in previous tasks can be forgotten as incremental steps progress, leading to poor performance. To overcome these challenges, we propose AV-CIL, which incorporates a Dual-Audio-Visual Similarity Constraint (D-AVSC) to maintain both instance-aware and class-aware semantic similarity between the audio and visual modalities, and Visual Attention Distillation (VAD) to retain the previously learned audio-guided visual attentive ability. We create three audio-visual class-incremental datasets, AVE-Class-Incremental (AVE-CI), Kinetics-Sounds-Class-Incremental (K-S-CI), and VGGSound100-Class-Incremental (VS100-CI), based on the AVE, Kinetics-Sounds, and VGGSound datasets, respectively. Our experiments on AVE-CI, K-S-CI, and VS100-CI demonstrate that AV-CIL significantly outperforms existing class-incremental learning methods in audio-visual class-incremental learning. Code and data are available at: https://github.com/weiguoPian/AV-CIL_ICCV2023.
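
To make the class-incremental scenario concrete, the following is a minimal sketch of how a label set might be partitioned into disjoint incremental steps. It is not taken from the authors' code: the function names, the class labels, and the step size are all hypothetical, and the actual splits used for AVE-CI, K-S-CI, and VS100-CI are defined in the repository linked above.

```python
import random


def split_classes_into_steps(class_names, classes_per_step, seed=0):
    """Partition class labels into disjoint groups, one group per incremental step.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if classes_per_step <= 0:
        raise ValueError("classes_per_step must be positive")
    shuffled = list(class_names)
    # A fixed seed keeps the class order reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[i:i + classes_per_step]
        for i in range(0, len(shuffled), classes_per_step)
    ]


def classes_seen_up_to(steps, step_index):
    """Return every class introduced in steps 0..step_index (inclusive).

    In class-incremental learning the model is typically evaluated on all
    classes seen so far, not only on the classes of the current step.
    """
    return [c for step in steps[: step_index + 1] for c in step]


# Illustrative labels and step size only (assumed, not a real dataset configuration).
all_classes = [f"class_{i}" for i in range(10)]
steps = split_classes_into_steps(all_classes, classes_per_step=3)
```

At each step the model trains only on the new group of classes, while evaluation covers `classes_seen_up_to(steps, t)`; the gap between the two is where forgetting of earlier classes shows up.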

updated: Wed Sep 27 2023 10:16:09 GMT+0000 (UTC)

published: Mon Aug 21 2023 22:43:47 GMT+0000 (UTC)

arXiv

