Active Contrastive Learning of Audio-Visual Video Representations

Shuang Ma; Zhaoyang Zeng; Daniel McDuff; Yale Song

Contrastive learning has been shown to produce generalizable representations of audio and visual data by maximizing a lower bound on the mutual information (MI) between different views of an instance. However, obtaining a tight lower bound requires a sample size exponential in MI and thus a large set of negative samples. We can incorporate more samples by building a large queue-based dictionary, but there are theoretical limits to performance improvements even with a large number of negative samples. We hypothesize that random negative sampling leads to a highly redundant dictionary that results in suboptimal representations for downstream tasks. In this paper, we propose an active contrastive learning approach that builds an actively sampled dictionary with diverse and informative items, which improves the quality of negative samples and improves performance on tasks where there is high mutual information in the data, e.g., video classification. Our model achieves state-of-the-art performance on challenging audio and visual downstream benchmarks including UCF101, HMDB51 and ESC50. Code is available at: https://github.com/yunyikristy/CM-ACC
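To make the sample-size argument concrete, the sketch below computes a standard InfoNCE loss for one query against one positive and K negatives drawn from a dictionary, together with the MI lower bound it implies. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names `info_nce` and `mi_lower_bound`, the temperature value, and the random vectors standing in for audio and visual embeddings are all hypothetical.

```python
import numpy as np


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query, one positive, and K negatives.

    Hypothetical helper for illustration; the temperature is an assumed value.
    """
    # L2-normalize so that dot products are cosine similarities.
    q = query / np.linalg.norm(query)
    pos = positive / np.linalg.norm(positive)
    neg = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)

    # Index 0 holds the positive pair; the remaining K entries are negatives.
    logits = np.concatenate(([q @ pos], neg @ q)) / temperature
    logits = logits - logits.max()  # shift for numerical stability

    # Negative log-probability of picking the positive among K + 1 candidates.
    log_prob_pos = logits[0] - np.log(np.exp(logits).sum())
    return -log_prob_pos


def mi_lower_bound(loss, num_negatives):
    """MI lower bound implied by an InfoNCE loss: log(K + 1) - loss."""
    return np.log(num_negatives + 1) - loss


rng = np.random.default_rng(0)
audio = rng.normal(size=16)                 # stand-in for an audio embedding
video = audio + 0.1 * rng.normal(size=16)   # stand-in for the matching video embedding
dictionary = rng.normal(size=(64, 16))      # stand-in for 64 queued negatives

loss = info_nce(audio, video, dictionary)
bound = mi_lower_bound(loss, len(dictionary))
```

Because the loss is non-negative, the bound can never exceed log(K + 1). Certifying an MI of I nats therefore needs on the order of e^I negatives, which is the exponential sample-size requirement mentioned above.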

updated: Fri Apr 16 2021 22:16:18 GMT+0000 (UTC)

published: Mon Aug 31 2020 21:18:30 GMT+0000 (UTC)

arXiv

