Audio-Visual Fusion Layers for Event Type Aware Video Recognition

Arda Senocak; Junsik Kim; Tae-Hyun Oh; Hyeonggon Ryu; Dingzeyu Li; In So Kweon

The human brain is continuously inundated with multisensory information, and the complex interactions within it, arriving from the outside world at any given moment. Our brains analyze this information automatically by binding or segregating it. While this task may seem effortless for humans, building a machine that can perform similar tasks is extremely challenging, because complex interactions cannot be handled by a single type of integration and instead require more sophisticated approaches. In this paper, we propose a new model that addresses the multisensory integration problem with individual event-specific layers in a multi-task learning scheme. Unlike previous works that use a single type of fusion, we design event-specific layers to handle different audio-visual relationship tasks, enabling different ways of audio-visual formation. Experimental results show that our event-specific layers can discover unique properties of the audio-visual relationships in videos. Moreover, although our network is formulated with single labels, it can output additional true multi-labels to represent the given videos. We also demonstrate that the proposed framework exposes the modality bias of video data, both category-wise and dataset-wise, in popular benchmark datasets.

updated: Sat Feb 12 2022 02:56:22 GMT+0000 (UTC)

published: Sat Feb 12 2022 02:56:22 GMT+0000 (UTC)

arXiv
