Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding

Stephen Su; Samuel Kwong; Qingyu Zhao; De-An Huang; Juan Carlos Niebles; Ehsan Adeli

Interest in multi-task learning for video understanding has grown in recent years. In this work, we propose a generalized notion of multi-task learning that incorporates both auxiliary tasks, which the model should perform well on, and adversarial tasks, which the model should not perform well on. We employ Necessary Condition Analysis (NCA) as a data-driven approach for deciding which category each task should fall into. Our proposed framework, Adversarial Multi-Task Neural Networks (AMT), penalizes adversarial tasks to improve action recognition; on the Holistic Video Understanding (HVU) dataset, NCA determines the adversarial task to be scene recognition. This upends the common assumption that a model should always be encouraged to do well on all tasks in multi-task learning. At the same time, AMT is a generalization of existing methods and so retains the benefits of multi-task learning, using object recognition as an auxiliary task to aid action recognition. We introduce two challenging Scene-Invariant test splits of HVU, in which the model is evaluated on action-scene co-occurrences not encountered in training. We show that our approach improves accuracy by ~3% and encourages the model to attend to action features instead of correlation-biasing scene features.
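The abstract does not give the exact objective, so the following is only a minimal sketch of how an objective with one auxiliary task and one adversarial task might be combined. The function names, the weights `lambda_aux` and `lambda_adv`, and the simple sign flip on the scene loss are all assumptions made for illustration, not the authors' formulation. Adversarial setups commonly apply the penalty to the shared features (for example through gradient reversal) while the scene head itself is still trained normally; that detail is omitted here.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy for one example, given raw logits and the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def amt_style_loss(action, obj, scene, lambda_aux=0.5, lambda_adv=0.5):
    """Hypothetical combined objective; each argument is a (logits, target) pair.

    - action: main task, loss is minimized
    - obj:    auxiliary task, loss is added (doing well is rewarded)
    - scene:  adversarial task, loss is subtracted (doing well is penalized)
    """
    l_action = softmax_cross_entropy(*action)
    l_object = softmax_cross_entropy(*obj)
    l_scene = softmax_cross_entropy(*scene)
    return l_action + lambda_aux * l_object - lambda_adv * l_scene


if __name__ == "__main__":
    action = ([2.0, 0.5, -1.0], 0)
    obj = ([1.0, 1.5], 1)
    confident_scene = ([5.0, 0.0], 0)  # scene is easy to predict from the features
    confused_scene = ([0.0, 0.0], 0)   # scene cannot be predicted from the features
    print(amt_style_loss(action, obj, confident_scene))
    print(amt_style_loss(action, obj, confused_scene))
```

Under this sketch, a representation from which the scene is easy to predict yields a higher total loss than one from which it is not, which is the sense in which the adversarial task is penalized.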

updated: Mon Aug 22 2022 06:26:11 GMT+0000 (UTC)

published: Mon Aug 22 2022 06:26:11 GMT+0000 (UTC)

arXiv
