Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks

Jaehui Hwang; Huan Zhang; Jun-Ho Choi; Cho-Jui Hsieh; Jong-Seok Lee

Video-based action recognition methods using convolutional neural networks (CNNs) have recently achieved remarkable recognition performance. However, the generalization mechanism of action recognition models is still poorly understood. In this paper, we suggest that action recognition models rely on motion information less than expected, and that they are therefore robust to randomization of the frame order. Based on this observation, we develop a novel defense method that applies temporal shuffling to input videos to protect action recognition models against adversarial attacks. A second observation that enables our defense method is that adversarial perturbations on videos are sensitive to temporal destruction. To the best of our knowledge, this is the first attempt to design a defense method specific to video-based action recognition models.
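The idea of shuffling frames before classification can be illustrated with a minimal sketch. This is not the authors' implementation: the names `shuffle_frames` and `defended_predict`, the `model` callable, and the choice to average scores over several shuffled copies are all assumptions made here for illustration, since the abstract only states that input videos are temporally shuffled.

```python
import random
from typing import Callable, List, Sequence

Frame = List[float]      # hypothetical stand-in for one frame's pixel data
Video = Sequence[Frame]  # a video is an ordered sequence of frames


def shuffle_frames(video: Video, rng: random.Random) -> List[Frame]:
    """Return the frames of `video` in a random order; the input is not modified."""
    order = list(range(len(video)))
    rng.shuffle(order)
    return [video[i] for i in order]


def defended_predict(
    model: Callable[[Video], List[float]],
    video: Video,
    num_shuffles: int = 1,
    seed: int = 0,
) -> List[float]:
    """Average the class scores of `model` over shuffled copies of `video`.

    Averaging over several shuffles is an assumption of this sketch,
    not something stated in the abstract.
    """
    if num_shuffles < 1:
        raise ValueError("num_shuffles must be at least 1")
    rng = random.Random(seed)
    total: List[float] = []
    for _ in range(num_shuffles):
        scores = model(shuffle_frames(video, rng))
        if not total:
            total = list(scores)  # copy so the model's output is never mutated
        else:
            total = [a + b for a, b in zip(total, scores)]
    return [s / num_shuffles for s in total]
```

Under this sketch, a model that depends little on frame order should return nearly the same scores for the shuffled video as for the original, while a perturbation crafted for one specific frame order no longer lines up with the frames it was designed for.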

updated: Wed Dec 15 2021 06:57:01 GMT+0000 (UTC)

published: Wed Dec 15 2021 06:57:01 GMT+0000 (UTC)

arXiv
