Perception Over Time: Temporal Dynamics for Robust Image Understanding

Maryam Daniali; Edward Kim

While deep learning surpasses human-level performance on narrow, specific vision tasks, it is fragile and over-confident in classification. For example, minor changes in perspective, illumination, or object deformation in image space can result in drastically different labels, a weakness that adversarial perturbations make especially apparent. Human visual perception, in contrast, is orders of magnitude more robust to changes in the input stimulus. Unfortunately, we are far from fully understanding and integrating the underlying mechanisms that produce such robust perception. In this work, we introduce a novel method of incorporating temporal dynamics into static image understanding. We describe a neuro-inspired method that decomposes a single image into a series of coarse-to-fine images, simulating how biological vision integrates information over time. Next, we demonstrate how our visual perception framework can use this information "over time" through a biologically plausible algorithm with recurrent units and, as a result, significantly improve accuracy and robustness over standard CNNs. We also compare our approach with state-of-the-art models and explicitly quantify its adversarial robustness properties through multiple ablation studies. Our quantitative and qualitative results demonstrate substantial improvements over the standard computer vision and deep learning architectures used today.
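The coarse-to-fine decomposition mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the series is produced, so everything here is an assumption for illustration only: the use of Gaussian blur with decreasing width, the `sigmas` schedule, and the helper names `gaussian_kernel_1d`, `blur`, and `coarse_to_fine` are all hypothetical and not taken from the paper.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur of a 2-D grayscale image (edge padding)."""
    if sigma <= 0:
        # sigma = 0 means "no blur": return the image unchanged.
        return image.astype(np.float64)
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


def coarse_to_fine(image: np.ndarray, sigmas=(4.0, 2.0, 1.0, 0.0)) -> np.ndarray:
    """Stack blurred copies of one image, ordered from coarsest to finest.

    The sigma schedule is an assumed placeholder, not the paper's setting.
    """
    return np.stack([blur(image, s) for s in sigmas])


# Usage: one static image becomes a short "temporal" sequence of frames.
rng = np.random.default_rng(0)
image = rng.random((16, 16))
sequence = coarse_to_fine(image)  # shape: (time steps, height, width)
```

Each frame of `sequence` could then be fed, one step at a time, to a recurrent model, which is the sense in which a static image is processed "over time" in this sketch.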

updated: Fri Mar 11 2022 21:11:59 GMT+0000 (UTC)

published: Fri Mar 11 2022 21:11:59 GMT+0000 (UTC)

arXiv
