Peripheral Vision Transformer

Juhong Min; Yucheng Zhao; Chong Luo; Minsu Cho

Human vision possesses a special type of visual processing system called peripheral vision. By partitioning the entire visual field into multiple contour regions based on the distance to the center of gaze, peripheral vision gives us the ability to perceive different visual features in different regions. In this work, we take a biologically inspired approach and explore how to model peripheral vision in deep neural networks for visual recognition. We propose incorporating peripheral position encoding into the multi-head self-attention layers so that the network learns to partition the visual field into diverse peripheral regions given training data. We evaluate the proposed network, dubbed PerViT, on ImageNet-1K and systematically investigate the inner workings of the model for machine perception, showing that the network learns to perceive visual data similarly to the way human vision does. The performance improvements in image classification over the baselines across different model sizes demonstrate the efficacy of the proposed method.
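To make the idea of a distance-dependent position encoding inside self-attention more concrete, the following is a minimal NumPy sketch. It is not the PerViT formulation, which the abstract does not specify. It assumes a simpler stand-in: each head owns a learnable bias per ring-shaped distance bin, and that bias is added to the content scores before the softmax. The function names, the binning scheme, and the parameter `bias_table` are all hypothetical and chosen for illustration only.

```python
import numpy as np

def pairwise_patch_distances(grid_h, grid_w):
    """Euclidean distance between every pair of patch centers on a grid."""
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(np.float64)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))  # shape (N, N), N = grid_h * grid_w

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distance_biased_attention(x, w_q, w_k, w_v, dist, bias_table):
    """Multi-head self-attention with a per-head, per-distance-bin bias.

    x:          (N, D) patch embeddings
    w_q/k/v:    (H, D, d) per-head projection weights
    dist:       (N, N) pairwise patch distances
    bias_table: (H, B) hypothetical learnable bias, one value per head and ring
    """
    num_bins = bias_table.shape[1]
    q = np.einsum("nd,hde->hne", x, w_q)
    k = np.einsum("nd,hde->hne", x, w_k)
    v = np.einsum("nd,hde->hne", x, w_v)
    scores = np.einsum("hne,hme->hnm", q, k) / np.sqrt(q.shape[-1])

    # Assumption: quantize distances into equal-width rings around each query.
    edges = np.linspace(0.0, dist.max(), num_bins + 1)[1:-1]
    bins = np.digitize(dist, edges)           # (N, N), values in [0, B-1]
    bias = bias_table[:, bins]                # (H, N, N)

    attn = softmax(scores + bias, axis=-1)    # each row sums to 1
    out = np.einsum("hnm,hme->hne", attn, v)  # (H, N, d)
    return out, attn
```

In this sketch, a head whose bias is strongly negative for the outer rings attends only to nearby patches, while a head with a flat bias attends globally, so different heads can cover different peripheral regions. How the actual model combines content and position information may differ from this additive form.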

updated: Thu Oct 13 2022 12:08:14 GMT+0000 (UTC)

published: Tue Jun 14 2022 12:47:47 GMT+0000 (UTC)

arXiv
