DarSwin: Distortion Aware Radial Swin Transformer

Akshaya Athwale; Arman Afrasiyabi; Justin Lague; Ichrak Shili; Ola Ahmad; Jean-Francois Lalonde

Wide-angle lenses are commonly used in perception tasks that require a large field of view. Unfortunately, these lenses produce significant distortions, so conventional models that ignore distortion effects cannot adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. We leverage the physical characteristics of such lenses, which are analytically defined by the radial distortion profile (assumed to be known), to develop a distortion-aware radial Swin transformer (DarSwin). In contrast to conventional transformer-based architectures, DarSwin comprises radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and a polar position encoding for radial patch merging. We validate our method on classification tasks using synthetically distorted ImageNet data and show through extensive experiments that DarSwin can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. Compared to other baselines, DarSwin achieves the best results (in terms of Top-1 and Top-5 accuracy) when tested on in-distribution data, with almost 2% (6%) gain in Top-1 accuracy under medium (high) distortion levels, and is comparable to the state of the art under low and very low distortion levels (perspective-like images).
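The abstract does not spell out how the radial patch partitioning is computed, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes an equidistant lens profile (image radius proportional to incident angle) as a stand-in for the known radial distortion profile, and it splits the image plane into rings that are evenly spaced in incident angle and into equal azimuthal sectors. All function names and parameters (`equidistant_profile`, `radial_patch_edges`, `patch_index`, `focal`, `n_radial`, `n_azimuth`) are hypothetical.

```python
import math

def equidistant_profile(theta, focal=1.0):
    """Hypothetical lens profile (assumption): image radius is linear in incident angle."""
    return focal * theta

def radial_patch_edges(profile, max_theta, n_radial):
    """Image-plane radii bounding n_radial rings, spaced evenly in incident angle."""
    return [profile(max_theta * i / n_radial) for i in range(n_radial + 1)]

def patch_index(x, y, edges, n_azimuth):
    """Return (ring, sector) of the polar patch containing (x, y), or None if outside."""
    r = math.hypot(x, y)
    if r >= edges[-1]:
        return None  # point lies beyond the outermost ring
    # Azimuth in [0, 2*pi); the clamp guards against rounding up to n_azimuth.
    phi = math.atan2(y, x) % (2 * math.pi)
    sector = min(int(phi / (2 * math.pi) * n_azimuth), n_azimuth - 1)
    # First ring whose radial interval contains r.
    ring = next(i for i in range(len(edges) - 1) if edges[i] <= r < edges[i + 1])
    return ring, sector

# Usage: 4 rings covering incident angles up to 90 degrees, 8 azimuthal sectors.
edges = radial_patch_edges(equidistant_profile, math.pi / 2, 4)
print(patch_index(0.6, 0.3, edges, 8))
```

Because the ring boundaries are derived from the lens profile rather than from a fixed pixel grid, swapping in a different profile function changes where the patches fall on the image without changing the number of patches, which is one plausible way a model could stay aligned with a lens it was not trained on.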

updated: Wed Apr 19 2023 14:32:56 GMT+0000 (UTC)

published: Wed Apr 19 2023 14:32:56 GMT+0000 (UTC)

arXiv

