Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method

Qingsong Zhao; Zhipeng Zhou; Yi Wang; Yu Qiao; Cairong Zhao

Zigzag flattening (ZF) is commonly used as the default way to order image patches in deep models such as vision transformers (ViTs). Notably, when decomposing multi-scale images, ZF cannot maintain the invariance of feature point positions. To this end, we investigate Hilbert flattening (HF) as an alternative for sequence ordering in vision tasks. HF has proven to be superior to other flattening approaches in maintaining spatial locality when performing multi-scale transformations of dimensional space. In applications, we design a position encoding method based on HF that beats absolute position encoding non-trivially in the Transformer architecture. HF can also be used for feature down-sampling and feature/image interpolation. Extensive experiments demonstrate that it yields consistent performance boosts for several popular architectures and applications. The code will be released upon acceptance.
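
To illustrate the locality property the abstract refers to, the following is a minimal sketch, not the authors' implementation (their code is not part of this page). It assumes that "zigzag flattening" means the row-major scan typically used to order ViT patches, and it uses the standard iterative Hilbert curve index-to-coordinate conversion for a square grid whose side is a power of two. The function names hilbert_d2xy, hilbert_order, hilbert_flatten, zigzag_flatten and max_step are hypothetical and chosen only for this example.

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. This is the standard iterative conversion,
    used here as an assumption about how a Hilbert ordering can be built.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the current quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_order(n):
    """Return the list of (x, y) cells of an n x n grid in Hilbert order."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def hilbert_flatten(grid):
    """Flatten a square list-of-lists (rows indexed by y) along the Hilbert curve."""
    n = len(grid)
    return [grid[y][x] for x, y in hilbert_order(n)]


def zigzag_flatten(grid):
    """Row-major flattening, assumed here to be what the abstract calls ZF."""
    return [value for row in grid for value in row]


def max_step(cells):
    """Largest Manhattan distance between consecutive cells in an ordering."""
    return max(
        abs(x1 - x0) + abs(y1 - y0)
        for (x0, y0), (x1, y1) in zip(cells, cells[1:])
    )


if __name__ == "__main__":
    n = 8
    row_major = [(x, y) for y in range(n) for x in range(n)]
    print("max step, Hilbert order:", max_step(hilbert_order(n)))
    print("max step, row-major order:", max_step(row_major))
```

In this sketch, consecutive elements of the Hilbert sequence always come from neighbouring grid cells, whereas the row-major scan jumps across the whole row width at the end of each row. That is one simple way to see why a Hilbert ordering keeps spatially close patches close in the flattened sequence.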

updated: Thu Dec 29 2022 10:58:04 GMT+0000 (UTC)

published: Mon Feb 21 2022 13:53:04 GMT+0000 (UTC)

arXiv
