Hyneter: Hybrid Network Transformer for Object Detection

Dong Chen; Duoqian Miao; Xuerong Zhao

In this paper, we point out that the essential difference between CNN-based and Transformer-based detectors, which causes Transformer-based methods to perform worse on small objects, is the gap between local information and global dependencies in feature extraction and propagation. To address this difference, we propose a new vision Transformer called Hybrid Network Transformer (Hyneter), motivated by pre-experiments indicating that this gap causes CNN-based and Transformer-based methods to improve results unevenly across objects of different sizes. Unlike the divide-and-conquer strategy of previous methods, Hyneter consists of a Hybrid Network Backbone (HNB) and a Dual Switching module (DS), which integrate local information and global dependencies and transfer them simultaneously. Following a balance strategy, HNB extends the range of local information by embedding convolution layers into Transformer blocks, and DS adjusts excessive reliance on global dependencies outside the patch.
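
To make the idea of embedding convolution into a Transformer block concrete, here is a minimal pure-Python sketch. It is not the authors' HNB or DS implementation, since the abstract does not give those details. The function names (`self_attention`, `local_conv`, `hybrid_block`), the 1D token layout, the identity query/key/value projections, the fixed smoothing kernel, and the `local_weight` mixing factor are all assumptions made for illustration. It only shows one way a global attention branch and a local convolution branch could be combined inside a single residual block.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    # Global branch: each token attends to all tokens.
    # Assumption: identity Q/K/V projections, single head.
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def local_conv(tokens, kernel):
    # Local branch: per-channel 1D convolution over neighbouring tokens
    # (cross-correlation form, zero padding, odd kernel length assumed).
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            idx = i + k - half
            if 0 <= idx < n:
                for j in range(d):
                    row[j] += w * tokens[idx][j]
        out.append(row)
    return out


def hybrid_block(tokens, kernel=(0.25, 0.5, 0.25), local_weight=0.5):
    # Hypothetical hybrid block: residual input plus a weighted mix of the
    # global (attention) and local (convolution) branches.
    global_out = self_attention(tokens)
    local_out = local_conv(tokens, kernel)
    return [
        [x + (1.0 - local_weight) * g + local_weight * l for x, g, l in zip(t, gr, lr)]
        for t, gr, lr in zip(tokens, global_out, local_out)
    ]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in hybrid_block(tokens):
        print([round(v, 3) for v in row])
```

Setting `local_weight=0.0` reduces the block to plain residual self-attention, which gives a simple baseline for seeing what the local branch adds.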

updated: Sat Feb 18 2023 15:39:53 GMT+0000 (UTC)

published: Sat Feb 18 2023 15:39:53 GMT+0000 (UTC)

arXiv

