Local-Aware Global Attention Network for Person Re-Identification

Nathanael L. Baisa


Learning representative, robust and discriminative information from images is essential for effective person re-identification (Re-Id). In this paper, we propose a compound approach for end-to-end discriminative deep feature learning for person Re-Id based on both body and hand images. We carefully design the Local-Aware Global Attention Network (LAGA-Net), a multi-branch deep network architecture consisting of one branch for spatial attention, one branch for channel attention, one branch for global feature representations and another branch for local feature representations. The attention branches focus on the relevant features of the image while suppressing irrelevant backgrounds. Attention mechanisms have a known weakness: they are equivariant to pixel shuffling, so they carry no information about where a pixel sits in the image. To overcome this, we integrate relative positional encodings into the spatial attention module to capture the spatial positions of pixels. The global branch is intended to preserve the global context or structural information. For the local branch, which is intended to capture fine-grained information, we perform uniform partitioning on the conv-layer to generate horizontal stripes. We retrieve the parts by conducting a soft partition, without explicitly partitioning the images or requiring external cues such as pose estimation. A set of ablation studies shows that each component contributes to the increased performance of the LAGA-Net. Extensive evaluations on four popular body-based person Re-Id benchmarks and two publicly available hand datasets demonstrate that our proposed method consistently outperforms existing state-of-the-art methods.
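The uniform horizontal partitioning described for the local branch can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `horizontal_stripe_features`, the `(C, H, W)` tensor layout, the use of average pooling within each stripe, and the choice of six stripes are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def horizontal_stripe_features(feature_map: np.ndarray, num_stripes: int) -> np.ndarray:
    """Split a (C, H, W) feature map into equal-height horizontal stripes
    and average-pool each stripe into one C-dimensional vector.

    Hypothetical helper: pooling type and layout are assumptions.
    Returns an array of shape (num_stripes, C).
    """
    channels, height, width = feature_map.shape
    if height % num_stripes != 0:
        raise ValueError("feature map height must be divisible by num_stripes")
    stripe_height = height // num_stripes

    # (C, H, W) -> (C, num_stripes, stripe_height, W): rows are grouped top to bottom.
    stripes = feature_map.reshape(channels, num_stripes, stripe_height, width)

    # Average over the rows and columns inside each stripe -> (C, num_stripes).
    pooled = stripes.mean(axis=(2, 3))

    # One feature vector per stripe.
    return pooled.T


# Toy conv-layer output: 8 channels, 24 rows, 8 columns (made-up sizes).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 24, 8))
parts = horizontal_stripe_features(fmap, num_stripes=6)
print(parts.shape)
```

Because the split is applied to the conv-layer output rather than to the input image, no image cropping or pose estimation is needed to obtain the part-level vectors; each row of `parts` could then feed its own classifier or be concatenated with the global feature.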

updated: Tue Apr 04 2023 11:26:56 GMT+0000 (UTC)

published: Sun Sep 11 2022 09:43:42 GMT+0000 (UTC)

arXiv
