Short Range Correlation Transformer for Occluded Person Re-Identification

Yunbin Zhao; Songhao Zhu; Dongsheng Wang; Zhiwei Liang

Occluded person re-identification is a challenging area of computer vision that suffers from inefficient feature representation and low recognition accuracy. Convolutional neural networks focus on extracting local features, so they struggle to extract features of occluded pedestrians and their results are unsatisfactory. Recently, the vision transformer has been introduced into re-identification and achieves state-of-the-art results by modeling global relationships between patch sequences. However, the vision transformer is inferior to convolutional neural networks at extracting local features. We therefore design a partial feature transformer-based person re-identification framework named PFT. The proposed PFT uses three modules to enhance the efficiency of the vision transformer. (1) Patch full dimension enhancement module. We design a learnable tensor with the same size as the patch sequences, which is full-dimensional and deeply embedded in the patch sequences to enrich the diversity of training samples. (2) Fusion and reconstruction module. We extract the less important part of the obtained patch sequences and fuse it with the original patch sequences to reconstruct them. (3) Spatial slicing module. We slice and group the patch sequences along the spatial direction, which effectively improves their short-range correlation. Experimental results on occluded and holistic re-identification datasets demonstrate that the proposed PFT network consistently achieves superior performance and outperforms state-of-the-art methods.
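The abstract does not give implementation details, so the following is only a minimal NumPy sketch of how two of the described ideas might look at the tensor-shape level: a learnable tensor the same size as the patch sequence, and slicing the sequence into contiguous spatial groups. The function names `enhance_patches` and `spatial_slice`, the use of element-wise addition, the zero initialization, and the group count are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def enhance_patches(patches, learnable):
    """Combine a patch sequence with a same-sized learnable tensor.

    Assumption: the combination is element-wise addition; the abstract
    only says the tensor has the same size as the patch sequence.
    patches:   (batch, num_patches, dim)
    learnable: (num_patches, dim)
    """
    if patches.shape[1:] != learnable.shape:
        raise ValueError("learnable tensor must match the patch sequence size")
    return patches + learnable  # broadcast over the batch axis


def spatial_slice(patches, num_groups):
    """Split a patch sequence into contiguous groups along the patch axis.

    Assumption: patches are ordered spatially (e.g. row by row), so
    contiguous groups correspond to neighbouring image regions.
    Returns an array of shape (batch, num_groups, patches_per_group, dim).
    """
    batch, num_patches, dim = patches.shape
    if num_patches % num_groups != 0:
        raise ValueError("num_patches must be divisible by num_groups")
    return patches.reshape(batch, num_groups, num_patches // num_groups, dim)


# Toy input: 2 images, 8 patches each, 4-dimensional embeddings.
rng = np.random.default_rng(0)
patches = rng.normal(size=(2, 8, 4))
learnable = np.zeros((8, 4))  # hypothetical initialization

enhanced = enhance_patches(patches, learnable)
groups = spatial_slice(enhanced, num_groups=4)
print(groups.shape)  # (2, 4, 2, 4)
```

In a real model the learnable tensor would be a trained parameter and each group would be processed by further transformer layers; this sketch only shows the shapes involved.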

updated: Tue Jan 04 2022 11:12:39 GMT+0000 (UTC)

published: Tue Jan 04 2022 11:12:39 GMT+0000 (UTC)

arXiv
