Efficient Human Pose Estimation by Maximizing Fusion and High-Level Spatial Attention

Zhiyuan Ren; Yaohai Zhou; Yizhe Chen; Ruisong Zhou; Yayu Gao

In this paper, we propose an efficient human pose estimation network, SFM (Slender Fusion Model), which fuses multi-level features and adds lightweight attention blocks, HSA (High-Level Spatial Attention). Many existing efficient networks already take feature fusion into account, which greatly boosts their performance. However, their performance remains far inferior to that of large networks such as ResNet and HRNet because of the limited number of fusion operations within these networks. To address this, we increase the number of fusion operations by building bridges between two pyramid frameworks without adding layers. Meanwhile, to capture long-range dependencies, we propose a lightweight attention block, HSA, which computes a second-order attention map. In summary, SFM maximizes the number of feature fusions within a limited number of layers, and HSA learns highly precise spatial information by computing the attention of the spatial attention map. With the help of SFM and HSA, our network is able to generate multi-level features and extract precise global spatial information with few computing resources. Thus, our method achieves comparable or even better accuracy with fewer parameters and lower computational cost. Our SFM achieves 89.0 PCKh@0.5 and 42.0 PCKh@0.1 on the MPII validation set, and 71.7 AP and 90.7 AP@0.5 on the COCO validation set, with only 1.7 GFLOPs and 1.5M parameters. The source code will be made public soon.
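PCKh@α, the MPII metric quoted above, counts a predicted joint as correct when its distance to the ground truth is at most α times the person's head segment length. PCKh@0.1 therefore uses a threshold five times stricter than PCKh@0.5, which helps explain the gap between 89.0 and 42.0. The following is a minimal pure-Python sketch, not the authors' evaluation code; the names `head_size` and `pckh` are hypothetical, and it assumes the convention of the standard MPII evaluation scripts (head size = 0.6 × head bounding-box diagonal, invisible joints not scored).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def head_size(x1: float, y1: float, x2: float, y2: float, sc_bias: float = 0.6) -> float:
    """Hypothetical helper: head segment length from a head bounding box.

    Assumes the MPII evaluation convention of 0.6 x the box diagonal.
    """
    return sc_bias * math.hypot(x2 - x1, y2 - y1)


def pckh(
    preds: Sequence[Sequence[Point]],
    gts: Sequence[Sequence[Point]],
    visible: Sequence[Sequence[bool]],
    head_sizes: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Hypothetical sketch: PCKh@alpha (in %) over all visible joints.

    A joint is correct if its Euclidean error is <= alpha * head size
    of the person it belongs to; invisible joints are skipped.
    """
    correct = 0
    total = 0
    for p_joints, g_joints, vis, head in zip(preds, gts, visible, head_sizes):
        threshold = alpha * head
        for (px, py), (gx, gy), is_visible in zip(p_joints, g_joints, vis):
            if not is_visible:
                continue
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return 100.0 * correct / total if total else 0.0


if __name__ == "__main__":
    # One person, three joints with errors of 2, 10 and 20 pixels.
    gts = [[(100, 100), (200, 200), (300, 300)]]
    preds = [[(102, 100), (206, 208), (312, 316)]]
    vis = [[True, True, True]]
    hs = [head_size(0, 0, 30, 40)]  # 0.6 * 50 = 30 px
    print(round(pckh(preds, gts, vis, hs, alpha=0.5), 2))  # threshold 15 px -> 66.67
    print(round(pckh(preds, gts, vis, hs, alpha=0.1), 2))  # threshold 3 px -> 33.33
```

Because the threshold scales with each person's head size, the score is roughly invariant to how large the person appears in the image.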

updated: Thu Jul 29 2021 00:55:17 GMT+0000 (UTC)

published: Thu Jul 29 2021 00:55:17 GMT+0000 (UTC)

arXiv