CDGNet: Class Distribution Guided Network for Human Parsing

Kunliang Liu; Ouk Choi; Jianming Wang; Wonjun Hwang

Human parsing aims to partition a human in an image into constituent parts by labeling each pixel of the human image with its class. Because the human body comprises hierarchically structured parts, each body part in an image tends to have its own characteristic position distribution: a head is unlikely to lie below the feet, and arms are likely to be near the torso. Inspired by this observation, we build instance class distributions by accumulating the original human parsing label in the horizontal and vertical directions, and use them as supervision signals. With these horizontal and vertical class distribution labels, the network is guided to exploit the intrinsic position distribution of each class. We combine the two guided features to form a spatial guidance map, which is then applied to the baseline network by multiplication and concatenation to distinguish the human parts precisely. We conducted extensive experiments on three well-known benchmarks, LIP, ATR, and CIHP, to demonstrate the effectiveness and superiority of our method.
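
The label accumulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `class_distributions`, the toy label map, and the choice to normalise counts by the row or column length are all assumptions made for illustration, since the abstract does not specify them. The sketch counts, for every class, how many pixels fall in each row and in each column of a parsing label map.

```python
def class_distributions(label, num_classes):
    """Accumulate a parsing label map into per-row and per-column class fractions.

    label: list of rows, each a list of integer class ids (H x W).
    Returns (row_dist, col_dist) where
      row_dist[c][y] = fraction of pixels in row y that belong to class c
      col_dist[c][x] = fraction of pixels in column x that belong to class c
    """
    height = len(label)
    width = len(label[0])
    row_counts = [[0] * height for _ in range(num_classes)]
    col_counts = [[0] * width for _ in range(num_classes)]

    # One pass over the label map: each pixel adds one count to its class
    # in both the row accumulator and the column accumulator.
    for y, row in enumerate(label):
        for x, c in enumerate(row):
            row_counts[c][y] += 1
            col_counts[c][x] += 1

    # Assumption: normalise by the number of pixels in the row or column,
    # so that the values across classes sum to 1 at every position.
    row_dist = [[n / width for n in counts] for counts in row_counts]
    col_dist = [[n / height for n in counts] for counts in col_counts]
    return row_dist, col_dist


# Hypothetical 4 x 3 label map: 0 = background, 1 = head, 2 = torso.
toy_label = [
    [0, 1, 0],
    [0, 1, 0],
    [2, 2, 2],
    [0, 2, 0],
]
row_dist, col_dist = class_distributions(toy_label, num_classes=3)
```

In this toy example the head class is concentrated in the top rows and the torso class in the lower rows, which is the kind of positional prior the abstract says the network is guided to exploit.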

updated: Tue Mar 15 2022 11:27:39 GMT+0000 (UTC)

published: Sun Nov 28 2021 15:18:53 GMT+0000 (UTC)

arXiv
